Skip to content

Release IPv6 address if IPv6 is disabled on an interface#48971

Merged
robmry merged 5 commits intomoby:masterfrom
robmry:ipv6_disabled_on_interface
Sep 16, 2025
Merged

Release IPv6 address if IPv6 is disabled on an interface#48971
robmry merged 5 commits intomoby:masterfrom
robmry:ipv6_disabled_on_interface

Conversation

@robmry
Copy link
Contributor

@robmry robmry commented Nov 27, 2024

- What I did

When running:

  docker network create --ipv6 b46
  docker run --rm -ti \
    --network name=b46,driver-opt=com.docker.network.endpoint.sysctls=net.ipv6.conf.IFNAME.disable_ipv6=1 \
     busybox

IPv6 is enabled in the container and the network, so an IPv6 address will be allocated for the endpoint.

But, when the sysctl is applied, the IPv6 address will be removed from the interface ... so, no unsolicited neighbour advertisement should be (or can be) sent and, the endpoint should not be treated as dual-stack when selecting a gateway endpoint and, if it is selected as the gateway endpoint, setting up an IPv6 route via the network will fail.

In 27.x (the release that added per-endpoint sysctls), the error is ...

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: failed to set IPv6 gateway while updating gateway: route for the gateway fd4b:b3cd:ce0c::1 could not be found: network is unreachable: unknown.

In 28.0 (because the SetKey hook isn't used any more, in this case), it's ...

docker: Error response from daemon: failed to set up container networking: failed to add interface veth3368cd8 to sandbox: failed to advertise addresses: write ip ::1->ff02::1: sendmsg: invalid argument

- How I did it

See the individual commit messages ... refactoring, with the bug fix in the final commit.

The refactoring means Sandbox and Endpoint config happens in the same order, whether or not the OS Sandbox has been created when the Endpoint is added.

- How to verify it

Exiting and new integration tests.

- Description for the changelog

- Fix an issue preventing container startup or selection of its network gateway when IPv6 is only disabled on a specific interface.

@thaJeztah
Copy link
Member

@robmry was this accidentally closed or superseded by the other PRs?

@robmry
Copy link
Contributor Author

robmry commented Jan 10, 2025

@robmry was this accidentally closed or superseded by the other PRs?

Oh - thank you! How do you manage to spot this stuff?!

I thought it was superseded by the unsolicited ARPs PR, but you prompted me to double-check - the last commit here is just stacked on that other change. I've forgotten where I got to with this one, but it probably needs resurrecting.

@robmry robmry reopened this Jan 10, 2025
@robmry robmry force-pushed the ipv6_disabled_on_interface branch 2 times, most recently from baa0db9 to 6d1c268 Compare January 10, 2025 19:43
@thaJeztah
Copy link
Member

How do you manage to spot this stuff?!

LOL, any activity gets in my notifications inbox; I tend to check up quickly on all of them, but also look for "this looks non-standard" things (like closing a PR); I didn't see a comment "why" it was closed, but I've had it myself that I accidentally pushed to the wrong PR branch, and GitHub automatically closing a PR, so thought "let me check!" ❤️

@robmry robmry force-pushed the ipv6_disabled_on_interface branch 7 times, most recently from 5903609 to d88aa44 Compare January 16, 2025 16:58
@robmry robmry removed this from the 28.0.0 milestone Jan 16, 2025
@robmry
Copy link
Contributor Author

robmry commented Jan 16, 2025

As this was already broken in 27.x, it fails with an error, and it's a corner case (disabling IPv6 on a single endpoint in an IPv6 network, rather than disabling it in the container) ... I've dropped the 28.0 milestone.

It's working, but I'll give it a check over before marking it as ready for review.

@robmry robmry force-pushed the ipv6_disabled_on_interface branch from d88aa44 to c5da96d Compare January 28, 2025 19:18
@robmry robmry added this to the 29.0.0 milestone Jan 30, 2025
@robmry robmry requested a review from akerouanton January 30, 2025 17:02
@robmry robmry marked this pull request as ready for review February 4, 2025 09:30
@robmry robmry force-pushed the ipv6_disabled_on_interface branch from c5da96d to 7440d06 Compare June 12, 2025 11:22
@thompson-shaun thompson-shaun modified the milestones: 29.0.0, 29.1.0 Jul 10, 2025
@robmry robmry force-pushed the ipv6_disabled_on_interface branch from 7440d06 to 634c7fa Compare September 3, 2025 14:36
@robmry robmry marked this pull request as draft September 3, 2025 14:37
@robmry robmry added kind/bugfix PR's that fix bugs kind/refactor PR's that refactor, or clean-up code labels Sep 3, 2025
@robmry robmry force-pushed the ipv6_disabled_on_interface branch 3 times, most recently from 1f84a50 to 082d73e Compare September 3, 2025 18:22
@robmry robmry marked this pull request as ready for review September 3, 2025 18:27
@robmry robmry requested a review from corhere September 3, 2025 18:30
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from the review comments.

Comment on lines +205 to +206
if eps := sb.Endpoints(); len(eps) > 0 {
sb.finishEndpointConfig(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOCTTOU: sb.Endpoints() is called here, then immediately called again in finishEndpointConfig. Use the same copy of the endpoints slice for both the test and the loop. Either pass the slice of endpoints into sb.finishEndpointConfig, or move the len(eps) > 0 check into finishEndpointConfig and call it unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it into finishEndpointConfig to avoid creating the extra slice. Although, now sb.joinLeaveMu is held at this point, the set of endpoints won't be modified by anything else.

Comment on lines 201 to 207
// If the Sandbox already has endpoints it's because sbJoin has been called for
// them - but configuration of addresses/routes (and so on) didn't complete because
// there was nowhere for it to go before the osSbox was set up. So, finish that
// configuration now.
if eps := sb.Endpoints(); len(eps) > 0 {
sb.finishEndpointConfig(ctx)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is populating network resources always an idempotent operation? Otherwise we might have a problem. If (*Endpoint).sbJoin is called concurrently with SetKey, the joined endpoint could be configured twice. Since (*Sandbox).canPopulateNetworkResources() returns true as soon as sb.osSbox != nil, the sbJoin may populate the network resources before SetKey returns, which would result in the resources getting populated twice in a row if the sbJoin call adds the endpoint to the store before SetKey calls finishEndpointConfig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetKey now acquires sb.joinLeaveMu before updating sb.osSbox to make sure the list of endpoints is stable.

return nil
}

func (d *driver) ReleaseIPv6(ctx context.Context, nid, eid string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it would lead to buggy behaviour if a driver which intended to implement IPv6Releaser was to no longer implement it, please add a compile-time assertion that the bridge driver implements the interface.

var _ driverapi.IPv6Releaser = (*driver)(nil)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@robmry robmry force-pushed the ipv6_disabled_on_interface branch from 082d73e to ad33bc4 Compare September 15, 2025 09:15
When the SetKey hook is used (by a build container) it's called after
Endpoint.sbJoin, which will have called Sandbox.populateNetworkResources
to set up address, routes, sysctls and so on - but it's not able to do
any config until the osSbox exists. So, Sandbox.populateNetworkResources
is called again by SetKey to finish that config.

But, that means the rest of Endpoint.sbJoin has already happened before
the osSbox existed - it will have configured DNS, /etc/hosts, gateways
and so on before anything was set up for the OS.

So, if the osSbox configuration isn't applied as expected (for example,
a sysctl disables IPv6 on the endpoint), that sbJoin configuration is
incorrect.

To avoid unnecessary config+cleanup in thoses cases - delay the config
currently done by sbJoin until the osSbox exists.

Signed-off-by: Rob Murray <[email protected]>
DNS is set up when the endpoint is joined to a network.

It was added in commit 4850c5f (Avoid duplicate entries in
/etc/hosts) then simplified in bcca214 (libnetwork: open-code
updating svc records) and seems to be related to setting up a
name on a swarm node that isn't running the container with the
endpoint.

But, all callers of Network.createEndpoint follow up with an
Endpoint.Join, which also sets up the DNS entry. Those callers are:
  Network.createLoadBalancerSandbox
  Network.CreateEndpoint
    - called by Daemon.connectToNetwork
    - called by Sandbox.setupDefaultGateway
    - called by builder-net/executor.go: iface.init

None of them bail out before the Join for a Swarm case.

So, it looks like enough has changed that the createEndpoint code is
no longer needed (it predates the internal DNS server) ... remove it.

Signed-off-by: Rob Murray <[email protected]>
And move the Endpoint.populateNetworkResources code into the
all-platforms part of the Sandbox method.

Signed-off-by: Rob Murray <[email protected]>
When running:
  docker network create --ipv6 b46
  docker run --rm -ti \
    --network name=b46,driver-opt=com.docker.network.endpoint.sysctls=net.ipv6.conf.IFNAME.disable_ipv6=1 \
     busybox

IPv6 is enabled in the container and the network, so an IPv6 address
will be allocated for the endpoint.

But, when the sysctl is applied, the IPv6 address will be removed
from the interface ... so, no unsolicited neighbour advertisement
should be (or can be) sent and, the endpoint should not be treated
as dual-stack when selecting a gateway endpoint and, if it is
selected as the gateway endpoint, setting up an IPv6 route via the
network will fail.

So, if the IPv6 address disappears after sysctls have been applied,
release the address and remove it from the endpoint's config.

Signed-off-by: Rob Murray <[email protected]>
@robmry robmry force-pushed the ipv6_disabled_on_interface branch from ad33bc4 to 2bb0443 Compare September 15, 2025 09:39
@robmry robmry requested a review from corhere September 15, 2025 09:41
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@akerouanton akerouanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit but feel free to ignore it. LGTM otherwise.

Comment on lines +563 to 566
// Current endpoint(s) providing external connectivity for the Sandbox.
// If ep is selected as a gateway endpoint once it's been added to the Sandbox,
// these are the endpoints that need to be un-gateway'd.
gwepBefore4, gwepBefore6 := sb.getGatewayEndpoint()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved inside of the condition below. We can also consider swapping the condition - call storeEndpoint and early return when !sb.canPopulateNetworkResources().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where the old gateways are needed, they have to be looked up before sb.addEndpoint(ep) - because adding the endpoint might change the gateway selection. And, addEndpoint needs to be called when they're not needed.

So, it could be guarded with sb.canPopulateNetworkResources() ... but I'm not sure it's worth updating - in the common case, "canPopulate" is true, and in the buildkit case there's probably only one gateway endpoint to select from anyway (so getGatewayEndpoint is cheap).

I remember dithering about whether to just store the endpoint and early return - but decided to keep the store at the end, just in case the endpoint is updated by the populate/update calls, and those updates don't make it into the store.

@robmry robmry merged commit b0226d5 into moby:master Sep 16, 2025
179 checks passed
@robmry robmry deleted the ipv6_disabled_on_interface branch September 16, 2025 16:54
@thompson-shaun thompson-shaun modified the milestones: 29.1.0, 29.0.0 Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants