Improve the documentation examples and advice for native IPv6 Docker networking #19556
Comments
Thank you @daryll-swer for raising this issue, and for providing so much detail and offering to help out! Let me loop in @robmry and @akerouanton, maintainers of libnetwork in moby. I think describing some of these concepts from a network engineer's perspective sounds like a great enhancement, if we can pull that off! |
I can only agree with what @daryll-swer has brought up. Thanks! |
Yes, these sound like good changes! @dvdksn - what's the best approach, I guess Daryll can create PRs for us to review and discuss? |
While a PR would make sense for the actual documentation change, I think we should probably get on a call or something first to discuss some ideas and concepts I have in mind about native IPv6 networking on Docker, and see if we can then, from there, make a plan of action to update the Docker IPv6 docs (through the PRs?). |
Yes, sure - let's do that. I'll try to sync with Albin, and we'll get something set up ... |
So how can the IPv6 network of the Docker host communicate with the IPv6 subnet inside Docker? I haven't seen any documentation on this. |
I'm not sure what you mean @caliban511. |
What I mean is that the IPv6 subnet inside Docker cannot communicate with the outside. |
By using DHCPv6-PD, every subnet gets its own /64. The host should be able to request at least a /60 for such usage. And yes, that requires that the network is set up correctly; with v6 you need to get more than a single /64. |
What you are referring to is just routing. Docker's job is not network architecture and design; Docker's job is container abstraction tooling and deployment. So it's no surprise that they didn't document how to design an IPv6-native network and how to route a prefix to the Docker host. That being said, I am talking to the Docker net dev team via email, and I did suggest they add some docs on routing basics for IPv6. This is a copy/paste from the email I sent them:
This problem is more general and not limited to Docker's situation. I don't know why, but network engineering knowledge seems to be considered obsolete/unnecessary by many people in this industry, to the point that everyone just knows “NAT” [as they don't know how to route from the edge router all the way down to the host using an eBGP-driven design (for example)] and overlapping RFC1918 ranges (as they do not even know how to subnet).
@NiKiZe for a Docker-specific-only host, you don't really need more than a single /64 routed to that host, because you will, in any case, just create a Linux bridge, put a /64 on that bridge, and all containers then reside behind that bridge and get a /128 out of the /64 of routed address space. I've never found a use case for Docker networking to go beyond one bridge or beyond a single cleanly routed /64. Docker Swarm is something I've not used, but I suppose there may be a use case for more than one /64 there. |
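For illustration, a minimal sketch of that single-routed-/64 setup (the prefix and network name are placeholders; it assumes the upstream router already routes 2001:db8:1::/64 to the Docker host):

```console
# Create an IPv6-enabled bridge network backed by the routed /64.
# Containers attached to it get globally routable addresses - no NAT66.
docker network create --ipv6 --subnet 2001:db8:1::/64 v6net

# Quick check: the container should show an address from 2001:db8:1::/64.
docker run --rm --network v6net busybox ip -6 addr
```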
Thank you for your patience. I am not a network engineer, just an ordinary Docker user. I have tried to create an IPv6 subnet for Docker following some online tutorials, but containers attached to that IPv6 subnet never pass an IPv6 connectivity test. How can I pass the test? Is 2400:7060 a prefix from your public network? Apart from directly using the public network prefix to build the subnet, is there any other way to connect a container to the IPv6 public internet? Our public IPv6 prefix changes every few days on a cycle, so using this method means manually deleting and rebuilding the IPv6 subnet every few days, which is obviously not the right way. |
That is exactly the right way: you have a prefix from your ISP that you route internally; no more private IP ranges, and no more NAT. The issue here is that many have learnt an IPv4 NAT mindset, which is not the right way. |
2400:7060::/32 is a Global Unicast Address block that I own. If I were to use this block in production, I would subnet it according to my architecture model here: Internet <> Edge router <> Layer 3 spine <> Layer 3 leaf <> Server node
IPv6 prefixes are NOT supposed to be “changed in a few days as a cycle” - this is not a Docker problem, this is a network problem. Your network provider failed to adopt IPv6 standards and best practices. Ask them to deploy BCOP-690-compliant IPv6, in addition to asking them to read my IPv6 guide. If they do not comply with BCOP-690, you'll forever have broken IPv6. |
He's talking about lack of BCOP-690 compliance, forcing the imposition of NAT66 upon the user by the network operator who's providing him IPv6 connectivity. |
Thank you. It seems that I can't use IPv6 for my Docker containers for the time being. That's it; thank you again. |
Hostile “expert” network providers are typically the norm in this industry, unfortunately, from Tier 1 cloud providers to small networks. A lot of app developers and end-users have complained about hostile network teams for a very long time. I'm of the opinion that software engineers, network engineers and app devs should work together in an organisation with a Venn-diagram intersection, but sadly, that's wishful thinking. |
Important topic. In my opinion, IPv6 is not actually as complex as people think; it is IPv4/IPv6 dual stack which causes all the problems. That is why I would start tackling this issue by adding support for creating Docker networks where IPv4 is disabled.
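As a rough sketch of what such an option could look like on the CLI (the `--ipv4=false` flag here is an assumption, based on the IPv6-only network support that landed in later Engine releases; the exact flag and behaviour depend on your version):

```console
# Hypothetical IPv6-only network: no IPv4 address allocation at all.
# Assumes an Engine release that supports IPv6-only networks (--ipv4=false).
docker network create --ipv6 --ipv4=false --subnet 2001:db8:1::/64 v6only

# The container should show only IPv6 addresses, no IPv4.
docker run --rm --network v6only busybox ip addr
```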
With default settings, Swarm reserves 10.0.0.0/8 and takes /24 slices from it for overlay networks. However, service discovery inside Swarm is DNS-based, so IP addresses do not matter that much. So instead of having just one subnet, each Swarm-scoped network would have as many subnets as there are nodes. That way, there always needs to be just one /64 route to each host.
There is also GoBGP, which might be easier to integrate with Docker. |
Unless I'm missing something, Docker IPv6 is NAT/PAT-disabled by default, is it not? So that's a non-issue for starters anyway. I have Docker running right now with native IPv6, no NAT/PAT for v6. I'm not sure what you mean by dual stack causing problems; the two stacks are independent of each other by default anyway.
I've never worked with K8s. In theory, I think, regarding Docker Swarm, you could simplify this further by making use of VXLAN/EVPN, whereby all Swarm worker nodes' “Docker bridges” are members of the same layer 2 domain across all nodes. On a layer 3 basis, this would mean the single /64 is anycast-routed; however, it would also mean the nodes need hypercube interconnection in order for the layer 3 routing to be natively routed between the hosts based on the routing table (FRR or GoBGP etc.). A single /64 can hold billions of containers, so if we had 9000 containers across 100 nodes it's not an issue, as each unique container has non-overlapping addresses anyway, and you could always advertise the unique /128 IP addresses over BGP back to the layer 3 leaf (ToR) switch. Or you could avoid hypercube this way as well: each unique container on each unique host is advertised with a /128, and the more-specific address wins in the routing table, so a single /64 works for all nodes. However, this is just my understanding of Swarm, and it needs labbing to actually verify this concept works. |
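For the /128-over-BGP part, a rough FRR sketch of what each node's config might look like (ASNs and peer addresses are placeholders, and it assumes something - a plugin or script - installs per-container /128 routes into the kernel table; untested, as noted above):

```console
# Hypothetical per-node BGP stanza: 65000 = leaf ASN, 65101 = this node's ASN,
# 2001:db8:ff::1 = the ToR leaf's peering address.
cat >> /etc/frr/frr.conf <<'EOF'
router bgp 65101
 neighbor 2001:db8:ff::1 remote-as 65000
 address-family ipv6 unicast
  neighbor 2001:db8:ff::1 activate
  redistribute kernel
 exit-address-family
EOF
systemctl reload frr
```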
I mean that as long as you have IPv4 included (dual stack), then you need to choose between:
That is why I see that it would be much simpler to have a pure IPv6-only configuration option. Additionally, based on the documentation, the experimental setting
But internally Docker still adds IPv4 addresses to all those networks, and each container also has an IPv4 address, right?
When it comes to the Kubernetes world: I found that there was the same limitation in k3s and k0s, and fixed those with k3s-io/k3s#4450 and k0sproject/k0s#1292, so it is now possible to run both of those systems in such a way that there is no IPv4 address at all on the host or on any of the containers. That is what I call native IPv6. EDIT: Just remembered that I actually tried to implement an IPv6-only mode for Docker two years ago but failed because too many places in the code expected IPv4 addresses to be available. The old draft is available at (not sure if useful) https://github.com/olljanat/moby/commits/disable-ipv4/ but it might be a different story now, because there has been some refactoring in libnetwork. EDIT2: Made a rebased and updated version of that, which looks to be working quite well (it still needs some work to make sure that it does not break the IPv4 use case). It can be found at olljanat/moby@4534a75, and the commit message contains examples of how to use it locally and with Swarm. |
I disagree on point 1: how are they “inconsistent”? They are completely independent protocols, and therefore in the case of legacy, exhausted IPv4 there's NAT, and for v6 there's none; that is consistent. On security: v4 may be NATted, while v6 should just have accept established, related, untracked (I explained why we should NOTRACK control plane, MGMT and BUM traffic here), accept ICMPv6, and finally accept only the “exposed” ports, say 80/443 or whatever service you're running. Problem solved. I would strongly note that NAT is not and never was a security feature:
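As a rough illustration of that rule set (ports are examples, and this is a standalone sketch, not a drop-in for Docker's own chains):

```console
# Return traffic and explicitly untracked (NOTRACK) flows.
ip6tables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED,UNTRACKED -j ACCEPT
# ICMPv6 is mandatory for working IPv6 (ND, PMTUD, etc.).
ip6tables -A FORWARD -p ipv6-icmp -j ACCEPT
# Only the "exposed" service ports.
ip6tables -A FORWARD -p tcp -m multiport --dports 80,443 -j ACCEPT
# Everything else is dropped - and no NAT66 anywhere.
ip6tables -P FORWARD DROP
```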
I agree that IPv6-only mode should be a feature for both Docker and Docker Swarm. However, you cannot force everyone to use IPv6-only mode, as IPv4 is still part of the internet ecosystem whether we like it or not. Please remember, I'm a public IPv6 advocate and wrote a detailed IPv6 guide from a network-architectural standpoint; I love IPv6, but I'm not delusional about an IPv6-only world in 2024.
This is/was part of my original plan to correct the wording/naming/concepts in the docs. I agree the “port mapping” nonsense shouldn't be part of ANY IPv6 doc/talk/implementation at all.
Like I said earlier:
Can we do load balancing with native-only IPv6, without any Destination-NAT/NAT66, on K8s implementations? I'm interested to see some docs or examples on this if you have any. On the underlay network, I would of course have BGP peering with each host and make good use of BGP multipathing and the BGP link-bandwidth community.
I was (am?) communicating with the Docker net dev team via email, and I was supposed to get an invitation to their IPv6-specific meeting(s) to discuss this further; however, I haven't yet received the invite. |
Ah, now I get what you mean. That is a good idea, and I would like to have support for that in IPv4 too, especially when NAT is not used and there are direct routes between hosts. That would provide consistent
Agree that it was not supposed to be a security feature, but technically in some places it is, and it is used like that by many. In Docker it actually depends on which network driver is used. However, when you are using the overlay driver, it is not possible to communicate directly with those containers except through ports which are published (PAT).
I said that adding support for IPv6-only would be one way to start solving this issue, not that we would force everyone to use only that configuration. Also, if NAT is needed, then I prefer to have some network device doing IPv4 -> IPv6 NAT instead of configuring all of them with dual stack, but that is just my preference. However, your idea is better: if
You can at least get very near to that with this Calico feature and having
I mean all those libnetwork pull requests in https://github.com/moby/moby/pulls?q=is%3Apr+libnetwork which started from moby/moby#42262, and after that there has been a lot of refactoring done. |
Exactly, yes.
Yes and yes, but IPv4 is scarce and nobody will really route a /24 or whatever to each node; they will just NAT IPv4, route a /64 (or larger) to the host, and put it on the bridge, which is what I do anyway. I would suggest avoiding the term “PAT”, as it confuses many people; yes, I know NAT and PAT are two different things, but in today's world they've become synonymous in terms of the actual config on the system. I haven't seen stateless NAT (real NAT) anywhere in production for IPv4, short of NPTv6 for Provider Aggregatable address space.
Do you have access to a network infrastructure that's IPv6-native, where you have control over the entire network? I would like to potentially work with you and run a Docker/Docker Swarm lab to test out some ways to achieve IPv6-native-only load balancing etc. It would require the underlay network to support VXLAN/EVPN, and at least two physical server nodes to be plain bare metal so that we could run FRR (or GoBGP or whatever) between the node and the layer 3 ToR switch (leaf). We could also play with a hypercube network topology and run BGP between the nodes directly. I currently no longer work in DC networks and work full-time in SP networks, so I do not have access to a DC environment.
Yes, you can create 1:1 iptables rules for v4/v6: accept established, related, untracked; accept ICMPv4/v6; and finally accept the “exposed” ports, say 80/443, 1:1 in both tables. This removes the false NAT-as-a-security-service model completely from the Docker paradigm. However, this requires the Docker net devs to come together and work on such a rework of the underlying code base.
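A sketch of what those 1:1 rules could look like using a single nftables inet table, which enforces one rule set identically for both families (ports illustrative):

```console
nft add table inet filter
nft add chain inet filter forward '{ type filter hook forward priority 0; policy drop; }'
# One rule set, applied to IPv4 and IPv6 alike.
nft add rule inet filter forward 'ct state established,related,untracked accept'
nft add rule inet filter forward 'meta l4proto { icmp, ipv6-icmp } accept'
nft add rule inet filter forward 'tcp dport { 80, 443 } accept'
```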
We are going off-topic here, but I'd like to discuss more about K8s IPv6-native-only load balancing using eBGP-driven networking (underlay network + host + network topology design); can you please email me at contact@daryllswer.com?
I have used the AS-per-rack model. Anyway, I got inspiration from this discussion to try to build a similar solution with Docker, and it is now available at https://github.com/olljanat/docker-bgp-lb and I just added an IPv6 support tracking issue there as well. Feel free to try it and provide feedback there, so we don't go too much off-topic here. |
The problem with that Meta+Arista RFC is that it complicates route filters and introduces potential loop-avoidance issues at scale; it leads down the horrible path of eBGP over eBGP and iBGP over eBGP. It's further explained here (also read the comments).
How does this differ from simply using FRR to advertise the ranges to the upstream router or layer 3 leaf/ToR switch? And is your plugin for Docker Swarm mode or something else? It needs a better readme with a network topology example. |
It appears Docker v27.x.x, including its IPv6-specific documentation, has been updated and improved upon. Happy to see IPv6-native fixes and changes for Docker. I hope it keeps getting better and better over time, as more and more IPv6-specific features/sub-protocols get introduced into the networking world. |
Just a quick update to let everyone in the community know that I am collaborating with Docker Inc.'s team on writing the networking (IPv6) documentation, which will cover some key implementation details. However, due to a busy schedule with my consulting work, it may take some time before I can fully focus on this. It will happen, and I will do my best to make it happen. I'll keep you all posted as things progress. Thanks for your patience! |
Is this a docs issue?
Type of issue
Other
Description
A few issues:
Location
https://docs.docker.com/config/daemon/ipv6/
Suggestion
For point 1:
We simply replace the “2001:0DB8::/112” string with “2001:db8::/64”.
The idea of a minimum /64 comes from the fact that IPv6 networking was designed to be based on prefix length, not on “number of addresses”, as the address space is 128 bits. This is reflected in the original SLAAC specifications, in addition to the operational guidance in BCOP-690. We should not be promoting an archaic IPv4-centric mentality in native IPv6 networking.
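For illustration, the corrected example would then look like this in /etc/docker/daemon.json (2001:db8::/64 is the documentation prefix; a real deployment would use the /64 actually routed to the host):

```json
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8::/64"
}
```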
For point 2:
Firstly, the letters in an IPv6 address are always lower-case; secondly, we always remove all leading zeros in the compressed IPv6 notation format, meaning in effect:
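For example, 2001:0DB8:0000:0000:0000:0000:0000:0001 is written as 2001:db8::1, and 2001:0DB8::/112 becomes 2001:db8::/112.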
Please refer to Section 4 of RFC 5952.
For point 3, 4 & 5:
I am willing to help improve this aspect of the Docker IPv6 documentation by integrating a network engineering perspective and operational insights directly into the Docker docs.
The basic idea of native IPv6 networking is: No NAT66/NPTv6 or ULA.
I'm of course aware of poorly implemented IPv6 in popular cloud providers/IaaS companies, whereby the user is forced to rely on ULA/NAT66 or on hacks with NDP Proxy or MACVLAN; but of course, this is not a valid reason for the official Docker documentation to push ULA/NAT66/NPTv6.
I authored an extensive native IPv6 best-practices guide, linked below, that folks may want to read for thorough information that simply cannot be reproduced in a tiny GitHub issue:
https://blog.apnic.net/2023/04/04/ipv6-architecture-and-subnetting-guide-for-network-engineers-and-operators/
I've written extensively on various topics in network engineering, in particular IPv6. I'm personally willing to help improve the Docker docs to push for native IPv6 networking using some realistic examples. I'm not sure how the Docker docs writing/improvement process is handled, but if I could get into a direct discussion with the relevant folks, it would be much appreciated.