-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve load balancing scalability in swarm mode #2138
Conversation
Looks like CI failed |
Sorry, forgot to run golint. Adjusted for golint errors and re-pushed. |
Codecov Report
@@ Coverage Diff @@
## master #2138 +/- ##
=========================================
Coverage ? 40.46%
=========================================
Files ? 139
Lines ? 22548
Branches ? 0
=========================================
Hits ? 9123
Misses ? 12096
Partials ? 1329
Continue to review full report at Codecov.
|
if n.hasLoadBalancerEndpoint() { | ||
emptyCount = 1 | ||
} | ||
if !force && n.getEpCnt().EndpointCnt() > emptyCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why do you have emptyCount variable. Anyways you are using only one place. instead , you can directly use n.hasLoadBalancerEndpoint().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is attempting to determine whether there are any *container& endpoints left in the network. If there are, then it can not delete the network. The code does this by querying n.getEpCnt().EndpointCnt()
to get the current number of endpoints connected to the network. If the network has a load balancer endpoint, then a count of 1 indicates that there are no container endpoints in the network. If there is no load balancer endpoint, then only a count of 0 indicates that there are no more container endpoints.
return err | ||
} | ||
// continue deletion when force is true even on error | ||
logrus.Warnf("Error deleting load balancer sandbox: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we used warnf but in the message it says Error. Is it just warning or error message ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I was just keeping it consistent with the previous behavior which put it at the warning level. Presumably this behavior was because failure to remove may be due to the fact that it already got inadvertantly removed. In this case, the error is not fatal to the system since the network is being removed anyway. (see the comment above the warning). If this is the rationale, then warning level seems appropriate. The error message was, I think, the same, but we could change the wording in principle.
service_linux.go
Outdated
if sb.ingress { | ||
var gwIP net.IP | ||
if ep := sb.getGatewayEndpoint(); ep != nil { | ||
gwIP = ep.Iface().Address().IP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told this is old style C way of coding ..Go they generally use gwIP := ep.Iface().Address().IP instead of declaring variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, except for the fact that the gwIP address needs to live outside of the scope of the if-statement and should contain its default value if sb.getGatewayEndpoint()
returns nil
. It's something akin to writing:
var x int = 0
if something {
x = 1
}
// Do something with 'x'
Except that golint complains if you initialize a newly declared variable to the default value. In C it would look much nicer of course. :)
int x = something ? 1 : 0;
one genreal comment : There are many new functions and changes , But I dont see any unit test. Is it possible to add unit test? |
Perhaps there is a way to do unit tests for a feature like this, but I'm not clear as to how myself. To really test it we need containers, namespaces, etc.. (i.e. what moby does) We currently don't have any unit tests that validate addServiceBinding(). We could create tests that would validate things like the service binding data structures themselves by stubbing out the backend code and that could be worthwhile. But that won't validate this patchset because all the changes for this set are in the backend code. But if you have some ideas about how we could approach this, please do suggest it. Otherwise it will likely have to be done in integration tests w/ Moby. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preliminary set of comments, I put them on single commits so it's possible that some of them do not apply on the latest patch. Still reviewing the second last commit though
network.go
Outdated
} | ||
|
||
// Check that the network is empty | ||
var emptyCount uint64 = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no real need to set it to 0, in go variables are already initialized to 0 on creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, will fix.
network.go
Outdated
func (n *network) lbSandboxName() string { | ||
name := n.name + "-sbox" | ||
if n.ingress { | ||
name = "lb-" + n.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will be a name change for the ingress endpoint, will this create any issue on upgrade now that the name change? will the delete work properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh! That's a logic flip. ingress was supposed to remain the same, while new endpoint sandboxes were supposed to change. Will fix.
service_linux.go
Outdated
if !ok { | ||
return nil, nil, fmt.Errorf("Unable to get sandbox for %s(%s) in for %s", ep.Name(), ep.ID(), n.ID()) | ||
} | ||
// XXX really? really? ugh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should remove this :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rats. Forgot to get rid of that!
Per review w/ @fcrisciani, updating this to remove the backward-compatible code. |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok completed a full pass, will have to take a look deeper to a couple of things but happy to discuss the current comments
default_gateway.go
Outdated
if len(sb.containerID) <= gwEPlen { | ||
gwName = "gateway_" + sb.containerID | ||
} else { | ||
gwName = "gateway_" + sb.id[0:gwEPlen] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, you can omit the 0
network.go
Outdated
// a load balancing endpoint that libnetwork will not automatically | ||
// remove when the remaining endpoints leave the network. So, to remove | ||
// an ingress network requires a special function declaring said intent. | ||
DeleteIngress() error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't love this one, an ingress network is itself a network. but did not arrive yet to the code that will actually uses it. Is it possible to consider adding a flag to the Delete that is like releaseEndpoints
?
If possible I would like to keep the interface clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a choice of ugliness. We can either change the libnetwork API in a backwards incompatible way by changing the Delete() signature or we can add a new method. I was opting for the latter. Will change the signature to make it more semantically clear.
network.go
Outdated
// If we got to this point, then the following must hold: | ||
// * force is true, OR | ||
// * endpoint count == 1 AND: | ||
// * n.ingress is false OR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not seem that n.ingress play a role at this point:
ingress || !ingress
is always true. Do you want to keep it around in the comment for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder at this point if we want to move the logic at line 979 here and use the n.Ingress
or keep it as is and leave a comment saying that the ingress deletion is already avoided above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that the comment is confusing and superfluous. Will simplify. Agree that we don't need to mention ingress endpoint deletion: it's explained above.
} | ||
|
||
func (n *network) lbEndpointName() string { | ||
return n.name + "-endpoint" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the default gw we put a restriction on the max string length, do we need something similar for the lb endpoints or sandbox names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason we restrict the length of the names for gateway sandboxes to 12 characters plus the prefix "gateway_". I'm not sure why we do this for sandboxes only but there doesn't seem to be such a restriction for endpoints, so this should be safe.
network.go
Outdated
epOptions := []EndpointOption{ | ||
CreateOptionIpam(n.loadBalancerIP, nil, nil, nil), | ||
CreateOptionLoadBalancer(), | ||
} | ||
if n.hasLoadBalancerEndpoint() && !n.ingress { | ||
epOptions = append(epOptions, CreateOptionAnonymous()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess worth spending a comment saying that Anonymous is to avoid it to be resolvable from the DNS
if !ok { | ||
return nil, nil, fmt.Errorf("Unable to get sandbox for %s(%s) in for %s", ep.Name(), ep.ID(), n.ID()) | ||
} | ||
ep = sb.getEndpoint(ep.ID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aaarghh, every time that we need a logic like this, one part of me dies....
service_linux.go
Outdated
eIP = ep.Iface().Address() | ||
if epi.LoadBalancer() { | ||
var err error | ||
ep, err = n.getEndpointFromStore(e.ID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e is already coming from store, can we use that already?
service_linux.go
Outdated
} | ||
return "" | ||
} | ||
|
||
// Add loadbalancer backend to all sandboxes which has a connection to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment, not all
sandboxes
service_linux.go
Outdated
|
||
// Add loadbalancer backend into one connected sandbox. | ||
func (sb *sandbox) addLBBackend(ip, vip net.IP, fwMark uint32, ingressPorts []*PortConfig, eIP *net.IPNet, gwIP net.IP, isIngressNetwork bool) { | ||
func (n *network) addLBBackend(ip net.IP, lb *loadBalancer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a not for myself, maybe I miss it but if the wiring of this method is the same as before, are we calling it for each container? if so the findLBEndpointSandbox is actually pretty expensive in term of loops that is doing for the same disconnection due to the existence of the store
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merge made some of the changes here a bit confusing because of similarly named methods for both sandbox and network structures. This function does indeed only add a single LB entry per network. (Once the final patch is applied: if one is only looking at the commit that refactors the API then one will still see the "walk endpoints" loop as that commit doesn't change the loadbalancing yet.)
// network. If 'rmService' is true, then remove the service entry as well. | ||
// If 'fullRemove' is true then completely remove the entry, otherwise | ||
// just deweight it for now. | ||
func (n *network) rmLBBackend(ip net.IP, lb *loadBalancer, rmService bool, fullRemove bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for this method, are we calling it for each container?
Pushed an update to address comments and fix new bugs found w/ better e2e tests. |
service_linux.go
Outdated
return | ||
} | ||
if n.ingress && !sb.ingress { | ||
logrus.Errorf("addLBBackend %s/%s: unable to find LB OS sandbox", n.ID(), n.Name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past I remember these 2 to be very frequent, is it still the case? if this is not a fatal condition I would not use Error.
Strangely, github refuses to show this comment in my web browser. FWIW, I
have specifically looked for these returns during automated testing and
haven't seen it trigger yet. A lack of load balancer sandbox will
definitely be an error when adding a binding because we are now locking the
network ID (see 2 commits prior) and testing for the presence of the
network before attempting to add load balancers. The only way that sandbox
should be gone is if the network is being deleted. But that also takes the
same network lock, so the two should never happen together.
As far as the osSbox condition, I don't mind downgrading that ... It
should probably be at least warning level since it means that the load
balancing doesn't get programmed and this could mean complete service
disruption for single-container services.
…On Fri, Jun 22, 2018 at 1:14 AM, Flavio Crisciani ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In service_linux.go
<#2138 (comment)>:
> return
}
if sb.osSbox == nil {
- return
- }
- if n.ingress && !sb.ingress {
+ logrus.Errorf("addLBBackend %s/%s: unable to find LB OS sandbox", n.ID(), n.Name())
In the past I remember these 2 to be very frequent, is it still the case?
if this is not a fatal condition I would not use Error.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2138 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AK8iXqbYPteTJqib5rINsv4NBLCwTiomks5t_H0cgaJpZM4Tadmu>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an expert in some parts of the code. From what I understand the code LGTM. One Minor change.
At a high level like we discussed I would typically avoid expanding interface like in case of Network (Delete and RemoveLBAndDelete) so that interface is as lean as realistically possible. But that is not a blocker atm.
network.go
Outdated
return err | ||
} | ||
return sb.EnableService() | ||
|
||
err = sb.EnableService() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return sb.EnableService() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very deliberately no. This was the error that the patch needed to fix. If one just does return sb.EnableService()
then if EnableService
returns an error that value does not get propagated to the function's err
variable. This, in turn, means that the cleanup defer functions will not detect the error and will thus not clean up after the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok then you can do this :)
func (n *network) createLoadBalancerSandbox() (err error) {
This would keep the code clean. And avoid things like >>>>>>here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That gets rid of the first line (err declaration for the function), but that's about it. I'm not sure how that avoids things like ep, err := n.CreateEndpoint(...)
. The :=
is (presumably) in the code because we don't want to declare the type of ep
. It's not to declare err
. Of course that has its own dangers. If that statement were inside a loop, if-statement, switch statement or function body it would shadow the outer declaration causing disaster as well. But that's true for named return values in the same way. So I'm not sure what you're getting at. Are you saying it should be changed to:
var ep Endpoint
ep, err = n.CreateEndpoint(...)
...
As far as the named return values and use of bare returns -- shrug it could be argued that this is less clear:
x, err := foo()
if err {
return // Looks at first glance like we aren't returning the error
}
and some prominent golang folks have argued that these should be used sparingly. But I can put it in if desired. The real issue is that the shadowing rules in golang are a bit of a misfeature. One can see what they were aiming for, but there really aren't any clean-looking-while-still-robust patterns w/ regard to error handling because of them. (although I'd be happy to learn otherwise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can always do return err
with the suggested syntax. It maps it to the defer value check and thats used in a lot of big golang code bases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (n *network) createLoadBalancerSandbox() (retErr error) {
sandboxName := n.lbSandboxName()
sbOptions := []SandboxOption{}
if n.ingress {
sbOptions = append(sbOptions, OptionIngress())
}
sb, err := n.ctrlr.NewSandbox(sandboxName, sbOptions...)
if err != nil {
return err
}
defer func() {
if retErr != nil {
if e := n.ctrlr.SandboxDestroy(sandboxName); e != nil {
logrus.Warnf("could not delete sandbox %s on failure on failure (%v): %v", sandboxName, err, e)
}
}
}()
endpointName := n.lbEndpointName()
epOptions := []EndpointOption{
CreateOptionIpam(n.loadBalancerIP, nil, nil, nil),
CreateOptionLoadBalancer(),
}
if n.hasLoadBalancerEndpoint() && !n.ingress {
// Mark LB endpoints as anonymous so they don't show up in DNS
epOptions = append(epOptions, CreateOptionAnonymous())
}
ep, err := n.createEndpoint(endpointName, epOptions...)
if err != nil {
return err
}
defer func() {
if retErr != nil {
if e := ep.Delete(true); e != nil {
logrus.Warnf("could not delete endpoint %s on failure on failure (%v): %v", endpointName, err, e)
}
}
}()
err := ep.Join(sb, nil)
if err != nil {
return err
}
return sb.EnableService()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so a non-blank return statement assigns to the named return variable! Perfect. Much cleaner. Will change. Thanks!
network.go
Outdated
c.cleanupServiceDiscovery(n.ID()) | ||
// Capture remaining binding info to clear after deleting | ||
// the network from the store. | ||
bindings = c.gatherBindingInfo(n.ID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetching the info here, and unfortunately because the gatherBinding requires the network, does not seem to give any more guarantee on the cleanup. I wonder if at this point we should just leverage the cleaning done from the networkDB
if n.hasLoadBalancerEndpoint() && !n.ingress { | ||
// Mark LB endpoints as anonymous so they don't show up in DNS | ||
epOptions = append(epOptions, CreateOptionAnonymous()) | ||
} | ||
ep, err := n.createEndpoint(endpointName, epOptions...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here
network.go
Outdated
return err | ||
} | ||
return sb.EnableService() | ||
|
||
err = sb.EnableService() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok then you can do this :)
func (n *network) createLoadBalancerSandbox() (err error) {
This would keep the code clean. And avoid things like >>>>>>here
@ctelfer can you also rebase? |
Signed-off-by: Chris Telfer <ctelfer@docker.com>
3 changes to the latest push:
|
Change the Delete() method to take optional options and add NetworkDeleteOptionRemoveLB as one such option. This option allows explicit removal of an ingress network along with its load-balancing endpoint if there are no other endpoints in the network. Prior to this, the libnetwork client would have to manually search for and remove the ingress load balancing endpoint from an ingress network. This was, of course, completely hacky. This commit will require a slight modification in moby to make use of the option when deleting the ingress network. Signed-off-by: Chris Telfer <ctelfer@docker.com>
Default gateways truncate the endpoint name to 12 characters. This can make network endpoints ambiguous especially for load-balancing sandboxes for networks with lenghty names (such as with our prefixes). Address this by detecting an overflow in the sanbox name length and instead opting to name the gateway endpoint "gateway_<id>" which should never collide. Signed-off-by: Chris Telfer <ctelfer@docker.com>
Error unwinding only works if the error variable is used consistently and isn't hidden in the scope of other if statements. Signed-off-by: Chris Telfer <ctelfer@docker.com>
This method returns the name of the interface from the perspective of the host OS pre-container. This will be required later for finding matching a sandbox's interface name to an endpoint which is, in turn, requied for adding an IP alias to a load balancer endpoint. Signed-off-by: Chris Telfer <ctelfer@docker.com>
New load balancing code will require ability to add aliases to load-balncer sandboxes. So this broadens the OSL interface to allow adding aliases to any interface, along with the facility to get the loopback interface's name based on the OS. Signed-off-by: Chris Telfer <ctelfer@docker.com>
This was passing extra information and adding confusion about the purpose of the load balancing structure. Signed-off-by: Chris Telfer <ctelfer@docker.com>
This is the heart of the scalability change for services in libnetwork. The present routing mesh adds load-balancing rules for a network to every container connected to the network. This newer approach creates a load-balancing endpoint per network per node. For every service on a network, libnetwork assigns the VIP of the service to the endpoint's interface as an alias. This endpoint must have a unique IP address in order to route return traffic to it. Traffic destined for a service's VIP arrives at the load-balancing endpoint on the VIP and from there, Linux load balances it among backend destinations while SNATing said traffic to the endpoint's unique IP address. The net result of this scheme is that each node in a swarm need only have one set of load balancing state per service instead of one per container on the node. This scheme is very similar to how services currently operate on Windows nodes in libnetwork. It (as with Windows nodes) costs the use of extra IP addresses in a network (one per node) and an extra network hop in the stack, although, always in the stack local to the container. In order to prevent existing deployments from suddenly failing if they failed to allocate sufficient address space to include per-node load-balancing endpoint IP addresses, this patch preserves the existing functionality and activates the new functionality on a per-network basis depending on whether the network has a load-balancing endpoint. Eventually, moby should always set this option when creating new networks and should only omit it for networks created as part of a swarm that are not marked to use endpoint load balancing. This patch also normalizes the code to treat "load" and "balancer" as two separate words from the perspectives of variable/function naming. This means that the 'b' in "balancer" must be capitalized. Signed-off-by: Chris Telfer <ctelfer@docker.com>
Lock the network ID in the controller during an addServiceBinding to prevent racing with network.delete(). This would cause the binding to be silently ignored in the system. Signed-off-by: Chris Telfer <ctelfer@docker.com>
Add debug and error logs to notify when a load balancing sandbox is not found. This can occur in normal operation during removal. Signed-off-by: Chris Telfer <ctelfer@docker.com>
For those interested, the moby patch to make use of this code (outside of vendoring) is quite thin:
plus remove the now superfluous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ctelfer, can you please elaborate a bit this sentence:
What exactly can't the user do now? |
Hey @orenye ... Sorry for the confusion, but that statement is actually erroneous and was caused by me misremembering an ingress processing scenario when I wrote it. Users should be able to do everything with the new load balancing scheme that they could with the previous one. |
Hi @ctelfer , great, thanks! |
Docker swarm mode supports load balancing traffic to endpoints in a server using a distributed load balancing approach. In this model, all containers a swarm network can act as load balancers for any service resident on a network that the container is present on. This has the advantage of not requiring service traffic to cross the network twice (once to the load balancer and once to the endpoint) as in a more traditional architecture. It also avoids dangers of having a single point of failure and the problems of having to scale the capacity of the load balancer itself. Rather, the load balancing performance can scale as the compute power and network capacity of the scales.
The main problem with the current approach is its own strength. It relies on distributing the load balancing configuration to every container in the swarm. This means that the amount of state that each node must maintain grows as the square of the number of services in the swarm and the square of the average number of containers per service.
This patch conceptually borrows from the architecture of the Windows swarm service support by creating a load-balancer endpoint for each network on each node in the swarm. Libnetwork assigns this endpoint its own IP address on the network as well as the VIPs of every service with on the network. Every container on the node will send its service requests via the VIPs of the service which will therefore arrive at the load balancing endpoint. This endpoint will perform load balancing using IPVS as was done in the service mesh except that the outgoing service request will use the endpoint's private IP address as the source address for the request (via NAT). This is so that return traffic will first go to the load balancing endpoint in order to hit the load balancing state mapping and from there the Linux stack will redirect the response traffic back to the connection initiator.
While this approach drastically improves the load balancing scalability it has two potential negative impacts. First, the original behavior included the ability to load balance to a service by sending to any task IP address for the service. While convenient, this is not likely to be a widely used feature since the DNS entries for a service only point to the VIP and not to the backends. Second, this scheme requires allocating extra IP addresses in each overlay network (one per node that has a task in the network). If a network was near IP address exhaustion, this could cause an existing swarm network to break during an upgrade of the moby engine. The problem can safely be avoided for all new networks by simply allocating sufficiently large address-blocks for each network.
This patch is staged with a series smaller patches that preserve libnetwork's current functioning, followed by a larger patch that contains the main change itself. The patch also includes the addition of a DeleteIngress() method to the Network interface to move load balancing cleanup logic from moby back into libnetwork where it belongs. This was discussed in moby/moby#36538.
Incorporating this into moby requires a small set of changes to enable the functionality in moby (and remove the deprecated functionality mentioned above).