Improve load balancing scalability in swarm mode #2138

ctelfer · 2018-04-18T17:24:19Z

Docker swarm mode supports load balancing traffic to endpoints in a server using a distributed load balancing approach. In this model, all containers a swarm network can act as load balancers for any service resident on a network that the container is present on. This has the advantage of not requiring service traffic to cross the network twice (once to the load balancer and once to the endpoint) as in a more traditional architecture. It also avoids dangers of having a single point of failure and the problems of having to scale the capacity of the load balancer itself. Rather, the load balancing performance can scale as the compute power and network capacity of the scales.

The main problem with the current approach is its own strength. It relies on distributing the load balancing configuration to every container in the swarm. This means that the amount of state that each node must maintain grows as the square of the number of services in the swarm and the square of the average number of containers per service.

This patch conceptually borrows from the architecture of the Windows swarm service support by creating a load-balancer endpoint for each network on each node in the swarm. Libnetwork assigns this endpoint its own IP address on the network as well as the VIPs of every service with on the network. Every container on the node will send its service requests via the VIPs of the service which will therefore arrive at the load balancing endpoint. This endpoint will perform load balancing using IPVS as was done in the service mesh except that the outgoing service request will use the endpoint's private IP address as the source address for the request (via NAT). This is so that return traffic will first go to the load balancing endpoint in order to hit the load balancing state mapping and from there the Linux stack will redirect the response traffic back to the connection initiator.

While this approach drastically improves the load balancing scalability it has two potential negative impacts. First, the original behavior included the ability to load balance to a service by sending to any task IP address for the service. While convenient, this is not likely to be a widely used feature since the DNS entries for a service only point to the VIP and not to the backends. Second, this scheme requires allocating extra IP addresses in each overlay network (one per node that has a task in the network). If a network was near IP address exhaustion, this could cause an existing swarm network to break during an upgrade of the moby engine. The problem can safely be avoided for all new networks by simply allocating sufficiently large address-blocks for each network.

This patch is staged with a series smaller patches that preserve libnetwork's current functioning, followed by a larger patch that contains the main change itself. The patch also includes the addition of a DeleteIngress() method to the Network interface to move load balancing cleanup logic from moby back into libnetwork where it belongs. This was discussed in moby/moby#36538.

Incorporating this into moby requires a small set of changes to enable the functionality in moby (and remove the deprecated functionality mentioned above).

selansen · 2018-04-18T17:34:35Z

Looks like CI failed

ctelfer · 2018-04-18T17:45:30Z

Sorry, forgot to run golint. Adjusted for golint errors and re-pushed.

codecov-io · 2018-04-18T18:23:50Z

Codecov Report

❗ No coverage uploaded for pull request base (master@19f814d). Click here to learn what that means.
The diff coverage is 9.54%.

@@            Coverage Diff            @@
##             master    #2138   +/-   ##
=========================================
  Coverage          ?   40.46%           
=========================================
  Files             ?      139           
  Lines             ?    22548           
  Branches          ?        0           
=========================================
  Hits              ?     9123           
  Misses            ?    12096           
  Partials          ?     1329

Impacted Files	Coverage Δ
endpoint_info.go	`59.33% <0%> (ø)`
store.go	`62.13% <0%> (ø)`
controller.go	`36.65% <0%> (ø)`
endpoint.go	`53.48% <0%> (ø)`
osl/namespace_linux.go	`35.1% <0%> (ø)`
default_gateway.go	`28.26% <0%> (ø)`
service_linux.go	`1.38% <1.21%> (ø)`
sandbox.go	`47.92% <100%> (ø)`
network.go	`62.7% <28.88%> (ø)`
service_common.go	`6.71% <7.54%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19f814d...878dad0. Read the comment docs.

selansen · 2018-04-18T19:19:00Z

network.go

+	if n.hasLoadBalancerEndpoint() {
+		emptyCount = 1
+	}
+	if !force && n.getEpCnt().EndpointCnt() > emptyCount {


Not sure why do you have emptyCount variable. Anyways you are using only one place. instead , you can directly use n.hasLoadBalancerEndpoint().

The code is attempting to determine whether there are any *container& endpoints left in the network. If there are, then it can not delete the network. The code does this by querying n.getEpCnt().EndpointCnt() to get the current number of endpoints connected to the network. If the network has a load balancer endpoint, then a count of 1 indicates that there are no container endpoints in the network. If there is no load balancer endpoint, then only a count of 0 indicates that there are no more container endpoints.

selansen · 2018-04-18T19:31:16Z

network.go

+				return err
+			}
+			// continue deletion when force is true even on error
+			logrus.Warnf("Error deleting load balancer sandbox: %v", err)


we used warnf but in the message it says Error. Is it just warning or error message ?

I believe I was just keeping it consistent with the previous behavior which put it at the warning level. Presumably this behavior was because failure to remove may be due to the fact that it already got inadvertantly removed. In this case, the error is not fatal to the system since the network is being removed anyway. (see the comment above the warning). If this is the rationale, then warning level seems appropriate. The error message was, I think, the same, but we could change the wording in principle.

selansen · 2018-04-18T20:10:26Z

service_linux.go

+		if sb.ingress {
+			var gwIP net.IP
+			if ep := sb.getGatewayEndpoint(); ep != nil {
+				gwIP = ep.Iface().Address().IP


I was told this is old style C way of coding ..Go they generally use gwIP := ep.Iface().Address().IP instead of declaring variable.

Ah, except for the fact that the gwIP address needs to live outside of the scope of the if-statement and should contain its default value if sb.getGatewayEndpoint() returns nil. It's something akin to writing:

var x int = 0 if something { x = 1 } // Do something with 'x'

Except that golint complains if you initialize a newly declared variable to the default value. In C it would look much nicer of course. :)

int x = something ? 1 : 0;

selansen · 2018-04-18T20:24:37Z

one genreal comment : There are many new functions and changes , But I dont see any unit test. Is it possible to add unit test?

ctelfer · 2018-04-18T21:00:56Z

Perhaps there is a way to do unit tests for a feature like this, but I'm not clear as to how myself. To really test it we need containers, namespaces, etc.. (i.e. what moby does)

We currently don't have any unit tests that validate addServiceBinding(). We could create tests that would validate things like the service binding data structures themselves by stubbing out the backend code and that could be worthwhile. But that won't validate this patchset because all the changes for this set are in the backend code. But if you have some ideas about how we could approach this, please do suggest it. Otherwise it will likely have to be done in integration tests w/ Moby.

fcrisciani

preliminary set of comments, I put them on single commits so it's possible that some of them do not apply on the latest patch. Still reviewing the second last commit though

fcrisciani · 2018-04-19T22:17:10Z

network.go

+	}
+
+	// Check that the network is empty
+	var emptyCount uint64 = 0


no real need to set it to 0, in go variables are already initialized to 0 on creation

yep, will fix.

fcrisciani · 2018-04-23T16:31:59Z

network.go

+func (n *network) lbSandboxName() string {
+	name := n.name + "-sbox"
+	if n.ingress {
+		name = "lb-" + n.name


I think this will be a name change for the ingress endpoint, will this create any issue on upgrade now that the name change? will the delete work properly?

Doh! That's a logic flip. ingress was supposed to remain the same, while new endpoint sandboxes were supposed to change. Will fix.

fcrisciani · 2018-04-23T21:29:16Z

service_linux.go

+	if !ok {
+		return nil, nil, fmt.Errorf("Unable to get sandbox for %s(%s) in for %s", ep.Name(), ep.ID(), n.ID())
+	}
+	// XXX really?  really?  ugh


maybe we should remove this :D

Rats. Forgot to get rid of that!

ctelfer · 2018-05-10T17:53:08Z

Per review w/ @fcrisciani, updating this to remove the backward-compatible code.

selansen · 2018-05-10T20:54:10Z

LGTM

fcrisciani

ok completed a full pass, will have to take a look deeper to a couple of things but happy to discuss the current comments

fcrisciani · 2018-06-04T23:17:48Z

default_gateway.go

+	if len(sb.containerID) <= gwEPlen {
+		gwName = "gateway_" + sb.containerID
+	} else {
+		gwName = "gateway_" + sb.id[0:gwEPlen]


nit, you can omit the 0

fcrisciani · 2018-06-04T23:30:52Z

network.go

+	// a load balancing endpoint that libnetwork will not automatically
+	// remove when the remaining endpoints leave the network.  So, to remove
+	// an ingress network requires a special function declaring said intent.
+	DeleteIngress() error


Don't love this one, an ingress network is itself a network. but did not arrive yet to the code that will actually uses it. Is it possible to consider adding a flag to the Delete that is like releaseEndpoints?
If possible I would like to keep the interface clean

Yeah, this is a choice of ugliness. We can either change the libnetwork API in a backwards incompatible way by changing the Delete() signature or we can add a new method. I was opting for the latter. Will change the signature to make it more semantically clear.

fcrisciani · 2018-06-04T23:45:26Z

network.go

+		// If we got to this point, then the following must hold:
+		//  * force is true, OR
+		//  * endpoint count == 1 AND:
+		//      * n.ingress is false OR


does not seem that n.ingress play a role at this point:
ingress || !ingress is always true. Do you want to keep it around in the comment for some reason?

I wonder at this point if we want to move the logic at line 979 here and use the n.Ingress or keep it as is and leave a comment saying that the ingress deletion is already avoided above

Agree that the comment is confusing and superfluous. Will simplify. Agree that we don't need to mention ingress endpoint deletion: it's explained above.

fcrisciani · 2018-06-05T00:05:28Z

network.go

+}
+
+func (n *network) lbEndpointName() string {
+	return n.name + "-endpoint"


for the default gw we put a restriction on the max string length, do we need something similar for the lb endpoints or sandbox names?

For some reason we restrict the length of the names for gateway sandboxes to 12 characters plus the prefix "gateway_". I'm not sure why we do this for sandboxes only but there doesn't seem to be such a restriction for endpoints, so this should be safe.

fcrisciani · 2018-06-05T00:11:07Z

network.go

 	epOptions := []EndpointOption{
 		CreateOptionIpam(n.loadBalancerIP, nil, nil, nil),
 		CreateOptionLoadBalancer(),
 	}
+	if n.hasLoadBalancerEndpoint() && !n.ingress {
+		epOptions = append(epOptions, CreateOptionAnonymous())


I guess worth spending a comment saying that Anonymous is to avoid it to be resolvable from the DNS

fcrisciani · 2018-06-05T00:26:14Z

service_linux.go

+	if !ok {
+		return nil, nil, fmt.Errorf("Unable to get sandbox for %s(%s) in for %s", ep.Name(), ep.ID(), n.ID())
+	}
+	ep = sb.getEndpoint(ep.ID())


aaarghh, every time that we need a logic like this, one part of me dies....

fcrisciani · 2018-06-05T00:27:37Z

service_linux.go

-				eIP = ep.Iface().Address()
+		if epi.LoadBalancer() {
+			var err error
+			ep, err = n.getEndpointFromStore(e.ID())


e is already coming from store, can we use that already?

fcrisciani · 2018-06-05T00:30:11Z

service_linux.go

 	}
+	return ""
 }

 // Add loadbalancer backend to all sandboxes which has a connection to


update comment, not all sandboxes

fcrisciani · 2018-06-05T00:53:36Z

service_linux.go

-
-// Add loadbalancer backend into one connected sandbox.
-func (sb *sandbox) addLBBackend(ip, vip net.IP, fwMark uint32, ingressPorts []*PortConfig, eIP *net.IPNet, gwIP net.IP, isIngressNetwork bool) {
+func (n *network) addLBBackend(ip net.IP, lb *loadBalancer) {


this is a not for myself, maybe I miss it but if the wiring of this method is the same as before, are we calling it for each container? if so the findLBEndpointSandbox is actually pretty expensive in term of loops that is doing for the same disconnection due to the existence of the store

The merge made some of the changes here a bit confusing because of similarly named methods for both sandbox and network structures. This function does indeed only add a single LB entry per network. (Once the final patch is applied: if one is only looking at the commit that refactors the API then one will still see the "walk endpoints" loop as that commit doesn't change the loadbalancing yet.)

fcrisciani · 2018-06-05T00:55:11Z

service_linux.go

+// network. If 'rmService' is true, then remove the service entry as well.
+// If 'fullRemove' is true then completely remove the entry, otherwise
+// just deweight it for now.
+func (n *network) rmLBBackend(ip net.IP, lb *loadBalancer, rmService bool, fullRemove bool) {


same for this method, are we calling it for each container?

ctelfer · 2018-06-20T13:26:00Z

Pushed an update to address comments and fix new bugs found w/ better e2e tests.

fcrisciani · 2018-06-22T05:14:00Z

service_linux.go

-		return
-	}
-	if n.ingress && !sb.ingress {
+		logrus.Errorf("addLBBackend %s/%s: unable to find LB OS sandbox", n.ID(), n.Name())


In the past I remember these 2 to be very frequent, is it still the case? if this is not a fatal condition I would not use Error.

ctelfer · 2018-06-22T14:00:11Z

Strangely, github refuses to show this comment in my web browser. FWIW, I have specifically looked for these returns during automated testing and haven't seen it trigger yet. A lack of load balancer sandbox will definitely be an error when adding a binding because we are now locking the network ID (see 2 commits prior) and testing for the presence of the network before attempting to add load balancers. The only way that sandbox should be gone is if the network is being deleted. But that also takes the same network lock, so the two should never happen together. As far as the osSbox condition, I don't mind downgrading that ... It should probably be at least warning level since it means that the load balancing doesn't get programmed and this could mean complete service disruption for single-container services.

…

On Fri, Jun 22, 2018 at 1:14 AM, Flavio Crisciani ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In service_linux.go <#2138 (comment)>: > return } if sb.osSbox == nil { - return - } - if n.ingress && !sb.ingress { + logrus.Errorf("addLBBackend %s/%s: unable to find LB OS sandbox", n.ID(), n.Name()) In the past I remember these 2 to be very frequent, is it still the case? if this is not a fatal condition I would not use Error. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#2138 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AK8iXqbYPteTJqib5rINsv4NBLCwTiomks5t_H0cgaJpZM4Tadmu> .

abhi

Not an expert in some parts of the code. From what I understand the code LGTM. One Minor change.

At a high level like we discussed I would typically avoid expanding interface like in case of Network (Delete and RemoveLBAndDelete) so that interface is as lean as realistically possible. But that is not a blocker atm.

abhi · 2018-06-25T19:48:40Z

network.go

 		return err
 	}
-	return sb.EnableService()
+
+	err = sb.EnableService()


return sb.EnableService() ?

Very deliberately no. This was the error that the patch needed to fix. If one just does return sb.EnableService() then if EnableService returns an error that value does not get propagated to the function's err variable. This, in turn, means that the cleanup defer functions will not detect the error and will thus not clean up after the error.

ok then you can do this :)

func (n *network) createLoadBalancerSandbox() (err error) {

This would keep the code clean. And avoid things like >>>>>>here

That gets rid of the first line (err declaration for the function), but that's about it. I'm not sure how that avoids things like ep, err := n.CreateEndpoint(...). The := is (presumably) in the code because we don't want to declare the type of ep. It's not to declare err. Of course that has its own dangers. If that statement were inside a loop, if-statement, switch statement or function body it would shadow the outer declaration causing disaster as well. But that's true for named return values in the same way. So I'm not sure what you're getting at. Are you saying it should be changed to:

var ep Endpoint ep, err = n.CreateEndpoint(...) ...

As far as the named return values and use of bare returns -- shrug it could be argued that this is less clear:

x, err := foo() if err { return // Looks at first glance like we aren't returning the error }

and some prominent golang folks have argued that these should be used sparingly. But I can put it in if desired. The real issue is that the shadowing rules in golang are a bit of a misfeature. One can see what they were aiming for, but there really aren't any clean-looking-while-still-robust patterns w/ regard to error handling because of them. (although I'd be happy to learn otherwise)

you can always do return err with the suggested syntax. It maps it to the defer value check and thats used in a lot of big golang code bases.

func (n *network) createLoadBalancerSandbox() (retErr error) { sandboxName := n.lbSandboxName() sbOptions := []SandboxOption{} if n.ingress { sbOptions = append(sbOptions, OptionIngress()) } sb, err := n.ctrlr.NewSandbox(sandboxName, sbOptions...) if err != nil { return err } defer func() { if retErr != nil { if e := n.ctrlr.SandboxDestroy(sandboxName); e != nil { logrus.Warnf("could not delete sandbox %s on failure on failure (%v): %v", sandboxName, err, e) } } }() endpointName := n.lbEndpointName() epOptions := []EndpointOption{ CreateOptionIpam(n.loadBalancerIP, nil, nil, nil), CreateOptionLoadBalancer(), } if n.hasLoadBalancerEndpoint() && !n.ingress { // Mark LB endpoints as anonymous so they don't show up in DNS epOptions = append(epOptions, CreateOptionAnonymous()) } ep, err := n.createEndpoint(endpointName, epOptions...) if err != nil { return err } defer func() { if retErr != nil { if e := ep.Delete(true); e != nil { logrus.Warnf("could not delete endpoint %s on failure on failure (%v): %v", endpointName, err, e) } } }() err := ep.Join(sb, nil) if err != nil { return err } return sb.EnableService() }

Ah, so a non-blank return statement assigns to the named return variable! Perfect. Much cleaner. Will change. Thanks!

fcrisciani · 2018-06-25T23:37:51Z

network.go

-	c.cleanupServiceDiscovery(n.ID())
+	// Capture remaining binding info to clear after deleting
+	// the network from the store.
+	bindings = c.gatherBindingInfo(n.ID())


fetching the info here, and unfortunately because the gatherBinding requires the network, does not seem to give any more guarantee on the cleanup. I wonder if at this point we should just leverage the cleaning done from the networkDB

abhi · 2018-06-27T14:45:56Z

network.go

+	if n.hasLoadBalancerEndpoint() && !n.ingress {
+		// Mark LB endpoints as anonymous so they don't show up in DNS
+		epOptions = append(epOptions, CreateOptionAnonymous())
+	}
 	ep, err := n.createEndpoint(endpointName, epOptions...)


abhi · 2018-06-27T14:46:53Z

network.go

 		return err
 	}
-	return sb.EnableService()
+
+	err = sb.EnableService()


ok then you can do this :)

func (n *network) createLoadBalancerSandbox() (err error) {

This would keep the code clean. And avoid things like >>>>>>here

fcrisciani · 2018-06-27T23:40:01Z

@ctelfer can you also rebase?

Signed-off-by: Chris Telfer <ctelfer@docker.com>

ctelfer · 2018-06-28T16:05:35Z

3 changes to the latest push:

revised createLoadBalancerSandbox()'s error handling per @abhi 's suggestion
also removed the RemoveLBEndpointAndDelete() in favor of passing an option (which can otherwise be omitted) instead. This leaves the API backward compatible while avoiding adding a new method to the interface. (also @abhi 's suggestion)
rebased to head of master

Change the Delete() method to take optional options and add NetworkDeleteOptionRemoveLB as one such option. This option allows explicit removal of an ingress network along with its load-balancing endpoint if there are no other endpoints in the network. Prior to this, the libnetwork client would have to manually search for and remove the ingress load balancing endpoint from an ingress network. This was, of course, completely hacky. This commit will require a slight modification in moby to make use of the option when deleting the ingress network. Signed-off-by: Chris Telfer <ctelfer@docker.com>

Default gateways truncate the endpoint name to 12 characters. This can make network endpoints ambiguous especially for load-balancing sandboxes for networks with lenghty names (such as with our prefixes). Address this by detecting an overflow in the sanbox name length and instead opting to name the gateway endpoint "gateway_<id>" which should never collide. Signed-off-by: Chris Telfer <ctelfer@docker.com>

Error unwinding only works if the error variable is used consistently and isn't hidden in the scope of other if statements. Signed-off-by: Chris Telfer <ctelfer@docker.com>

This method returns the name of the interface from the perspective of the host OS pre-container. This will be required later for finding matching a sandbox's interface name to an endpoint which is, in turn, requied for adding an IP alias to a load balancer endpoint. Signed-off-by: Chris Telfer <ctelfer@docker.com>

New load balancing code will require ability to add aliases to load-balncer sandboxes. So this broadens the OSL interface to allow adding aliases to any interface, along with the facility to get the loopback interface's name based on the OS. Signed-off-by: Chris Telfer <ctelfer@docker.com>

This was passing extra information and adding confusion about the purpose of the load balancing structure. Signed-off-by: Chris Telfer <ctelfer@docker.com>

This is the heart of the scalability change for services in libnetwork. The present routing mesh adds load-balancing rules for a network to every container connected to the network. This newer approach creates a load-balancing endpoint per network per node. For every service on a network, libnetwork assigns the VIP of the service to the endpoint's interface as an alias. This endpoint must have a unique IP address in order to route return traffic to it. Traffic destined for a service's VIP arrives at the load-balancing endpoint on the VIP and from there, Linux load balances it among backend destinations while SNATing said traffic to the endpoint's unique IP address. The net result of this scheme is that each node in a swarm need only have one set of load balancing state per service instead of one per container on the node. This scheme is very similar to how services currently operate on Windows nodes in libnetwork. It (as with Windows nodes) costs the use of extra IP addresses in a network (one per node) and an extra network hop in the stack, although, always in the stack local to the container. In order to prevent existing deployments from suddenly failing if they failed to allocate sufficient address space to include per-node load-balancing endpoint IP addresses, this patch preserves the existing functionality and activates the new functionality on a per-network basis depending on whether the network has a load-balancing endpoint. Eventually, moby should always set this option when creating new networks and should only omit it for networks created as part of a swarm that are not marked to use endpoint load balancing. This patch also normalizes the code to treat "load" and "balancer" as two separate words from the perspectives of variable/function naming. This means that the 'b' in "balancer" must be capitalized. Signed-off-by: Chris Telfer <ctelfer@docker.com>

Lock the network ID in the controller during an addServiceBinding to prevent racing with network.delete(). This would cause the binding to be silently ignored in the system. Signed-off-by: Chris Telfer <ctelfer@docker.com>

Add debug and error logs to notify when a load balancing sandbox is not found. This can occur in normal operation during removal. Signed-off-by: Chris Telfer <ctelfer@docker.com>

ctelfer · 2018-06-28T17:59:41Z

For those interested, the moby patch to make use of this code (outside of vendoring) is quite thin:

diff --git a/daemon/network.go b/daemon/network.go
index 4263409..56d1067 100644
--- a/daemon/network.go
+++ b/daemon/network.go
@@ -4,7 +4,6 @@ import (
        "context"
        "fmt"
        "net"
-       "runtime"
        "sort"
        "strconv"
        "strings"
@@ -230,9 +229,7 @@ func (daemon *Daemon) releaseIngress(id string) {
                return
        }
 
-       daemon.deleteLoadBalancerSandbox(n)
-
-       if err := n.Delete(); err != nil {
+       if err := n.Delete(libnetwork.NetworkDeleteOptionRemoveLB); err != nil {
                logrus.Errorf("Failed to delete ingress network %s: %v", n.ID(), err)
                return
        }
@@ -349,7 +346,7 @@ func (daemon *Daemon) createNetwork(create types.NetworkCreateRequest, id string
                nwOptions = append(nwOptions, libnetwork.NetworkOptionConfigFrom(create.ConfigFrom.Network))
        }
 
-       if agent && driver == "overlay" && (create.Ingress || runtime.GOOS == "windows") {
+       if agent && driver == "overlay" {
                nodeIP, exists := daemon.GetAttachmentStore().GetIPForNetwork(id)
                if !exists {
                        return nil, fmt.Errorf("Failed to find a load balancer IP to use for network: %v", id)
@@ -512,37 +509,6 @@ func (daemon *Daemon) DeleteNetwork(networkID string) error {
        return daemon.deleteNetwork(n, false)
 }

plus remove the now superfluous daemon.deleteLoadBalancerSandbox() function.

fcrisciani

LGTM

orenye · 2018-07-30T06:21:35Z

@ctelfer, can you please elaborate a bit this sentence:

First, the original behavior included the ability to load balance to a service by sending to any task IP address for the service.

What exactly can't the user do now?

ctelfer · 2018-07-30T14:08:35Z

Hey @orenye ... Sorry for the confusion, but that statement is actually erroneous and was caused by me misremembering an ingress processing scenario when I wrote it. Users should be able to do everything with the new load balancing scheme that they could with the previous one.

orenye · 2018-07-31T06:50:44Z

Hi @ctelfer , great, thanks!

fcrisciani requested review from fcrisciani, ddebroy and abhi April 18, 2018 17:33

ctelfer force-pushed the scalable-lb branch from e1008c0 to 47aac5e Compare April 18, 2018 17:44

ctelfer force-pushed the scalable-lb branch from 47aac5e to c89e6bd Compare April 18, 2018 18:01

selansen reviewed Apr 18, 2018

View reviewed changes

fcrisciani reviewed Apr 23, 2018

View reviewed changes

ctelfer force-pushed the scalable-lb branch from c89e6bd to a5b898f Compare May 10, 2018 17:51

ctelfer force-pushed the scalable-lb branch from a5b898f to b58e5e9 Compare May 10, 2018 18:27

fcrisciani reviewed Jun 5, 2018

View reviewed changes

ctelfer force-pushed the scalable-lb branch from b58e5e9 to 4047961 Compare June 20, 2018 13:25

ctelfer force-pushed the scalable-lb branch from fa13e11 to 7657856 Compare June 21, 2018 20:34

fcrisciani reviewed Jun 22, 2018

View reviewed changes

abhi approved these changes Jun 25, 2018

View reviewed changes

ctelfer force-pushed the scalable-lb branch from 7657856 to 8c6447e Compare June 25, 2018 22:16

fcrisciani reviewed Jun 25, 2018

View reviewed changes

ctelfer force-pushed the scalable-lb branch from 8c6447e to 878dad0 Compare June 26, 2018 18:34

abhi reviewed Jun 27, 2018

View reviewed changes

Generate LB sandbox/endpoint names in one place

5111c24

Signed-off-by: Chris Telfer <ctelfer@docker.com>

ctelfer force-pushed the scalable-lb branch from 878dad0 to 452d056 Compare June 28, 2018 15:59

ctelfer added 9 commits June 28, 2018 12:08

Fix error handling in createLoadBalncerSandbox()

07d3c5c

Error unwinding only works if the error variable is used consistently and isn't hidden in the scope of other if statements. Signed-off-by: Chris Telfer <ctelfer@docker.com>

Refactor [add|rm]LBBackend() to use lb struct

2166946

This was passing extra information and adding confusion about the purpose of the load balancing structure. Signed-off-by: Chris Telfer <ctelfer@docker.com>

Prevent race between add-binding and net-delete

fd7f30d

Lock the network ID in the controller during an addServiceBinding to prevent racing with network.delete(). This would cause the binding to be silently ignored in the system. Signed-off-by: Chris Telfer <ctelfer@docker.com>

Adjust warnings for transient LB endpoint conds

366b911

Add debug and error logs to notify when a load balancing sandbox is not found. This can occur in normal operation during removal. Signed-off-by: Chris Telfer <ctelfer@docker.com>

ctelfer force-pushed the scalable-lb branch from 452d056 to 366b911 Compare June 28, 2018 16:08

fcrisciani approved these changes Jun 28, 2018

View reviewed changes

fcrisciani merged commit b018663 into moby:master Jun 28, 2018

ctelfer mentioned this pull request Jul 9, 2018

Improve scalability of the Linux load balancing moby/moby#37372

Merged

thaJeztah mentioned this pull request Jul 9, 2018

[18.06] Improve scalability of the Linux load balancing docker-archive/engine#16

Merged

ctelfer deleted the scalable-lb branch July 30, 2018 14:08

thaJeztah mentioned this pull request Nov 14, 2018

Mapping port not exposed by stack deploy! boot2docker/boot2docker#1349

Closed

thaJeztah mentioned this pull request Mar 4, 2020

Cleanup VFP during overlay network removal #2524

Merged

trapier mentioned this pull request Mar 5, 2020

VFP leak in windows overlay network deletion #2527

Closed

Improve load balancing scalability in swarm mode #2138

Improve load balancing scalability in swarm mode #2138

Conversation

ctelfer commented Apr 18, 2018 • edited Loading

selansen commented Apr 18, 2018

ctelfer commented Apr 18, 2018

codecov-io commented Apr 18, 2018 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ctelfer Apr 18, 2018 • edited Loading

Choose a reason for hiding this comment

selansen commented Apr 18, 2018

ctelfer commented Apr 18, 2018

fcrisciani left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ctelfer commented May 10, 2018

selansen commented May 10, 2018

fcrisciani left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ctelfer commented Jun 20, 2018

Choose a reason for hiding this comment

ctelfer commented Jun 22, 2018 via email

abhi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fcrisciani commented Jun 27, 2018

ctelfer commented Jun 28, 2018 • edited Loading

ctelfer commented Jun 28, 2018

fcrisciani left a comment

Choose a reason for hiding this comment

orenye commented Jul 30, 2018 • edited Loading

ctelfer commented Jul 30, 2018

orenye commented Jul 31, 2018

ctelfer commented Apr 18, 2018 •

edited

Loading

codecov-io commented Apr 18, 2018 •

edited

Loading

ctelfer Apr 18, 2018 •

edited

Loading

ctelfer commented Jun 28, 2018 •

edited

Loading

orenye commented Jul 30, 2018 •

edited

Loading