Don't delete record of IP allocation immediately after container death #1191

bboreham · 2015-07-17T16:56:08Z

This fixes #1047, without any special reference to the restart command, because the same container ID will get back the same address as long as you start/attach it within the timeout window of 5 seconds.

Perhaps the timeout should be configurable?

bboreham · 2015-07-17T17:03:24Z

As it stands, this change has the effect that you can starve the IP allocator by starting and stopping a lot of containers in succession. We could, if we run out of free addresses, start taking the oldest ones from the dead list.

Similarly, we will be unable to donate space to a peer if we have lots of dead ones piled up.

bboreham · 2015-07-20T09:20:47Z

New idea:
Do not retain IP addresses for un-named containers.
Retain named containers' IP addresses for longer than 5 seconds. 5 minutes?
Discard named containers' IP addresses when we see a 'destroy' event.

bboreham · 2015-07-20T14:52:11Z

closed in favour of #1196

bboreham · 2015-07-22T11:13:00Z

Maybe we do want to do it this way after all.

bboreham · 2015-07-22T16:28:19Z

Need to bring docs over from #1196, removing the comment about not working on auto-restart once #1210 is merged.

bboreham · 2015-07-23T18:50:42Z

Brought docs and weave restart command in from #1196.
The docs are written assuming #1210 is merged.

site/features.md

@@ -118,6 +118,22 @@ IP addresses of external services the hosts or containers need to
 connect to. The same IP range must be used everywhere, and the
 individual IP addresses must, of course, be unique.

+If you restart a container, it will retain the same IP addresses on
+the weave network:


tomwilkie · 2015-08-21T09:18:00Z

Minor comments but generally looks fine.

bboreham · 2015-08-24T13:22:19Z

I believe I have addressed all comments.

bboreham · 2015-11-09T15:30:53Z

I have filed the extra issues, rebased and updated the doc as suggested.

awh · 2016-01-11T17:50:33Z

This looks good, but I am seeing occasional failures of 640_proxy_restart_reattaches_test.sh. In the cases when it passes, it completes in ~50s; otherwise it times out at ~120s. Instrumentation reveals that the run_many loop is hanging after some number of iterations (e.g. 37, 28) in the failure cases; will investigate further.

awh · 2016-01-12T13:04:48Z

docker events normally repeats the following every iteration:

2016-01-12T12:35:59.388983735Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) create
2016-01-12T12:35:59.392560828Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) attach
2016-01-12T12:35:59.410298728Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) start
2016-01-12T12:35:59.596727907Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) create
2016-01-12T12:35:59.598303701Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) attach
2016-01-12T12:35:59.601584397Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) start
2016-01-12T12:35:59.645652816Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) die
2016-01-12T12:35:59.651364857Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) destroy
2016-01-12T12:35:59.708844581Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) die
2016-01-12T12:35:59.751633179Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) destroy

However when the test hangs the last output is this:

2016-01-12T12:35:59.779496735Z aba7f36e7755112caff1e70771637c37dc04c894edc8bd5162cd9f4069e64d73: (from alpine) create
2016-01-12T12:35:59.782579941Z aba7f36e7755112caff1e70771637c37dc04c894edc8bd5162cd9f4069e64d73: (from alpine) attach
2016-01-12T12:35:59.789890407Z aba7f36e7755112caff1e70771637c37dc04c894edc8bd5162cd9f4069e64d73: (from alpine) start
2016-01-12T12:36:00.094434404Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) create
2016-01-12T12:36:00.096374780Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) attach
2016-01-12T12:36:00.099326907Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) start
2016-01-12T12:36:00.261661612Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) die
2016-01-12T12:36:00.267873661Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) destroy

In this particular example, the hang occurred on only the second iteration! It looks like the test container isn't dieing for some reason?

awh · 2016-01-12T13:11:01Z

It looks like the test container isn't dieing for some reason?

The docker run ... --rm ... /bin/true process that launched the hanging container persists even after the smoke test aborted.

bboreham · 2016-01-12T13:25:06Z

Is it hanging in weavewait?

awh · 2016-01-12T13:26:38Z

Is it hanging in weavewait?

I'll check. Might be an idea to see if you can reproduce it too - it's failing for me about 20% of the time.

awh · 2016-01-12T14:18:47Z

Is it hanging in weavewait?

It looks like it is; given that I've seen it hang on the second iteration, it would appear to be unrelated to address exhaustion...

and call tryPendingOps() from it so we eventually re-try operations

so they will be reclaimed if the container is restarted. Still cancel pending operations immediately a container dies

but remove them if they are destroyed, to reduce starvation when users are creating and destroying containers quickly.

awh · 2016-01-12T14:51:42Z

Let me know when you've rebased on the weavewait race fix and I'll give it another try.

bboreham · 2016-01-12T14:53:55Z

Rebased and succeeded on CircleCI

Don't delete record of IP allocation immediately after container death

bboreham closed this Jul 20, 2015

rade deleted the 1047-same-ip-on-restart branch July 20, 2015 22:56

rade added this to the 1.1.0 milestone Jul 21, 2015

bboreham restored the 1047-same-ip-on-restart branch July 22, 2015 11:12

bboreham reopened this Jul 22, 2015

bboreham self-assigned this Jul 22, 2015

bboreham removed their assignment Jul 23, 2015

bboreham mentioned this pull request Jul 23, 2015

Catch 'restart' command in proxy and retain IP address #1196

Closed

rade removed this from the 1.1.0 milestone Jul 27, 2015

bboreham self-assigned this Aug 19, 2015

bboreham force-pushed the 1047-same-ip-on-restart branch 4 times, most recently from 4522ba3 to d6d66f6 Compare August 20, 2015 10:00

bboreham removed their assignment Aug 20, 2015

tomwilkie self-assigned this Aug 20, 2015

tomwilkie reviewed Aug 21, 2015
View reviewed changes

tomwilkie assigned bboreham and unassigned tomwilkie Aug 21, 2015

bboreham removed their assignment Aug 24, 2015

This was referenced Nov 9, 2015

Additional DNS records are lost on container restart #1654

Open

Additional IP addresses are lost on container restart #1655

Open

bboreham force-pushed the 1047-same-ip-on-restart branch from a2731b1 to 33fb077 Compare November 9, 2015 15:30

bboreham force-pushed the 1047-same-ip-on-restart branch from 33fb077 to f98e7f9 Compare November 18, 2015 11:05

awh force-pushed the master branch from 2cfaf5a to 1f438db Compare December 2, 2015 14:32

bboreham force-pushed the 1047-same-ip-on-restart branch 4 times, most recently from bc43ae8 to b55b737 Compare January 7, 2016 17:28

awh self-assigned this Jan 11, 2016

bboreham added 6 commits January 12, 2016 14:38

Generalise ticker so we can use it for more than paxos,

4672e9e

and call tryPendingOps() from it so we eventually re-try operations

Leave container allocations for a time after death

f84b933

so they will be reclaimed if the container is restarted. Still cancel pending operations immediately a container dies

Add 'weave restart' command

7985e43

Reclaim all addresses on restart

2f885a7

Added 'ContainerDestroyed()' callback from docker client

b2b293a

Extend the timeout for dead containers to 30 seconds,

e468399

but remove them if they are destroyed, to reduce starvation when users are creating and destroying containers quickly.

bboreham force-pushed the 1047-same-ip-on-restart branch from b55b737 to e468399 Compare January 12, 2016 14:39

awh added this to the 1.5.0 milestone Jan 13, 2016

awh added a commit that referenced this pull request Jan 13, 2016

Merge pull request #1191 from weaveworks/1047-same-ip-on-restart

29ebeea

Don't delete record of IP allocation immediately after container death

awh merged commit 29ebeea into master Jan 13, 2016

awh deleted the 1047-same-ip-on-restart branch January 13, 2016 09:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't delete record of IP allocation immediately after container death #1191

Don't delete record of IP allocation immediately after container death #1191

bboreham commented Jul 17, 2015

bboreham commented Jul 17, 2015

bboreham commented Jul 20, 2015

bboreham commented Jul 20, 2015

bboreham commented Jul 22, 2015

bboreham commented Jul 22, 2015

bboreham commented Jul 23, 2015

This comment was marked as abuse.

This comment was marked as abuse.

tomwilkie commented Aug 21, 2015

bboreham commented Aug 24, 2015

bboreham commented Nov 9, 2015

awh commented Jan 11, 2016

awh commented Jan 12, 2016

awh commented Jan 12, 2016

bboreham commented Jan 12, 2016

awh commented Jan 12, 2016

awh commented Jan 12, 2016

awh commented Jan 12, 2016

bboreham commented Jan 12, 2016

Don't delete record of IP allocation immediately after container death #1191

Don't delete record of IP allocation immediately after container death #1191

Conversation

bboreham commented Jul 17, 2015

bboreham commented Jul 17, 2015

bboreham commented Jul 20, 2015

bboreham commented Jul 20, 2015

bboreham commented Jul 22, 2015

bboreham commented Jul 22, 2015

bboreham commented Jul 23, 2015

This comment was marked as abuse.

This comment was marked as abuse.

tomwilkie commented Aug 21, 2015

bboreham commented Aug 24, 2015

bboreham commented Nov 9, 2015

awh commented Jan 11, 2016

awh commented Jan 12, 2016

awh commented Jan 12, 2016

bboreham commented Jan 12, 2016

awh commented Jan 12, 2016

awh commented Jan 12, 2016

awh commented Jan 12, 2016

bboreham commented Jan 12, 2016