Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Don't delete record of IP allocation immediately after container death #1191

Merged
merged 6 commits into from
Jan 13, 2016

Conversation

bboreham
Copy link
Contributor

This fixes #1047, without any special reference to the restart command, because the same container ID will get back the same address as long as you start/attach it within the timeout window of 5 seconds.

Perhaps the timeout should be configurable?

@bboreham
Copy link
Contributor Author

As it stands, this change has the effect that you can starve the IP allocator by starting and stopping a lot of containers in succession. We could, if we run out of free addresses, start taking the oldest ones from the dead list.

Similarly, we will be unable to donate space to a peer if we have lots of dead ones piled up.

@bboreham
Copy link
Contributor Author

New idea:
Do not retain IP addresses for un-named containers.
Retain named containers' IP addresses for longer than 5 seconds. 5 minutes?
Discard named containers' IP addresses when we see a 'destroy' event.

@bboreham
Copy link
Contributor Author

closed in favour of #1196

@bboreham bboreham closed this Jul 20, 2015
@rade rade deleted the 1047-same-ip-on-restart branch July 20, 2015 22:56
@rade rade added this to the 1.1.0 milestone Jul 21, 2015
@bboreham bboreham restored the 1047-same-ip-on-restart branch July 22, 2015 11:12
@bboreham
Copy link
Contributor Author

Maybe we do want to do it this way after all.

@bboreham bboreham reopened this Jul 22, 2015
@bboreham bboreham self-assigned this Jul 22, 2015
@bboreham
Copy link
Contributor Author

Need to bring docs over from #1196, removing the comment about not working on auto-restart once #1210 is merged.

@bboreham
Copy link
Contributor Author

Brought docs and weave restart command in from #1196.
The docs are written assuming #1210 is merged.

@bboreham bboreham removed their assignment Jul 23, 2015
@rade rade removed this from the 1.1.0 milestone Jul 27, 2015
@bboreham bboreham self-assigned this Aug 19, 2015
@bboreham bboreham force-pushed the 1047-same-ip-on-restart branch 4 times, most recently from 4522ba3 to d6d66f6 Compare August 20, 2015 10:00
@bboreham bboreham removed their assignment Aug 20, 2015
@tomwilkie tomwilkie self-assigned this Aug 20, 2015
@@ -118,6 +118,22 @@ IP addresses of external services the hosts or containers need to
connect to. The same IP range must be used everywhere, and the
individual IP addresses must, of course, be unique.

If you restart a container, it will retain the same IP addresses on
the weave network:

This comment was marked as abuse.

This comment was marked as abuse.

@tomwilkie
Copy link
Contributor

Minor comments but generally looks fine.

@tomwilkie tomwilkie assigned bboreham and unassigned tomwilkie Aug 21, 2015
@bboreham bboreham removed their assignment Aug 24, 2015
@bboreham
Copy link
Contributor Author

I believe I have addressed all comments.

@bboreham
Copy link
Contributor Author

bboreham commented Nov 9, 2015

I have filed the extra issues, rebased and updated the doc as suggested.

@bboreham bboreham force-pushed the 1047-same-ip-on-restart branch from 33fb077 to f98e7f9 Compare November 18, 2015 11:05
@bboreham bboreham force-pushed the 1047-same-ip-on-restart branch 4 times, most recently from bc43ae8 to b55b737 Compare January 7, 2016 17:28
@awh awh self-assigned this Jan 11, 2016
@awh
Copy link
Contributor

awh commented Jan 11, 2016

This looks good, but I am seeing occasional failures of 640_proxy_restart_reattaches_test.sh. In the cases when it passes, it completes in ~50s; otherwise it times out at ~120s. Instrumentation reveals that the run_many loop is hanging after some number of iterations (e.g. 37, 28) in the failure cases; will investigate further.

@awh
Copy link
Contributor

awh commented Jan 12, 2016

docker events normally repeats the following every iteration:

2016-01-12T12:35:59.388983735Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) create
2016-01-12T12:35:59.392560828Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) attach
2016-01-12T12:35:59.410298728Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) start
2016-01-12T12:35:59.596727907Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) create
2016-01-12T12:35:59.598303701Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) attach
2016-01-12T12:35:59.601584397Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) start
2016-01-12T12:35:59.645652816Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) die
2016-01-12T12:35:59.651364857Z 2e23062526708e30f9801ac9faa97b6ab8adcc1cd2c8245d058a1b0421b53778: (from weaveworks/weaveexec:latest) destroy
2016-01-12T12:35:59.708844581Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) die
2016-01-12T12:35:59.751633179Z 322c11c6e0ddd246e8f44e9d2953e84ddfb79a80591429270e2c193677e2b95a: (from alpine) destroy

However when the test hangs the last output is this:

2016-01-12T12:35:59.779496735Z aba7f36e7755112caff1e70771637c37dc04c894edc8bd5162cd9f4069e64d73: (from alpine) create
2016-01-12T12:35:59.782579941Z aba7f36e7755112caff1e70771637c37dc04c894edc8bd5162cd9f4069e64d73: (from alpine) attach
2016-01-12T12:35:59.789890407Z aba7f36e7755112caff1e70771637c37dc04c894edc8bd5162cd9f4069e64d73: (from alpine) start
2016-01-12T12:36:00.094434404Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) create
2016-01-12T12:36:00.096374780Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) attach
2016-01-12T12:36:00.099326907Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) start
2016-01-12T12:36:00.261661612Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) die
2016-01-12T12:36:00.267873661Z 378c9d457981165f64ca654f8eec4d2286563ce0b36bda3ec8cea134c0db260d: (from weaveworks/weaveexec:latest) destroy

In this particular example, the hang occurred on only the second iteration! It looks like the test container isn't dieing for some reason?

@awh
Copy link
Contributor

awh commented Jan 12, 2016

It looks like the test container isn't dieing for some reason?

The docker run ... --rm ... /bin/true process that launched the hanging container persists even after the smoke test aborted.

@bboreham
Copy link
Contributor Author

Is it hanging in weavewait?

@awh
Copy link
Contributor

awh commented Jan 12, 2016

Is it hanging in weavewait?

I'll check. Might be an idea to see if you can reproduce it too - it's failing for me about 20% of the time.

@awh
Copy link
Contributor

awh commented Jan 12, 2016

Is it hanging in weavewait?

It looks like it is; given that I've seen it hang on the second iteration, it would appear to be unrelated to address exhaustion...

and call tryPendingOps() from it so we eventually re-try operations
so they will be reclaimed if the container is restarted.
Still cancel pending operations immediately a container dies
but remove them if they are destroyed, to reduce starvation
when users are creating and destroying containers quickly.
@bboreham bboreham force-pushed the 1047-same-ip-on-restart branch from b55b737 to e468399 Compare January 12, 2016 14:39
@awh
Copy link
Contributor

awh commented Jan 12, 2016

Let me know when you've rebased on the weavewait race fix and I'll give it another try.

@bboreham
Copy link
Contributor Author

Rebased and succeeded on CircleCI

@awh awh added this to the 1.5.0 milestone Jan 13, 2016
awh added a commit that referenced this pull request Jan 13, 2016
Don't delete record of IP allocation immediately after container death
@awh awh merged commit 29ebeea into master Jan 13, 2016
@awh awh deleted the 1047-same-ip-on-restart branch January 13, 2016 09:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retain IP address for container across restart
4 participants