weave launch-router fails when trying to start after a stop #1772
Comments
We had a report of this before, but not with a repro. The same sequence of steps works fine in our dev/test envs, so it might be something kernel-version specific. What kernel are you running?
I can reproduce this now. Note that a second Looking at the CentOS kernel source, it looks like the release of the vxlan UDP socket occurs asynchronously. So when we delete a vxlan vport and recreate it, the old one is not quite gone. However, the relevant section of kernel code (
@dpw and I have just discussed a fix for this, which is to implement a limited retry loop with a short sleep (e.g. 5 tries, 10 ms) on that particular error.
It appears this also happens on fresh installs. I've just tried to install on a fresh machine and it refused to start. Looking at the container logs, I saw the error; removing the bridge manually and starting again fixed it right away.
Is it the exact same error? If you've got an existing bridge on the machine from an earlier version of weave, and then try to use fast datapath, you will need to
…-in-use Introduce limited retry on vxlan vport creation. LGTM; fixes #1772.
If you stop the weave router for any reason and then try to launch it again, it fails. Interestingly, it starts without any issues at the next launch attempt.
How to reproduce on CentOS 7.1 and weave 1.3.1
Docker logs show the following
So naturally I checked the interfaces
However the router will launch successfully at the next attempt
Workaround
Here's the workaround that I've been using to launch the router (without a launch failure) after stopping it for any reason: manually delete the weave interface/bridge before starting the router.
(stopping the router since it's already running)
Manually removing the interface/bridge
Starting weave
Starts without an issue
If this is a bug, it would be great to have a fix for it soon.