Weave fails to assign IPv4 / Remove ephemeral peers from Weave Net via AWS ASG lifecycle hook #2970
Hi @hollowimage, thanks for raising this issue.
Unfortunately I do not. At the time we were rather in fire-fighting mode; I reported the issue only after the cluster came back up without problems at the end of the day Monday, by which point the pods had been destroyed and recreated. I think the last time I looked at their logs, there were some tidbits about failing to bind to …
@hollowimage, the only thing which comes to mind with an error like that would be if you have two instances of Weave Net running, e.g. one launched manually alongside the one managed by the DaemonSet.
I do not do manual Weave starts; it's all done through the DaemonSet. I was using Weave 1.9.3 at the time, I think; since then, as part of troubleshooting, I updated the DaemonSet definition to pull 1.9.5. To elaborate again: everything was fine, and then one day when my cluster scaled back up from 0 kubelets to 2 (we scale it down at night), the behavior above started happening.
This happened again.
Here are some logs from the weave container: …
Looks like this is the same as #2797.
As discussed on Slack, the workaround is to remove the stale peers with `weave rmpeer`.
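For anyone hitting this, a minimal sketch of that workaround on a DaemonSet-based install, assuming Weave Net runs in `kube-system`; the pod name and peer ID below are placeholders, and the `weave status ipam` output should be checked before removing anything:

```sh
# Pick any healthy weave-net pod (name is a placeholder).
WEAVE_POD=weave-net-abc12

# Inspect IPAM: terminated peers still own part of the address range.
kubectl exec -n kube-system "$WEAVE_POD" -c weave -- /home/weave/weave --local status ipam

# Reclaim the range owned by a dead peer (peer ID is a placeholder).
# Only rmpeer peers that are permanently gone; if the peer ever comes
# back, its addresses may be handed out twice.
kubectl exec -n kube-system "$WEAVE_POD" -c weave -- /home/weave/weave --local rmpeer ba:e9:b2:53:72:a1
```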
Also, given the workers are shut down via an AWS Auto Scaling group, implementing a lifecycle hook might help in this case.
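A sketch of that lifecycle-hook idea using the AWS CLI; all names, ARNs, and IDs here are invented for illustration, and the piece that actually consumes the notification and runs `weave rmpeer` for the dying node is left out:

```sh
# Pause instance termination until the hook is completed.
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name weave-rmpeer-on-terminate \
  --auto-scaling-group-name my-k8s-workers \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300 \
  --notification-target-arn arn:aws:sns:us-east-1:123456789012:asg-terminations \
  --role-arn arn:aws:iam::123456789012:role/asg-lifecycle-sns

# Whatever consumes the SNS message would rmpeer the terminating node's
# peer, then let the termination proceed:
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name weave-rmpeer-on-terminate \
  --auto-scaling-group-name my-k8s-workers \
  --instance-id i-0123456789abcdef0 \
  --lifecycle-action-result CONTINUE
```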
Removing all terminated peers did the trick: the cluster instantly reallocated the private range to the weave pods and everything came back to life. In our case this was safe to do, since the peers had previously been permanently destroyed.
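A rough sketch of doing that in bulk, assuming (check your own output first) that `weave status ipam` flags dead peers with `unreachable!` and prints the peer name as `<peer-id>(<nickname>)` in the first column:

```sh
#!/bin/sh
# Reclaim IPAM space from every peer reported as unreachable.
# Run inside (or `kubectl exec` into) a weave-net container.
# Only safe if the unreachable peers are gone for good: rmpeer on a
# peer that later rejoins can lead to duplicate IP allocations.
WEAVE=/home/weave/weave

$WEAVE --local status ipam \
  | awk '/unreachable!/ { split($1, a, "("); print a[1] }' \
  | while read -r peer; do
      echo "Reclaiming address space from dead peer $peer"
      $WEAVE --local rmpeer "$peer"
    done
```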
For reference, a similar situation here today. We have a cluster with some history (many nodes rotated via a 3rd-party API termination, not exactly a graceful process); recently our nodes started getting stuck in a weird state where the CNI was not configuring the network as expected, which resulted in a bunch of errors like those above. After running `weave rmpeer` for the dead peers, everything recovered.
Same issue here, on a cluster with some autoscaling happening daily (roughly 10 node-creation actions per day).
Hello, this issue is still happening with AWS ASG; any news on a fix? The workaround is to run `weave rmpeer`.
This was fixed per #2797.
kubernetes/kubernetes#45858
Cross-submitting here per request.
In short: through some set of circumstances, the weave-net pods failed to acquire an IPv4 address for their interface, and the rest of the cluster was pretty much down.