Weave NPC Container Crashes and Keeps Restarting #3247
Comments
Is that the whole log? There should be a line like this:
@bboreham Here is the log for it. Do you need more? It is a huge log.
Yes, I know the log is huge; it's huge because we logged everything we would need to debug complex issues. And now you have an issue and I can't debug it unless I can see everything I need to see. In particular, the one ipset from your error message should have been flushed and then destroyed, with each step logged. But it would be more efficient for you just to upload the whole log.
@bboreham here is the log for the
@bboreham weave-npc crashes because it tries to create an ipSet that already exists. The ipSet already exists because the previous namespace deletion did not remove it. The ipSet mentioned here Line 380 in 2ab01b5
I was able to reproduce the issue with the steps below.
Now we need to find out why one of the two ipSets is not deleted when the namespace is deleted.
@bboreham When the namespace is being terminated, and at that time we try destroying the
That's great troubleshooting, @alok87 ! Is that |
@bboreham I do not see any such message in the log. Perhaps weave-npc is not logging it?
Do you mean they keep restarting every time you delete a namespace, or they restart again and again without pause? If the problem is triggered by a particular namespace delete event, why would the pods keep on restarting?
@bboreham We create and delete namespaces multiple times a day. When a namespace is deleted, all is fine and weave-npc does not restart. But whenever we create a namespace again with the same name, weave-npc crashes with the error mentioned above and then starts again.
OK, just wanted to check the crash was associated with a particular event.
We are seeing the exact same issue here. Just wanted to corroborate @alok87's finding. The following sequence of commands brings down all weave pods:
The logs from a weave pod while running those commands:
@petergardfjall Yes it is the same issue, the fix PR #3250 is under approval. You can use this image |
Delete defaultAllowIPSet on namespace delete Fix #3247 Duplicate merge commit as the PR got merged to "master" instead of "2.2"
FYI, the fix has been released in Weave Net 2.2.1: https://github.com/weaveworks/weave/releases/tag/v2.2.1
What you expected to happen?
I expected the weave-npc container to keep running and not crash.
What happened?
Our weave-npc container keeps crashing with the error shown below, and Kubernetes restarts it.
How to reproduce it?
Not sure how to reproduce it.
Anything else we need to know?
Cloud provider: aws
Kubernetes version: 1.7.6
Weave version: 2.2.0 https://github.com/weaveworks/weave/releases/tag/v2.2.0