weave attempting to connect to deleted nodes, and failing with functioning ones #3021
Comments
FYI, the most troublesome other services are:
|
Feh. CoreOS rebooted itself, now it doesn't appear to be giving those messages anymore. Instead, every minute I am getting:
What can I do to help here? |
Hi @deitch, thanks for reporting this issue.
|
Output from all three workers:
|
I have similar problems, however my status looks a lot worse:
|
Same here, using kops after a rolling-update
|
@renewooller @SharpEdgeMarshall I think the point you are raising is tracked at #2797, and we will shortly release a new version which should improve matters. "attempting to connect" should not impact function in any way; would be nice to avoid it but not critical. |
Please note the code to remove deleted Kubernetes peers from the IP ownership list was released today, in Weave Net version 2.1.1. |
Somehow I missed this notification. So as of 2.1.1, if a peer is unreachable and no longer listed among kubernetes nodes, weave will remove it automatically and reap its addresses? |
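The condition asked about above (a peer that is unreachable and no longer listed among the Kubernetes nodes) can be sketched as a simple set computation. This is only an illustrative model, not Weave Net's actual code; the function name `peers_to_reap` and its three parameters are hypothetical.

```python
def peers_to_reap(ipam_peers, kube_nodes, reachable):
    """Return peers that still own addresses but are gone from Kubernetes and unreachable.

    All three arguments are sets of peer names (hypothetical inputs, for illustration only).
    """
    # A peer is a candidate only if Kubernetes no longer lists it as a node...
    deleted = set(ipam_peers) - set(kube_nodes)
    # ...and it cannot be reached, so a peer that is still alive is never reaped.
    return deleted - set(reachable)


if __name__ == "__main__":
    stale = peers_to_reap(
        ipam_peers={"A", "B", "C", "D"},
        kube_nodes={"B", "C", "D"},
        reachable={"B", "C", "D"},
    )
    print(sorted(stale))
```

Requiring both conditions is the conservative choice in this sketch: a node that is missing from the Kubernetes list but still reachable keeps its addresses.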
Ah, but not on some sort of schedule? I guess in a cloud environment, if one is going away, others are coming back with some regularity.
I see. ETA/targeted release? |
v2.1.3 is released now. |
Thanks @bboreham . We will use that. Closing this issue, then. |
Note older peers will still attempt to connect to deleted nodes, because that part hasn't changed. When they (eventually) restart they will pick up an updated list of peers and stop trying to connect to the deleted ones. |
When you say "older peers", you mean "peers running pre-2.1.3 versions of weave"? |
"older" meaning "started prior to the time a node was deleted". The new release addresses the point made at #3021 (comment) ; it does not address "attempting to connect" which was the original subject of this issue. |
Result: B and C will continue attempting to connect to A, while D will not.
Result: C will continue attempting to connect to A and B, while D and E will not. Is that it? And if so, the IP addresses nonetheless will be reaped, because D (and later E), running 2.1.3 will recognize the disappearance of A (later B) and reap those addresses, which get synced up among all nodes?
True, but (a) part of the issue was deleted nodes; (b) this is old enough, and enough releases ago (is that English?), that I am willing to close the issue I opened, and will open a new one if I see a recurrence in 2.1.3+. But I only opened the issue, I do not maintain; if you want to reopen to track, by all means. |
@deitch yes, you have it. |
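The behaviour confirmed here can be illustrated with a toy model in which each peer snapshots the peer list when it starts and only refreshes it on restart. This is a sketch under assumptions, not Weave Net's implementation: the `Peer` class, `stale_targets` helper, and the scenario (nodes A, B, C running, A deleted, then D started) are hypothetical reconstructions for illustration.

```python
class Peer:
    """Toy model: a peer snapshots the list of known nodes when it starts."""

    def __init__(self, name, live_nodes):
        self.name = name
        # Connection targets are fixed at start time and only refreshed on restart.
        self.targets = set(live_nodes) - {name}

    def restart(self, live_nodes):
        # A restart picks up the current node list, dropping deleted nodes.
        self.targets = set(live_nodes) - {self.name}


def stale_targets(peer, live_nodes):
    """Deleted nodes this peer would still try to connect to."""
    return peer.targets - set(live_nodes)


if __name__ == "__main__":
    nodes = {"A", "B", "C"}
    peers = {n: Peer(n, nodes) for n in nodes}

    # Assumed scenario: node A is deleted, then node D joins.
    nodes.discard("A")
    del peers["A"]
    nodes.add("D")
    peers["D"] = Peer("D", nodes)

    for name in sorted(peers):
        print(name, sorted(stale_targets(peers[name], nodes)))
```

In this model B and C keep A among their targets until they restart, while D, which started after the deletion, never learns about A.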
Excellent, thanks. We should be upgrading our weave pods in the next few days... and rather looking forward. |
What you expected to happen?
weave-kube should be able to connect consistently to functional nodes and ignore deleted ones
What happened?
Logs show connections "flapping" in and out, and weave constantly tries to connect to deleted nodes
How to reproduce it?
Unsure. I am a little nervous to delete the DaemonSet for fear of losing the problem until it shows up again in prod
Anything else we need to know?
Running on kube on AWS.
Versions:
Logs:
or, if using Kubernetes:
In the above, 10.50.21.91 and 10.50.21.226 and 10.50.20.63 are old nodes that have since been deleted from kube (not NotReady, but actually deleted).
I do not know if this is affecting some oddness with other pods, like services being unable to find endpoints even though they are fine, and sometimes nodes being unable to find the internal kube API service endpoint 10.200.0.1:443. FWIW, weave-kube accesses it just fine: