This repository has been archived by the owner on Jun 20, 2024. It is now read-only.
Peer connection attempts to self should not be reported as failed.
Background
We got alarmed by these failures. After some research, the takeaway is that they are noise. However, we also saw a lot of issues about peers not being able to join shortly after reaching CONN_LIMIT. We don't scale nodes much, so it is less of a concern for us, but we still thought we should keep an eye on this metric just in case. Our workaround is simply to treat N failed connections in an N-node cluster as normal.
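The workaround amounts to a simple threshold check. Below is a minimal Python sketch. It assumes failed-connection counts per node are already collected from somewhere. The collection step is not shown, and the names `cluster_looks_abnormal` and `abnormal_nodes` are hypothetical:

```python
from __future__ import annotations

from typing import Mapping


def cluster_looks_abnormal(failed_by_node: Mapping[str, int]) -> bool:
    """Cluster-wide version of the workaround: N nodes -> up to N failures are normal.

    Assumes each node contributes exactly one 'failed' connection,
    namely its own attempt to connect to itself.
    """
    return sum(failed_by_node.values()) > len(failed_by_node)


def abnormal_nodes(failed_by_node: Mapping[str, int]) -> list[str]:
    """Stricter per-node variant: flag any node with more than its one self-attempt."""
    return sorted(node for node, failed in failed_by_node.items() if failed > 1)
```

The cluster-wide check mirrors the N-for-N rule, but it can hide a node with two failures if another node happens to report none. The per-node check avoids that blind spot.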
Some Ideas
Adding special handling in the metrics reporting to ignore failed connections whose Info matches "cannot connect to ourself" might be too brittle.
Alternatively, patch the Kubernetes add-on so that a node doesn't try to connect to itself. The hope is that something on the Kubernetes side can better inform weave about which peers to try.
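Here is a rough sketch of the first idea, filtering on the Info text. It is mostly meant to show why that approach feels brittle. It assumes connections are available as dicts with `State` and `Info` keys. That shape is hypothetical, loosely modeled on weave's connection status, and not a confirmed API. The sketch matches on the message text quoted above:

```python
from __future__ import annotations

from typing import Iterable, Mapping

# Message text quoted in this issue; matching on it is the brittle part.
SELF_CONNECT_MARKER = "cannot connect to ourself"


def count_real_failures(connections: Iterable[Mapping[str, str]]) -> int:
    """Count failed connections, skipping a peer's attempts to connect to itself.

    Each connection is assumed (hypothetically) to be a dict with
    "State" and "Info" keys.
    """
    return sum(
        1
        for conn in connections
        if conn.get("State") == "failed"
        and SELF_CONNECT_MARKER not in conn.get("Info", "")
    )
```

Because the match relies on free-form message text, rewording that message upstream would silently turn the noise back into reported failures.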
If core devs provide a bit of guidance, we might be able to submit a PR.
@erik-stephens Apologies for the delay in responding. Please go ahead and try submitting a PR.
Please take a look at the kube-utils program, which, when called, returns the list of the current Kubernetes node IPs. launch.sh passes that list as arguments when it launches the main weave program.
So try to exclude the node itself from the list returned by kube-utils.
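The suggested change boils down to filtering the node's own address out of the peer list before weave receives it. Below is a minimal Python sketch of that filtering step. The real change would live in kube-utils. The `exclude_self` name, and the assumption that the node already knows its own IPs, are illustrative only and do not describe existing weave code:

```python
from __future__ import annotations

import ipaddress
from typing import Iterable


def exclude_self(peer_ips: Iterable[str], own_ips: Iterable[str]) -> list[str]:
    """Drop this node's own address(es) from the peer list, preserving order.

    Addresses are parsed with the ipaddress module so that equivalent
    spellings (for example different IPv6 notations) compare equal.
    """
    own = {ipaddress.ip_address(ip) for ip in own_ips}
    return [ip for ip in peer_ips if ipaddress.ip_address(ip) not in own]
```

For example, `exclude_self(["10.0.0.1", "10.0.0.2"], ["10.0.0.2"])` keeps only `"10.0.0.1"`. This sketch leaves open how a node learns its own addresses.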