Log shows frequent connection deleted message #3609
Thanks for reporting this issue. There is a description of the situations in which this can arise; please try the solution described in #1946 (comment).
Thank you for getting back to me.
Since it's ignored on a k8s cluster, how do I proceed? One idea is to delete every node from the k8s cluster and delete the Weave DaemonSet, then reinstall the DaemonSet and add each node back one by one. Do you think that would help? I'd like to avoid this because it requires a maintenance window, but if there is no other solution I'll have to go this way. Thank you.
It's odd that you are seeing this error on every node in the cluster. Just confirming: is it the same error everywhere? Could you please share the error and the weave status output?
If you are on Kubernetes, then triggering a rolling update would be less disruptive; the manifest already has a rolling-update strategy for the DaemonSet (see the sketch below).
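Not from the thread, but a minimal sketch of triggering that rolling restart, assuming the stock weave-net DaemonSet in kube-system:

```sh
# On kubectl >= 1.15 a rolling restart is built in:
kubectl -n kube-system rollout restart daemonset weave-net

# On older kubectl (such as the 1.12 reported below), patching a pod-template
# annotation forces the same rollout ("restartedAt" is an arbitrary key;
# date -Iseconds is GNU date):
kubectl -n kube-system patch daemonset weave-net \
  --patch '{"spec":{"template":{"metadata":{"annotations":{"restartedAt":"'"$(date -Iseconds)"'"}}}}}'

# Watch the rollout progress:
kubectl -n kube-system rollout status daemonset weave-net
```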
Thank you very much for the reply and the rolling-update idea.
I double-checked all nodes: 23 out of 26 have this error, including the master node. The remaining 3 nodes show a connection-reset error due to multiple entries to the master node, and one of them has a weird line saying:
Here is the weave status; it's quite long because we have 26 nodes in the k8s cluster, so I added it as a gist.
That sounds like the IPAM data is in an inconsistent state across all of the cluster nodes. Please delete the DB file on each node and restart the corresponding weave-net pod (a sketch follows below), or perform a rolling upgrade. See if this helps before you attempt to delete and re-add the nodes.
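A sketch of that per-node recovery, assuming the stock weave-net DaemonSet, which persists IPAM state on the host at /var/lib/weave/weave-netdata.db; the pod name is a placeholder:

```sh
# On each node in turn, remove the persisted IPAM state:
sudo rm /var/lib/weave/weave-netdata.db

# Then delete that node's weave-net pod; the DaemonSet recreates it and the
# fresh peer re-learns its IPAM state from the rest of the cluster.
kubectl -n kube-system delete pod <weave-net-pod-on-this-node>
```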
I tried the rolling update but it didn't seem to work, so in the end I had to schedule a maintenance window, delete the DaemonSet, and reinstall the Weave DaemonSet.
@murali-reddy This command runs weave status connections on every weave pod and greps for Merge failures:
$ kubectl get pods -n kube-system -o wide | grep weave | awk '{print $1}' | xargs -I {} kubectl exec {} -c weave -n kube-system -- /home/weave/weave --local status connections | grep Merge
-> 10.2.20.25:6783 failed Merge of incoming data causes: Entry 100.96.0.0-100.96.0.1 reporting too much free space: 2030 > 1, retry: 2020-01-26 15:24:58.430309658 +0000 UTC m=+57784.687751066
-> 10.2.21.252:6783 failed Merge of incoming data causes: Entry 100.96.0.0-100.96.0.1 reporting too much free space: 2030 > 1, retry: 2020-01-26 15:24:58.528000782 +0000 UTC m=+28655.705965770
How do I find which node is causing the inconsistency?
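One hedged way to narrow this down (not from the thread): print every peer's IPAM view with weave --local status ipam, reusing the pod-iteration pattern above, and look for the node whose ring disagrees with the rest, for example one reporting far more free space:

```sh
for pod in $(kubectl get pods -n kube-system -o name | grep weave-net); do
  echo "== ${pod} =="
  # 'weave --local status ipam' prints this peer's view of address allocation
  kubectl -n kube-system exec "${pod#pod/}" -c weave -- /home/weave/weave --local status ipam
done
```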
@alok87 please open a separate issue describing the symptoms you are seeing.
IPAM inconsistency is not necessarily the result of a single node. Some of the cases are outlined in #1962. If you are running 2.6, we have debug logging specific to IPAM to help find the sequence of IP allocations and gossip resulting in the inconsistency (a sketch of enabling it follows).
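A sketch of enabling that debug logging, assuming the stock weave-net DaemonSet, whose weave container passes extra router flags via the EXTRA_ARGS environment variable:

```sh
# Setting the env variable also triggers a rolling restart of the pods:
kubectl -n kube-system set env daemonset/weave-net -c weave EXTRA_ARGS=--log-level=debug
```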
What did you expect to happen?
No error messages in the log.
What happened?
In our Kubernetes cluster, the following error messages appear every 3 minutes in the log file. Connections between nodes seem fine, but we saw a duplicate IP once. After following the instructions to delete the DB file under /var/lib/docker and reboot, the duplicate-IP issue is gone; however, we still see the following error messages in the log. Will such errors cause any problems?
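A hypothetical way to check the frequency of the message on one node; the pod name is a placeholder:

```sh
kubectl -n kube-system logs <weave-net-pod> -c weave --tail=200 | grep -i "connection deleted"
```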
How to reproduce it?
Not exactly sure when this happens, but it seems to show up after we removed 2 nodes from the k8s cluster.
Anything else we need to know?
Our cluster runs on bare metal and is deployed with the standard k8s YAML file; let me know if you want to see the configuration YAML. The only change we made was adding the environment variable IPALLOC_RANGE=172.18.128.0/17 (a sketch of applying it follows).
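A hypothetical way to express that change against the stock DaemonSet; note that IPALLOC_RANGE must match what the cluster was first initialized with, so changing it on a live cluster is unsafe:

```sh
kubectl -n kube-system set env daemonset/weave-net -c weave IPALLOC_RANGE=172.18.128.0/17
```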
Versions:
$ weave version
2.5.1
$ docker version
18.06.1
$ uname -a
4.4.159-1.el7.elrepo.x86_64
$ kubectl version
1.12.1