Weave net randomly blocking connections #3829
@bcollard thanks for supplying the logs. Note the way Weave-NPC works is it sets up a rule to drop everything by default, then adds more rules to allow traffic according to network policies. The first set of 'blocked' reports is very simply explained:
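To make the model concrete, here is a minimal sketch of the default-drop / explicit-allow approach described above. The chain name, set name, interface and IP are illustrative placeholders, not the exact objects weave-npc programs:

```sh
# Default: anything destined for pods that is not explicitly accepted is dropped.
ipset create allowed-src hash:ip
iptables -N PODS-INGRESS
iptables -A FORWARD -o weave -j PODS-INGRESS
iptables -A FORWARD -o weave -j DROP          # fall-through = "blocked"

# For each network policy, an ACCEPT rule is added ahead of the drop,
# typically matching an ipset holding the selected pods' IPs.
ipset add allowed-src 10.244.1.5              # hypothetical pod IP
iptables -A PODS-INGRESS -m set --match-set allowed-src src -j ACCEPT
```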
This comes around 100ms before Weave-NPC learns about this pod and is therefore able to open up traffic for it:
100ms is a plausible delay on a machine with a lot going on. The same pod gets blocked again a couple of seconds later, so that must be for a different reason:
Looking at the destination address, it is for this pod:
Weave-NPC has at this point received nine
The namespace of the source pod looks like this:
As far as I can see, the rule ... Sorry, I need to stop now; I will continue the analysis later.
OK, let's look again. The block of interest is:
Here's the policy that seems to match the labels:
Weave-NPC creates a rule to match that policy:
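Such a per-policy rule has roughly the following shape. The chain name, set names and port below are illustrative (the real sets use hashed "weave-..." names derived from the policy's selectors):

```sh
# Accept traffic from pods in the policy's "from" selector to pods selected
# by the policy's podSelector, on the allowed port.
iptables -A WEAVE-NPC-INGRESS \
  -m set --match-set source-pods src \
  -m set --match-set dest-pods dst \
  -p tcp --dport 8080 \
  -j ACCEPT
```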
When the destination pod appears it is added to sets including the
The source pod is added to the
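In ipset terms, that bookkeeping amounts to membership updates along these lines (set names and IPs are placeholders):

```sh
# Destination pod's IP goes into the set(s) matched as "dst" by the policy
# rule; the source pod's IP goes into the set(s) matched as "src".
ipset add dest-pods 10.244.32.10     # hypothetical destination pod IP
ipset add source-pods 10.244.16.4    # hypothetical source pod IP

# The live sets on a node can be inspected with:
ipset list
```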
Everything so far is working as intended. TBC...
So I went to look at the iptables rules you supplied, and I notice there are two places where we bump the counter that causes the "blocked" log:
but only one of them actually drops the traffic. This changed in #3639 - prior to that we dropped in both places, but afterwards we only drop if an egress policy is added. And you only have ingress policies. So, @bcollard, is it possible this is a spurious log and the traffic is not actually dropped?
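A minimal sketch of the distinction, using a plain LOG target as a stand-in for weave-npc's actual logging mechanism, and hypothetical chain/set names:

```sh
# Log-only: the packet is counted and reported as "blocked", but nothing
# here drops it, so it can still be delivered.
iptables -A NPC-EGRESS -m set ! --match-set local-pods dst \
         -j LOG --log-prefix "blocked by egress default: "

# Log-and-drop: the extra DROP rule is only installed once an egress
# policy exists (the post-#3639 behaviour).
iptables -A NPC-EGRESS -m set ! --match-set local-pods dst -j DROP
```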
This traffic still shouldn't be getting logged by the egress rules because we added the source address to this set:
and we mark traffic originating in that set:
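Roughly speaking, the marking has this shape (set name and mark value here are illustrative, not the exact rules from this cluster):

```sh
# Packets whose source address is in the set get a firewall mark, and
# marked packets skip the egress logging/drop rules further down the chain.
iptables -A NPC-EGRESS -m set --match-set namespace-pods src \
         -j MARK --set-xmark 0x40000/0x40000
iptables -A NPC-EGRESS -m mark --mark 0x40000/0x40000 -j RETURN
```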
So I'm not sure what is happening there.
Most "blocked by" messages around this time are either shortly before a pod's IP address is known, or for the pair A different kind is:
The first time we hear of 10.244.32.3 from the API server is half a second later:
but because it is in phase "Succeeded", i.e. it has exited, Weave-NPC doesn't bother adding it to the rules. However, it is still apparently trying to talk one second later:
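If you want to check this yourself, the phase weave-npc acts on is the one reported by the API server; something along these lines (pod name and namespace are placeholders):

```sh
# Find the pod that owns the IP, then read its phase;
# "Succeeded" means it has already exited.
kubectl get pods --all-namespaces -o wide | grep 10.244.32.3
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.phase}{"\n"}'
```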
A similar sequence happens for IP 10.244.32.6 at 15:30:03.

So that's the first 20 minutes or so in your logs. None of it is "random", but one particular set of reports is a mystery.

@bcollard given the lag in notifications, I recommend you check that Weave Net, and your nodes in general, have enough resources. In the supplied manifest we specify quite a small CPU request.

It's also not great to have the two different IP ranges in use: only one of them is known to kube-proxy, and while I cannot find a direct connection between that and your troubles, it may be related. I suggest you do a rolling reboot of all nodes to clear out all interfaces and settings from the older range.
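Something along these lines may help check both points (exact output varies; the grep pattern is only a rough filter for the old range):

```sh
# CPU/memory requested vs allocatable on each node.
kubectl describe nodes | grep -A 8 'Allocated resources'

# Pods still carrying an address from the old 10.32.0.0/12 range
# (which covers 10.32.x.x through 10.47.x.x).
kubectl get pods --all-namespaces -o wide | grep -E ' 10\.(3[2-9]|4[0-7])\.'
```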
Thanks for your feedback, I'll have a look next week, as I'm taking a few days off.
What you expected to happen?
No connections randomly blocked, especially when there are no network policies (netpols), or when the netpols target different ports and pods. See the logs at the end of this issue.
Same issue as #3761.
What happened?
Looking at the logs of my weave-net pods, I can see multiple connections blocked by weave-npc. It looks random.
How to reproduce it?
Installed Weave CNI plugin with this method: https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-installation
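For reference, the installation from that page essentially boils down to applying the generated manifest (command reproduced from the docs at the time, so treat it as approximate):

```sh
# Apply the Weave Net addon manifest generated for the cluster's version.
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
```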
Anything else we need to know?
On-prem DC. K8s installed with kubeadm. Here is my kubeadm ClusterConfiguration: https://gist.github.com/bcollard/b55108e4355b4edab169a025e02723e8

I first installed Weave Net without specifying the pod subnet, so pods spawned with IP addresses within the 10.32.0.0/12 range. Then I redeployed the weave DaemonSet with IPALLOC_RANGE set to 10.244.0.0/16, which is the subnet configured for pods in kubeadm. That's why you'll see pods with these two kinds of IPs, and also a few extra IP routes and rules.
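Setting that variable on the DaemonSet can be done along these lines (a sketch, assuming the object and container names from the standard weave-net manifest, not necessarily the exact commands I ran):

```sh
# Set the pod allocation range on the 'weave' container, then restart the
# daemonset pods so they pick up the new environment.
kubectl -n kube-system set env daemonset/weave-net -c weave IPALLOC_RANGE=10.244.0.0/16
kubectl -n kube-system rollout restart daemonset/weave-net
```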
Versions:
Logs:
https://gist.github.com/bcollard/bda85abdcb2ee9c5779dc38512b494eb
$ journalctl -u docker.service --no-pager
https://gist.github.com/bcollard/0b95c22958cd1b8f71ab264eaf19e8ef
$ journalctl -u kubelet --no-pager
https://gist.github.com/bcollard/000ad7e54a696a7cb3d87c526bcab847
$ kubectl get events
-- executed in a namespace called 'r3' where connections between pods are sometimes blocked: https://gist.github.com/bcollard/bf50d813b7cf355638aa9f94873b5ea7
Network:
https://gist.github.com/bcollard/8ff28dcef4dbe721baa6db5efadd4117
Weave npc logs:
Filtered logs:
https://gist.github.com/bcollard/4a5da7f16a1903a0a5755fc9ecd47163
Full logs:
https://gist.githubusercontent.com/bcollard/531596b08ba27a5dc2a931caec1c5ede/raw/54851c90f5fc291f1b63758d23542fcba824072c/weave-npc%2520full%2520logs