Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

"Troubleshooting Blocked Connections" -> "connection from x to y blocked by Weave NPC" lacks reason #3839

Open
ensonic opened this issue Jul 29, 2020 · 7 comments

Comments

@ensonic
Copy link

ensonic commented Jul 29, 2020

What you expected to happen?

The weave-npc logs are quite busy and not super easy to act on for me. I'd like to see what things are intentional and what are just status. I gets frequent messages like the one below on the WARN level, which makes me think it is bad. The troubleshooting guide https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-troubleshooting-blocked-connections also stops where it gets interesting. It could at least say if e.g. under normal circumstances one should never see those messages.

Next steps would be to understand why the connections are blocked to help people figure out what to do if the connection should not be blocked.

Some additional steps for the docs could be e.g.:

$ weave ps | grep 92.168.9.5
c26bb46773f3 1e:09:2e:df:98:67 192.168.9.5/24
ff11a844cb3d 1e:09:2e:df:98:67 192.168.9.5/24
$ docker ps -f Id=c26bb46773f3
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
c26bb46773f3        70f311871ae1        "/coredns -conf /etc…"   4 weeks ago         Up 4 weeks                              k8s_coredns_coredns-6955765f44-bmlfj_kube-system_c99072ac-ae93-4f7b-bac1-e3b3a053e7ec_1
$ docker ps -f Id=ff11a844cb3d
CONTAINER ID        IMAGE                  COMMAND             CREATED             STATUS              PORTS               NAMES
ff11a844cb3d        k8s.gcr.io/pause:3.1   "/pause"            4 weeks ago         Up 4 weeks                              k8s_POD_coredns-6955765f44-bmlfj_kube-system_c99072ac-ae93-4f7b-bac1-e3b3a053e7ec_1

(this leads my to the question why core-dns gets blocked?)

What happened?

Seeing lots of those in the logs:

WARN: 2020/07/29 10:33:58.547437 UDP connection from 192.168.9.19:32855 to 192.168.9.5:53 blocked by Weave NPC.
WARN: 2020/07/29 10:33:58.547446 UDP connection from 192.168.9.19:52196 to 192.168.9.5:53 blocked by Weave NPC.
WARN: 2020/07/29 10:33:52.153153 UDP connection from 192.168.9.18:50087 to 192.168.9.4:53 blocked by Weave NPC.
@Bregor
Copy link
Contributor

Bregor commented Aug 5, 2020

Duplicate of #3764, #3761, #3829 and so on? ;)

@bboreham
Copy link
Contributor

The way Weave-NPC works is it sets up a rule to drop everything by default, then adds more rules to allow traffic according to network policies. So we can see which rules are allowing things, but we cannot see anything specific about the absence of a rule which causes something to be blocked.

If you look at the logs do those blocking occasions happen right after the pod IP is notified, or a long time later?

@bboreham
Copy link
Contributor

@Bregor details matter absolutely, and without more detail we cannot say whether this issue is the same as any other.

@Bregor
Copy link
Contributor

Bregor commented Aug 25, 2020

Sure. Didn't want to offend anyone :)

@bboreham
Copy link
Contributor

bboreham commented Sep 2, 2020

Thinking about how hard it is to diagnose these issues (look at #3829 for instance), some tools could make it easier:

  • print out which network policies apply to pod A.
  • print out which network policies apply to a connection from pod A to pod B
  • given pods A and B and NetworkPolicy P, state whether P applies and if not why not

All of the above would be generic: the rules should be the same for any Kubernetes NetworkPolicy implementation.
Maybe something like this already exists? Please comment below if you know of one.

Specifically for Weave Net, then, we could say which iptables rules and ipsets are supposed to implement a given policy. Maybe even check that the ipset contains the correct address, and show how hit-counts on rules are changing to indicate whether a rule is operating.

@ensonic
Copy link
Author

ensonic commented Sep 9, 2020

I don't have any network policies:

kubectl get networkpolicies --all-namespaces
No resources found

We have a bunch of integration test machines, where this is the biggest source for flakiness. There we install helm-charts (to distinct namespaces), run stuff and tear things down. Quite often we experience either delays or conenction failure for grpc services. A connection is retried for up to 5 min. I would not expect the weave-npc to spend more than 5 min setting up iptable rules, right?

The warnings seems to appear whenever a test starts to run. I'll try to find a way to correlate events. If that is not easy, we'll probably disable weave-npc. Given the number of users that seem to hit this, it would be good to invest more time here to reproduce this from your side, otherwise people just turn off weave-npc and you'll never hear from them again.

Maybe consider to disable it by default to only have people running into this, if they actually use it.

@ensonic
Copy link
Author

ensonic commented Sep 9, 2020

Here is my repro:

( kubectl get events -w -o custom-columns=FirstSeen:.firstTimestamp,From:.source.component,Message:.message | grep "chartassignment.*chart$" & kubectl logs --follow -n kube-system -l name=weave-net  -c weave-npc | grep blocked )

The grep for chartassignment is our controller that creates a namespace and installs the yaml (https://github.com/googlecloudrobotics/core/tree/master/src/go/pkg/controller/chartassignment). With that we get the output below, where we can see that the warnings are printed right after a new 'app' got installed. From the output I can't tell if weave-npc is still busy applying policies though.

2020-09-09T13:43:00Z   chartassignment-controller   update chart
WARN: 2020/09/09 13:43:06.963438 UDP connection from 192.168.9.17:43912 to 192.168.9.5:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:06.963487 UDP connection from 192.168.9.17:46236 to 192.168.9.2:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:06.963504 TCP connection from 192.168.9.17:40387 to 10.103.113.161:7999 blocked by Weave NPC.
WARN: 2020/09/09 13:43:09.971415 UDP connection from 192.168.9.17:32898 to 192.168.9.5:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:09.971454 UDP connection from 192.168.9.17:59551 to 192.168.9.2:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:09.971472 TCP connection from 192.168.9.17:36835 to 10.103.113.161:7999 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475353 UDP connection from 192.168.9.19:39679 to 192.168.9.2:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475474 UDP connection from 192.168.9.19:44871 to 192.168.9.2:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475516 TCP connection from 192.168.9.19:40728 to 10.98.129.70:10042 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475572 TCP connection from 192.168.9.19:40732 to 10.98.129.70:10042 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475613 TCP connection from 192.168.9.19:40734 to 10.98.129.70:10042 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475653 TCP connection from 192.168.9.19:40738 to 10.98.129.70:10042 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475680 TCP connection from 192.168.9.19:40740 to 10.98.129.70:10042 blocked by Weave NPC.
WARN: 2020/09/09 13:43:27.475707 TCP connection from 192.168.9.19:40742 to 10.98.129.70:10042 blocked by Weave NPC.
WARN: 2020/09/09 13:43:31.923379 UDP connection from 192.168.9.22:37488 to 192.168.9.5:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:31.923490 UDP connection from 192.168.9.22:60362 to 192.168.9.5:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:31.923529 TCP connection from 192.168.9.22:42114 to 10.109.134.225:9001 blocked by Weave NPC.
WARN: 2020/09/09 13:43:42.103424 UDP connection from 192.168.9.24:41034 to 192.168.9.2:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:42.103484 UDP connection from 192.168.9.24:47102 to 192.168.9.2:53 blocked by Weave NPC.
WARN: 2020/09/09 13:43:42.103502 TCP connection from 192.168.9.24:32834 to 10.109.134.225:54320 blocked by Weave NPC.

A few observations:

  • Interesting is e.g. '192.168.9.5:53' thats core-dns which I am not sure anyone would block.
  • This block of warnings can easily span 150 lines and take 20-30 sec

Isn't there a way for weave-npc to defer actual pod-startup until you are done applying the (empty) policy?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants