[Flaky] Services should create endpoints for unready pods - Netfilter Bug #12
I can reproduce the problem. It seems that when using curl it sometimes takes too long to answer and times out ... surprisingly, if I use ping or nslookup it replies fast.
It is only with curl; nslookup works perfectly and is fast.
cc: @thockin @BenTheElder hmm, is it only this specific container that is failing DNS queries?? tcpdump inside the container sees the resolved DNS request, but curl does not use it.
hmm
heh, so my theory of being conservative and supporting iptables-legacy was a disaster; @danwinship asked me about it just yesterday ... it turns out that if I remove the iptables rule, DNS starts working perfectly.
There is somehow a bug with nfqueue that causes this behavior ...
/close as not planned, we'll use only nftables
/reopen
It seems it is a netfilter problem, unrelated to iptables vs. nftables.
I hate this flake, it is pretty weird
so that suggests that nfqueue is somehow causing a packet to get moved to the wrong interface?
Do any other tests in this job rely on service DNS? Most of them use IPs, I think...
It runs all the sig-network DNS tests. The martian thing seems worth exploring; both pods are on the same host ... maybe it is how GCE implements pod IP aliases? I need to check that.
Can we turn that error into comprehensible English? For context: the kernel log line is https://github.com/torvalds/linux/blob/ee5b455b0adae9ecafb38b174c648c48f2a3c1a5/net/ipv4/route.c#L1775. IP 10.64.3.4 is the CoreDNS pod: device "eth0" inside the CoreDNS pod has IP 10.64.3.4, so that must be where the error is triggering? But why would it think that is martian? eth0 should be set up as a /24 (or similar) subnet.
Captured a pcap of one failure; I would appreciate it if someone can take a look at it ... the symptom is that DNS queries are too slow, so when the query goes through the whole search-domains dance it is practically impossible for it to resolve within 2 seconds (the curl timeout in the test).
Not super helpful, but a few notes:
- It looks like it is issuing a lot of duplicate queries very quickly, or else the pcap is "seeing" things twice. Maybe multiple threads?
- It does get a response @ 436, which is un-DNAT'ed @ 437.
- It looks like there are at least 2 DNS replicas in play: 10.64.1.10 and 10.64.3.4.
It sees packets twice; some of them are the same packet before and after the DNAT (svcIP/podIP). Interesting, a similar issue, also with CoreDNS pods involved: weaveworks/weave#3327. The resolv.conf here is like this:
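(A sketch of the file follows; the namespace is hypothetical and the layout assumes the standard kube-dns defaults, with the cluster DNS Service IP 10.96.0.10 that is referenced later in this thread.)

```
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
```

With ndots:5, an unqualified name is tried against every search domain before the literal name, so a single curl to a Service name fans out into several A/AAAA query pairs; that is the search-domains dance mentioned above.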
The repetitive queries are from the test; it tries to connect to the Service and the DNS resolution continuously times out, as you can see in the pcap. The one that succeeded was me connecting to the pod and performing an nslookup ... increasing the curl timeout would reduce flakiness, but taking more than 2 seconds to resolve a service name worries me.
hmm, this is annoying and suspicious; checking two occurrences, CoreDNS pods are involved in these martian errors whenever this test fails.
The A and AAAA requests for the same record share the same source port; a possible reason is that this interacts weirdly with the nftables logic: opnsense/core#4708
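To make that traffic pattern concrete, here is a minimal sketch (not the test's code) of the "two packets, same tuple" shape: an A and an AAAA query for the same name sent from a single UDP socket, so both packets share source IP/port, destination IP/port and protocol. It assumes dnspython is available and reuses the cluster DNS virtual IP 10.96.0.10 mentioned later in the thread; the query name is illustrative.

```python
# Sketch: emit an A and an AAAA query from one socket so both packets
# share the same 5-tuple, the pattern discussed in this thread.
import socket

import dns.message  # pip install dnspython

DNS_VIP = ("10.96.0.10", 53)                    # cluster DNS Service IP (assumed)
NAME = "kubernetes.default.svc.cluster.local."  # illustrative name

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", 0))      # one ephemeral source port, reused for both queries
sock.settimeout(2.0)    # roughly the 2s budget the test's curl allows

for rdtype in ("A", "AAAA"):
    query = dns.message.make_query(NAME, rdtype)
    sock.sendto(query.to_wire(), DNS_VIP)

# Both replies (if any) come back to the same socket; a lost or misrouted
# reply here is what shows up as multi-second resolution in the test.
for _ in range(2):
    try:
        wire, _addr = sock.recvfrom(4096)
        print(dns.message.from_wire(wire))
    except socket.timeout:
        print("timed out waiting for a reply")
```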
The DNAT'ed ones are easy because their IPs change. But there are a lot of what look like identical packets. I wonder if the capture is seeing them on both sides of the veth?
There are also packets that are lost or duplicated. Checking the captured headers shows that the packet appears both to go out of and to come back in on the eth0 device.
More related literature: https://patchwork.ozlabs.org/project/netfilter-devel/patch/CAMxdDZBwqRxZjywAfHUm-bbe-0veLPqPwAfFpw90cb0As80Dmg@mail.gmail.com/ and https://redmine.openinfosecfoundation.org/issues/2806
I may try to repro with a kselftest.
How did you capture the data? The second image suggests two different contexts - left is "to us" (i.e. inside the DNS pod) and right is "from us" (i.e. inside the client pod). Maybe the MAC changes because of the root netns routing? I've never looked at that behavior in detail.
I have an internal cluster created with cluster-up.sh, and if you run the e2e test several times it eventually triggers the behavior. I modified the test timeouts to hold for hours so the environment is not wiped out ... ping me offline for credentials and more detailed instructions if you want to experiment and reproduce.
No more flakes. Most probably a race with the same tuple on different packets in the nfqueue/netfilter logic ... smells like a kernel bug. https://testgrid.k8s.io/sig-network-gce#network-policies,%20google-gce&width=20
Reopening; I am opening a bug against netfilter, but the bugzilla is down, so copying it here for reference.

Title: nfqueue random output behavior with packets with same tuple

Description: I was puzzled by this problem for a long time, first reported in ... It seems I was able to narrow down the scenario; I will try to ...

Setup:
- 2 nodes: N1 and N2; N1 contains two containers: ...
- One rule to send the packets to nfqueue in postrouting, but it ... (see the sketch after this report)
- The containerized DNS servers are abstracted via DNAT with the virtual IP 10.96.0.10.

C1 sends a DNS request to the virtual IP 10.96.0.10, because of the ...

When tracing the packets I could observe two different reasons for ... and ...

If I enable martian logging (net.ipv4.conf.all.log_martians=1) it also ...

An interesting detail is that it only seems to happen with DNS (2 packets ...). Since the behavior is not deterministic but reproducible, it makes me ... I would like some help on two fronts: ...
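A minimal sketch of the kind of postrouting rule the report describes, in nftables syntax. The table name, the queue number, and the restriction to UDP port 53 are illustrative assumptions; the exact rule used by the reproducer is not shown in this thread.

```
table inet repro {
        chain postrouting {
                # filter-type chain hooked at postrouting: matching packets are
                # handed to a userspace program listening on nfqueue number 100
                type filter hook postrouting priority 0; policy accept;
                udp dport 53 queue num 100
        }
}
```

Note that with a plain `queue num` statement, packets are dropped if no userspace program is attached to the queue; the bypass flag would let them continue through the ruleset instead.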
/close. This was fixed in the kernel: torvalds/linux@8af79d3; now it needs to wait for distros to backport it. cc: @adrianmoisey, the workaround is to not filter DNS or to just have a single DNS replica: kubernetes-sigs/kind#3713 🤷
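For the single-replica workaround, a hedged example, assuming the cluster runs the default CoreDNS Deployment in kube-system:

```
kubectl -n kube-system scale deployment coredns --replicas=1
```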
It flakes on the network policy jobs https://testgrid.k8s.io/sig-network-gce#network-policies,%20google-gce
Analyzing one occurrence https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-network-policies/1782274345247182848
I can see the DNS request in the trace but never an attempt to connect to the Service, which may indicate that something is interfering with the DNS request or response.