Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround dns kernel bug #95

Merged
merged 4 commits into from
Oct 7, 2024
Merged

Conversation

aojea
Copy link
Contributor

@aojea aojea commented Oct 7, 2024

Add a workaround to the netfilter conntrack bug https://bugzilla.netfilter.org/show_bug.cgi?id=1766 fixed by torvalds/linux@8af79d3.

Since the race problem is caused by having two packets DNATed with the same tuple at the same time by different CPUs, it impact specially DNS resolvers that sends A and AAAA request in parallel (it seems is default glibc behavior), and this is very visible to users that notice DNS latency.

Instead of processing all packets on the POSROUTING hook, special case DNS and process them in PREROUTING after DNAT happens.

Services implementations that don't rely on conntrack can disable this behavior by setting the flag to false.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 7, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 7, 2024
@aojea aojea force-pushed the workaround_dns branch 4 times, most recently from fe7dc9d to fafafae Compare October 7, 2024 08:25
@adrianmoisey
Copy link
Contributor

I tested this locally and it works, thanks @aojea!

@aojea
Copy link
Contributor Author

aojea commented Oct 7, 2024

see if @danwinship wants to take a look, otherwise I will merge end of the day, so we can have it in kind

Copy link

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting a couple typos in comments. Thanks again for all your work on this!

cmd/main.go Outdated Show resolved Hide resolved
pkg/networkpolicy/controller.go Outdated Show resolved Hide resolved
pkg/networkpolicy/controller.go Show resolved Hide resolved
pkg/networkpolicy/controller.go Show resolved Hide resolved
pkg/networkpolicy/controller.go Outdated Show resolved Hide resolved
pkg/networkpolicy/controller.go Outdated Show resolved Hide resolved
Copy link

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where's the regression test? 🙂

pkg/networkpolicy/controller.go Show resolved Hide resolved
@aojea
Copy link
Contributor Author

aojea commented Oct 7, 2024

where's the regression test? 🙂

do you mean in kubernetes/kubernetes? this showed up in this e2e #12
but it is a bug in the kernel, and its a race, there is a kselftest, but I don't see what type of tests we want to add here

aojea added 3 commits October 7, 2024 19:11
This was fixed in the kernel in 6.12 (commit 8af79d3edb5f)

However, users need a workaround to avoid hitting this bugs in existing
kernels.
Add a feature flag enabled by default to implement the workaround, that
consists in processing the DNS packets on the prerouting hook, after
dnat happens, so we can have the resolved IPs of the DNS server,
and avoid to process them in the postrouting hook.
@aojea aojea merged commit df33694 into kubernetes-sigs:main Oct 7, 2024
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants