Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ensure nf_conntrack module loaded for kube-proxy. #743

Merged
merged 2 commits into from
Nov 18, 2024

Conversation

aznashwan
Copy link
Contributor

This patch ensures that the nf_conntrack kernel module is loaded before the kube-proxy service is started so it can read some necessary conntrack module-related params from procfs.

Previously, although the kube-proxy service always crashed if the module wasn't loaded, this wasn't that common of an occurrence in practice as there are quite a few ways nf_conntrack gets loaded transparently:

  • Cilium automatically loads iptable_nat after a small startup delay, whose dependency tree includes nf_conntrack
  • starting firewalld/ufw/most other firewall services
  • setting iptables/nftables rules which imply session tracking

By explicitly loading nf_conntrack from the kube-proxy service wrapper directly, it should ensure the procfs values kube-proxy reads are always present on startup.

@aznashwan aznashwan requested a review from a team as a code owner October 16, 2024 15:44
Copy link
Contributor

@bschimke95 bschimke95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM but there seems an issue with the patch

+ # Note that the 'iptable_nat' module which is automatically loaded
+ # by Cilium would load 'nf_conntrack' as a dependency anyway,
+ # but we add a separate explicit entry for it for safety.
+ conntrack-module-load:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes the build to fail:

yaml.parser.ParserError: while parsing a block mapping
  in "/home/runner/work/k8s-snap/k8s-snap/snap/snapcraft.yaml", line 4[33](https://github.com/canonical/k8s-snap/actions/runs/11369169743/job/31625959021?pr=743#step:8:34), column 5
expected <block end>, but found '-'
  in "/home/runner/work/k8s-snap/k8s-snap/snap/snapcraft.yaml", line 4[34](https://github.com/canonical/k8s-snap/actions/runs/11369169743/job/31625959021?pr=743#step:8:35), column 5

@bschimke95
Copy link
Contributor

looks like this breaks the integration tests.

@aznashwan
Copy link
Contributor Author

looks like this breaks the integration tests.

Yep, currently working through the failure on 20.04 in strict mode and will get to the main clustering tests afterwards!

This patch ensures that the `nf_conntrack` kernel module is loaded
before the `kube-proxy` service is started so it can read some
necessary conntrack module-related params from procfs.

Previously, although the `kube-proxy` service always crashed if the module
wasn't loaded, this wasn't that common of an occurrence in practice as
there are quite a few ways `nf_conntrack` gets loaded transparently:
* Cilium [automatically loads `iptable_nat`](https://github.com/cilium/cilium/blob/63cd391f93b4e2c865268241d384504348672042/pkg/datapath/iptables/iptables.go#L367-L368)
after a small startup delay, whose dependency tree includes `nf_conntrack`
* starting firewalld/ufw/most other firewall services
* setting iptables/nftables rules which imply session tracking

By explicitly loading `nf_conntrack` from the `kube-proxy` service
wrapper directly, it should ensure the procfs values kube-proxy reads
are always present on startup.

Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
@aznashwan
Copy link
Contributor Author

@bschimke95 the mixed-version cluster test seems to have behaved nicely this time, and the only Go failure is an unrelated spellcheck, so PTAL!

@bschimke95
Copy link
Contributor

Go unit test failure is unrelated and fixed here: #809

Copy link
Contributor

@bschimke95 bschimke95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bschimke95 bschimke95 merged commit 99e7a5e into canonical:main Nov 18, 2024
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants