
spamming error "exit status 1: iptables v1.8.4 (nf_tables): table `filter' is incompatible, use 'nft' tool." #359

Closed
wangwill opened this issue Jul 9, 2023 · 8 comments

Comments

@wangwill

wangwill commented Jul 9, 2023

k3s 1.25.10+k3s1 & 1.25.11+k3s1

All nodes are using iptables v1.8.7 (nf_tables).

This is a recent issue. Previous nodes haven't encountered this error and they are functioning well.

After adding another node today, the logging system is spamming the following error:

{"caller":"mesh.go:262","component":"kilo","error":"failed to reconcile rules: failed to check if rule exists: failed to populate chains for table "filter": running [/sbin/iptables -t filter -S --wait]: exit status 1: iptables v1.8.4 (nf_tables): table `filter' is incompatible, use 'nft' tool.\n\n","level":"error","ts":"2023-07-09"}


The network topology is full mesh.
squat/kilo:0.6.0
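
For reference, this is roughly how the iptables backend can be verified on each host (the update-alternatives group name assumes a Debian/Ubuntu setup):

iptables --version                       # prints the backend in parentheses, e.g. "(nf_tables)" or "(legacy)"
update-alternatives --display iptables   # shows whether iptables points at iptables-nft or iptables-legacy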

@squat
Owner

squat commented Jul 9, 2023

Hi @wangwill, interesting! It's very curious that it only happens on the new node. I'd be interested to know what is different about the new node, e.g. are the kernel and OS versions the same?

To be clear, are the logs appearing on the new node or on the old nodes now that the new node was added?

I see some references to this issue across GitHub, e.g. coreos/go-iptables#73 and containernetworking/plugins#461.

It's odd that this happens only to Kilo. Kilo interacts with iptables using the same mechanism that Kube-Proxy does (iptables-wrapper) so I would expect that the same logs should appear on the Kube-Proxy containers. Can you check if Kube-Proxy is also complaining about the same issue?

This issue seems to stem from the table in question being accessed by using the nft command before iptables. Do you maybe know if this is the case?

Finally, is this issue repeatable? I.e., does it happen for all new nodes, and does it persist after restarts?
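
For what it's worth, a rough way to check both of those things (the kube-proxy pod label is a guess, and on k3s the proxy may run inside the k3s process itself, in which case journalctl -u k3s is the place to look instead):

nft list table ip filter        # inspect the filter table as nftables sees it
iptables-nft -t filter -S       # reproduce the "incompatible" error directly on the host
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=200 | grep -i incompatible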

@wangwill
Author

wangwill commented Jul 9, 2023

Hi @squat,

  1. All nodes are running Ubuntu 22.04.2 LTS, kernel version 5.15.0-1038.
  2. Yes, the error logs are appearing on all Kilo pods, i.e. on the old nodes as well.
  3. k3s server logs:

E0710 04:53:53.588766 1343671 network_policy_controller.go:277] Aborting sync. Failed to cleanup stale iptables rules: unable to list chains: running [/usr/sbin/iptables -t filter -S --wait]: exit status 1: iptables v1.8.7 (nf_tables): table `filter' is incompatible, use 'nft' tool.

  4. I tried the iptables -L and iptables-nft -L commands.
     They show the same error as above once the node has joined the cluster.
     After deleting the node and rebooting, iptables -L and iptables-nft work again. (The other nodes in the cluster are still reporting the error.)
     After re-joining the node to the cluster, the same issue reappeared.
  5. Network traffic across the cluster is partly affected: existing nodes and pods communicate correctly, while newly joined nodes run into readiness timeouts, failures to provision PVCs (Longhorn), and similar errors.

@squat
Owner

squat commented Jul 9, 2023

@wangwill thanks for the details. So indeed, it's not just a Kilo problem; it seems everyone using iptables-nft is affected, including the network policy controller and presumably also kube-proxy.

Was the cluster recently upgraded?

I suspect the issue has been around for a while but only became obvious when the new node was added. In other words, the network policy controller may have been failing to list rules ever since some event on the cluster affected nftables, and we only noticed it now because Kilo failed when the new node joined: when a node is added, the other nodes need to update their iptables rules.

Can you look back into journald to check when the error was first logged by the k3s server?
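
Something along these lines should surface the first occurrence (the unit name is k3s on servers and k3s-agent on agents; adjust accordingly):

journalctl -u k3s --no-pager | grep -m1 "is incompatible, use 'nft' tool"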

@wangwill
Author

wangwill commented Jul 9, 2023

@squat You are correct. This issue has been ongoing for a while. It is a new testing cluster and it hasn't been upgraded since the initial install.

The error log can be traced back to 27 June 2023 after I applied:
kubectl apply -f https://raw.githubusercontent.com/squat/kilo/main/manifests/kube-router.yaml

But during this period, the 5-node cluster was running without any errors until today, when the major issue occurred.


This is the first time the error message shows up in the journal log:

Jun 27 15:58:32 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.djDoNN.mount: Deactivated successfully.
Jun 27 15:58:37 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.ajLgoa.mount: Deactivated successfully.
Jun 27 15:58:38 node-a systemd[1]: run-containerd-runc-k8s.io-d2d94380ec43c3c1124370c76ef1ecb4e5758abe1fa57da82197c322a0bc1c3b-runc.MlghJk.mount: Deactivated successfully.
Jun 27 15:59:10 node-a k3s[2061]: E0627 15:59:10.412783    2061 network_policy_controller.go:292] Failed to cleanup stale ipsets: failed to delete ipset KUBE-DST-RPURVQE4ODVUOI6S due to ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
Jun 27 15:59:20 node-a k3s[2061]: W0627 15:59:20.564830    2061 machine.go:65] Cannot read vendor id correctly, set empty.
Jun 27 15:59:26 node-a systemd[1]: run-containerd-runc-k8s.io-abcc4a1abfebc12f51240fd2f857df76322c8d11d2aa2955a97f2d15aa3d9f21-runc.iFeJPp.mount: Deactivated successfully.
Jun 27 15:59:42 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.HhplFH.mount: Deactivated successfully.
Jun 27 15:59:42 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.knMnBf.mount: Deactivated successfully.
Jun 27 15:59:55 node-a k3s[2061]: I0627 15:59:55.408386    2061 topology_manager.go:205] "Topology Admit Handler"
Jun 27 15:59:55 node-a k3s[2061]: E0627 15:59:55.408609    2061 cpu_manager.go:394] "RemoveStaleState: removing container" podUID="bad077f1-13ab-4425-a277-69f8c8116f23" containerName="coredns"
Jun 27 15:59:55 node-a k3s[2061]: I0627 15:59:55.408647    2061 memory_manager.go:345] "RemoveStaleState removing state" podUID="bad077f1-13ab-4425-a277-69f8c8116f23" containerName="coredns"
Jun 27 15:59:55 node-a systemd[1]: Created slice libcontainer container kubepods-besteffort-pod811fccdc_9209_4bf7_b8d3_f76ff6a8b090.slice.
Jun 27 15:59:55 node-a k3s[2061]: I0627 15:59:55.570587    2061 reconciler.go:357] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/811fccdc-9209-4bf7-b8d3-f76ff6a8b090-xtables-lock\") pod \"kube-router-r7dnc\" (UID: \"811fccdc-9209-4bf7-b8d>
Jun 27 15:59:55 node-a k3s[2061]: I0627 15:59:55.570640    2061 reconciler.go:357] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-hkgmd\" (UniqueName: \"kubernetes.io/projected/811fccdc-9209-4bf7-b8d3-f76ff6a8b090-kube-api-access-hkgmd\") pod \"kube-router-r7dnc\" (UID: \"811f>
Jun 27 15:59:55 node-a k3s[2061]: I0627 15:59:55.570670    2061 reconciler.go:357] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/811fccdc-9209-4bf7-b8d3-f76ff6a8b090-lib-modules\") pod \"kube-router-r7dnc\" (UID: \"811fccdc-9209-4bf7-b8d3->
Jun 27 15:59:56 node-a systemd[1]: Started libcontainer container f5a5138c29c9ec7d1b67339d6ba68aa92df83d141c20c7241142fb8e3719215d.
Jun 27 15:59:56 node-a systemd[1]: run-containerd-runc-k8s.io-f5a5138c29c9ec7d1b67339d6ba68aa92df83d141c20c7241142fb8e3719215d-runc.CjFHMl.mount: Deactivated successfully.
Jun 27 15:59:57 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.hJNNdF.mount: Deactivated successfully.
Jun 27 16:00:01 node-a systemd[1]: var-lib-rancher-k3s-agent-containerd-tmpmounts-containerd\x2dmount2592918531.mount: Deactivated successfully.
Jun 27 16:00:04 node-a systemd[1]: Started libcontainer container fcfb2763a590122421f91c3494582e8d34241efc7af9d4ed82178c9434b365ee.
Jun 27 16:00:07 node-a k3s[2061]: E0627 16:00:07.726665    2061 network_policy_controller.go:277] Aborting sync. Failed to cleanup stale iptables rules: unable to list chains: running [/usr/sbin/iptables -t filter -S --wait]: exit status 1: iptables v1.8.7 (nf_tables): table `filter' is incompatible, use 'nft' tool.
Jun 27 16:00:17 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.pmPKEg.mount: Deactivated successfully.
Jun 27 16:00:20 node-a systemd[1]: run-containerd-runc-k8s.io-b6d22729787bcb4cfac75dfb839f5188ac6128d88beb60aa5acfaf7fbec665ca-runc.pojeIG.mount: Deactivated successfully.
Jun 27 16:00:32 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.nOmMIK.mount: Deactivated successfully.
Jun 27 16:00:42 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.boKLaJ.mount: Deactivated successfully.
Jun 27 16:00:45 node-a k3s[2061]: E0627 16:00:45.773842    2061 network_policy_controller.go:277] Aborting sync. Failed to cleanup stale iptables rules: unable to list chains: running [/usr/sbin/iptables -t filter -S --wait]: exit status 1: iptables v1.8.7 (nf_tables): table `filter' is incompatible, use 'nft' tool.
Jun 27 16:00:47 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.chgelL.mount: Deactivated successfully.
Jun 27 16:00:52 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.bnabLK.mount: Deactivated successfully.
Jun 27 16:01:07 node-a k3s[2061]: E0627 16:01:07.900918    2061 network_policy_controller.go:277] Aborting sync. Failed to cleanup stale iptables rules: unable to list chains: running [/usr/sbin/iptables -t filter -S --wait]: exit status 1: iptables v1.8.7 (nf_tables): table `filter' is incompatible, use 'nft' tool.
Jun 27 16:01:17 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.lhonMn.mount: Deactivated successfully.
Jun 27 16:01:27 node-a systemd[1]: run-containerd-runc-k8s.io-29813380f6485c48bdc001419f3b9004e6564b9955034975d65040e73c282875-runc.CmnCMH.mount: Deactivated successfully.

@squat
Owner

squat commented Jul 9, 2023

Nice find. It sounds like we might be getting somewhere. Unfortunately the Kilo manifest for kube-router does not pin the container image to a particular version. Can you check what version you are running? Maybe it's in the logs.

There are several references to incompatibility issues that arise when the k8s/host version of iptables is greater than kube-router's (xref: cloudnativelabs/kube-router#1370); I wonder if you're running into something related.
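
For example, something like this should print the image that was actually deployed (assuming the manifest created a DaemonSet named kube-router in kube-system; the exact name may differ):

kubectl -n kube-system get daemonset kube-router -o jsonpath='{.spec.template.spec.containers[*].image}'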

@squat
Owner

squat commented Jul 9, 2023

If you remove kube-router, do the issues go away (after a reboot)?
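
One way to do that, assuming it was installed from the manifest mentioned above (a reboot afterwards should also clear any leftover rules):

kubectl delete -f https://raw.githubusercontent.com/squat/kilo/main/manifests/kube-router.yaml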

@wangwill
Author

Removing kube-router didn't fix the issue.

https://docs.k3s.io/advanced#old-iptables-versions

I updated the k3s server with the --prefer-bundled-bin flag so that it uses its bundled iptables binaries rather than the OS ones.
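
For anyone else hitting this, this is roughly the change (the config-file key mirrors the CLI flag; paths and unit names assume a default k3s install):

# add to /etc/rancher/k3s/config.yaml on every node:
#   prefer-bundled-bin: true
systemctl restart k3s          # on server nodes
systemctl restart k3s-agent    # on agent nodes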

@squat
Owner

squat commented Jul 10, 2023

❤️
