High CPU load due to XDP program in iptables #8856
Comments
Just to make sure it's not specific to a combination of HEPs/GNPs/NetworkSets, I created a cluster with
Worker env:
|
Could you check your syslog for any
This, apparently, only affected my on-premise arch. Specs
Syslog
|
@spacegaucho similar stuff here:
Could you check your calico-node logs to see if there's anything like the logs below?
|
@dzacball looks like we are onto something here. No, there's nothing referencing the events you mentioned:
k stern -n kube-system calico-node-* --since=48h --no-follow | grep 'XDP actions did not succeed'
+ calico-node-zlbwb › calico-node
+ calico-node-gm87s › calico-node
+ calico-node-4fp5p › calico-node
+ calico-node-vkh8q › calico-node
+ calico-node-pphxf › calico-node
+ calico-node-cdp9k › calico-node
+ calico-node-8smsj › calico-node
+ calico-node-nfww9 › calico-node
+ calico-node-lptlx › calico-node
- calico-node-cdp9k › calico-node
- calico-node-4fp5p › calico-node
- calico-node-vkh8q › calico-node
- calico-node-nfww9 › calico-node
- calico-node-zlbwb › calico-node
- calico-node-8smsj › calico-node
- calico-node-lptlx › calico-node
- calico-node-gm87s › calico-node
- calico-node-pphxf › calico-node
But the issue persists. Had to roll back to 3.25, and that "fixed" the issue. I'm browsing around for any additional information. |
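For readers without stern, the same check can be sketched with plain kubectl and grep. This is only an illustration: the helper name `count_xdp_failures` is hypothetical, and the usage line assumes the calico-node pods run in `kube-system` with the `k8s-app=calico-node` label, which may differ in your install.

```shell
# Count log lines containing the XDP failure message quoted in the thread.
# Reads log text on stdin and prints the number of matching lines.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# helper usable in scripts that run with "set -e".
count_xdp_failures() {
  grep -c 'XDP actions did not succeed' || true
}

# Usage against a live cluster (namespace and label are assumptions):
#   kubectl logs -n kube-system -l k8s-app=calico-node -c calico-node \
#     --since=48h --tail=-1 | count_xdp_failures
```

A result of 0 matches what was reported above: the message is absent from the logs even though the CPU load is high.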
@spacegaucho can you possibly check if the issue is gone with upgrading |
Sure! @mazdakn how should I go about that? My
bpftool --version
bpftool v5.3.0
Should I test a v3.29.0-0.dev image? Or would I need to update the binary manually inside the pods? Thanks. |
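A small sketch for comparing the bundled bpftool against a newer one. The helper names are hypothetical, and the 7.4 threshold is an assumption taken from this thread, where v7.4 was reported to work and v5.3.0 is the version in the affected image; the exact minimum version is not stated anywhere here. It relies on `sort -V` (GNU coreutils or BusyBox).

```shell
# Extract the version number from `bpftool --version` output read on stdin.
bpftool_version() {
  sed -n 's/^bpftool v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 is at least version $2.
# sort -V orders version strings, so the smaller one comes out first.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage inside a calico-node container (7.4 is an assumed threshold):
#   v=$(bpftool --version | bpftool_version)
#   version_at_least "$v" 7.4 || echo "bpftool $v predates the version reported to work"
```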
@spacegaucho Thanks for being OK to test it. I'll provide you an image with the updated |
I got a similar issue here. Referring to the kernel versions listed here: https://ubuntu.com/security/livepatch/docs/livepatch/reference/kernels So I'm not sure whether this issue is related to a specific kernel version or is actually related to the HWE kernel |
xref #8833 |
@spacegaucho here is an image with the updated bpftool (v7.4). Please give it a try, and let me know if it works for you.
This is based on master branch, but should be OK to test it in a v3.28 cluster. |
@dzacball @spacegaucho can you also disable XDP mode in iptables by setting Ref: https://docs.tigera.io/calico/latest/reference/resources/felixconfig |
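The setting name did not survive in the comment above. As a sketch only, assuming it refers to the `xdpEnabled` field of the FelixConfiguration resource linked there, and that the cluster uses the `default` FelixConfiguration, the change could look like this (the function name is hypothetical):

```shell
# Assumption: the elided setting is FelixConfiguration's xdpEnabled field.
# Patches the "default" FelixConfiguration to set xdpEnabled to false.
disable_felix_xdp() {
  kubectl patch felixconfiguration default --type merge \
    --patch '{"spec":{"xdpEnabled":false}}'
}

# Usage (needs kubectl access to the cluster):
#   disable_felix_xdp
```

This works around the problem without touching the image, so it is a useful cross-check that the CPU load really comes from the XDP path.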
@mazdakn I can verify that if I set |
@dzacball can you also try the image I mentioned above in one of your test cluster? |
@mazdakn I tested your image - it works, issue is gone. (As I already mentioned, I also did some tests a few weeks back with a self-built image, using v3.28.0 + latest bpftool, worked as well) |
@dzacball for testing. |
sorry for not replying earlier, will try it asap.
|
can confirm this fixed the issue for me as well.
|
@spacegaucho thanks for testing and confirmation. Did you also manage to set |
no, sorry, should i test that directly in vanilla 3.28? |
Yes, with vanilla 3.28 image. |
I notice that #8880 is flagged for 3.29, is it possible this will also get a 3.28 backport? I was about to go down the rabbit hole of switching to the eBPF datapath and exploring XDP, but don't want to run into this issue. |
@isugimpy we will definitely backport it to v3.28, and also v3.27. |
Oh, brilliant. I didn't catch that this was unique to iptables! Thank you! |
@mazdakn Do you have an ETA for upcoming 3.27 and 3.28 releases that will already contain this fix PR? |
@dzacball v3.27.4 is expected to be released early July (probably the first week). There is no ETA for 3.28 patch release yet. |
@mazdakn Thank you. |
Closing since the fixes (including backports to v3.27 and v3.28) are merged now and will be available in the next patch releases. |
@dzacball 3.28.1 is planned to be released in the second half of July. |
thanks for the heads-up |
@mazdakn Do you have any update when the release will arrive? Thank you! |
@mihivagyok we are in the process of releasing 3.28.1 atm. It most likely will be released early next week. |
ref: projectcalico/calico#8856 Signed-off-by: Artiom Diomin <artiom@kubermatic.com>
Expected Behavior
Calico using a reasonable amount of CPU. No XDP/BPF related error logs in calico-node.
Current Behavior
Calico using about 10x its usual CPU. In calico-node logs, I can see messages like
Possible Solution
If I rebuild calico-node using an up-to-date version of bpftool (tried with v7.4 and worked), the issue is gone.
Steps to Reproduce (for bugs)
Not sure how to reproduce. It happens in our environment, and it was introduced by the updates to map definitions in #8610.
Context
We cannot use the affected versions of Calico due to the high CPU load (and of course we aren't sure what other issues could arise).
Your Environment
Calico v3.27.3 and/or v3.28.0.
K8s 1.28 + Ubuntu 20.04.6 workers with 5.4.0-177-generic kernel.
More details
I could confirm that this issue was introduced by this PR: #8610. It is present in Calico v3.27.3 and v3.28.0. If I revert the PR and build my own calico-node image on top of v3.28.0, the issue is gone. Also, if I rebuild calico-node with an up-to-date version of bpftool (tried with v7.4 and worked), the issue is gone.
Related slack thread: https://calicousers.slack.com/archives/CPTH1KS00/p1713552425270619