Unexpected egress bandwidth out to internet #1911
Comments
Thanks for the detailed info @jwolski2. I will test it out on my cluster and let you know.
Hey @jayanthvn I'm curious if you've had any time to dive into this issue. I've been working with AWS Support on a case about this same issue and after a few days of performing their own tests, they've concluded:
I'm not sure how to interpret that last bullet point, so I'm coming back around to this issue to see if you've reached your own conclusions. Thanks!
@jwolski2 - Sorry I didn't get time this week to verify this behavior. Will test it out by next week and update you. I can try both Ubuntu and EKS AMI.
Apologies for piling on the nudges, but I'm curious, @jayanthvn: has there been any movement on this issue from your side? Thanks!
@jwolski2 Sorry, I got busy with a few release activities. Will take a look ASAP.
I've got good news. I've heard back from AWS Support, and they suggested we either upgrade our Linux kernel version or our K8s version, because they found evidence that there is an issue with the way the kernel is handling Kubernetes network traffic. Anyway, I upgraded from kernel version 5.11.0-1021-aws to 5.13.0-1021-aws and, sure enough, where I was driving < 10Gbps to the internet gateway before, I am now able to drive 17-19Gbps. It's not exactly the 25Gbps I was hoping for, but it's a solid improvement. I'm not exactly sure where the issue lies. If you have any details about it, let me know!
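A rough sketch of one way to do that kernel upgrade on Ubuntu 20.04, in case it helps with reproduction (the meta-package name is an assumption and may differ; verify with apt before installing):

# Check the running kernel before and after.
uname -r                       # 5.11.0-1021-aws before the upgrade

# Pull in a newer AWS-tuned kernel; linux-aws-edge is assumed to track the 5.13 series here.
sudo apt-get update
sudo apt-get install -y linux-aws-edge
sudo reboot

uname -r                       # expect a 5.13.x-aws kernel after the reboot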
Thanks for the update. We also have the ticket in our queue and will look into it. The EKS AMI kernel version is 5.4.181-99.354.amzn2.x86_64, so we might be able to repro with the EKS AMI too.
One last update from our side: we pushed the kernel upgrade out to production and our app is now able to achieve at least 25Gbps, whereas before it was topping out at 13-15Gbps. I haven't tested how far we can push the network I/O, as we're happy enough now with the performance. You may close this issue whenever you find it appropriate, @jayanthvn.
Thanks for the confirmation. So it looks like it was a kernel issue.
What happened:
Hey team, I'm wondering if you can help me sort out some network bandwidth issues we're having. We are running our application on Kubernetes on a c5n.9xlarge. I understand that a c5n.9xlarge has a maximum network throughput of 50Gbps. I also understand that the same instance type can drive 25Gbps to an internet gateway (Source). However, our application is able to drive less than 15Gbps to the internet gateway. And I've been able to reproduce similar behavior in our development environment with iperf where we seem to be limited to 10Gbps.
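For completeness, one check that can rule the instance-level allowances in or out is the ENA driver's "exceeded" counters. A minimal sketch, assuming eth0 is the primary ENA interface on the host (adjust the name to match):

# Non-zero, increasing counters mean EC2 is shaping traffic at the instance level;
# flat counters suggest the bottleneck is elsewhere in the datapath.
ethtool -S eth0 | grep -E 'bw_(in|out)_allowance_exceeded|pps_allowance_exceeded'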
In our development environment, I have run the iperf client and server on 2 different c5n.9xlarge instances, with a mix of on-host and container combinations. The client and server commands typically look like this:
Client:
iperf -c PUBLIC_OR_PRIVATE_IP -p 32293 -P 20 -t 120 -i 5
Server:
iperf -s -p 32293
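For the container combinations, the same commands run from inside the pods; one way to do that is with kubectl exec (the pod name below is a placeholder, the flags are the same as above):

# Run the iperf client from an existing pod; "iperf-client" is an illustrative pod name.
kubectl exec -it iperf-client -- iperf -c PUBLIC_OR_PRIVATE_IP -p 32293 -P 20 -t 120 -i 5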
Here are the results I found:
Based on the expected and actual results above, only the 4th test case does not meet expectations. This leads me to believe there is a bottleneck when egressing from the container to the internet gateway, and that's the issue I'm trying to sort out.
Here's what the veth configuration looks like from inside the container:
And from outside on the host:
And the iptables rules:
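For anyone trying to reproduce the setup, the same kind of information can be collected with standard tooling along these lines (the interface names below are placeholders, not our actual devices):

# Inside the container: address, veth details, MTU, and the default route.
ip -d addr show eth0
ip route show

# On the host: the peer veth device and its offload settings.
ip -d link show                # look for the veth pair created by the CNI
ethtool -k <host-veth>         # <host-veth> is a placeholder for the host-side peer

# NAT rules that apply to traffic leaving the node for the internet gateway.
iptables -t nat -S | grep -i -E 'snat|masquerade'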
During my investigation, I stumbled upon the previously filed issue #1087, which suggests it may have been resolved after an EKS AMI / kernel update (NOTE: we are not running EKS or the EKS AMI). And so, I'm wondering whether we're running into similar issues in the kernel and/or whether my report above can be validated or invalidated by your team.
(After filing the issue, I'll also send along the results of running the CNI Log Collection tool to your email address).
Thanks!
Environment:
- Kubernetes version (kubectl version): 1.19.15
- OS (cat /etc/os-release): Ubuntu 20.04.3 LTS
- Kernel (uname -a): 5.11.0-1021-aws #22~20.04.2-Ubuntu SMP