Less allocatable memory with v20191119 #387
Comments
@pappnase99 This is because of #367, which added some reserved memory for the kubelet. The reason for that is to make sure the kubelet doesn't get killed first if memory pressure on the node gets too high. That said, reserving 2.5G sounds a bit high.
We're having a similar issue. We're using r5.large instances in our EKS cluster in eu-west-1, and after updating our nodes from EKS image 1.14.7 to 1.14.8 our applications keep getting evicted on these instances. After investigating, we've seen the same allocatable CPU and memory changes @pappnase99 mentioned. According to #367, the calculation for an m5.large instance is:
@leakingtapan @natherz97 Isn't it too much to reserve 1741Mi on an instance that has 8Gi of RAM? Is there a miscalculation here?
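For context, the tiered reservation formula that #367 appears to borrow from GKE/AKS (their docs are linked later in this thread) can be sketched as below; the tier percentages are an assumption taken from those docs, not from this repo's code:

```python
# Sketch of the GKE/AKS-style tiered memory reservation (an assumption
# based on the GKE/AKS docs linked later in this thread):
# 25% of the first 4 GiB, 20% of the next 4 GiB, 10% of the next 8 GiB,
# 6% of the next 112 GiB, 2% of anything above 128 GiB.
def tiered_reserved_mib(capacity_mib: float) -> float:
    tiers = [(4096, 0.25), (4096, 0.20), (8192, 0.10),
             (114688, 0.06), (float("inf"), 0.02)]
    reserved, remaining = 0.0, capacity_mib
    for tier_size, fraction in tiers:
        chunk = min(remaining, tier_size)
        reserved += chunk * fraction
        remaining -= chunk
        if remaining <= 0:
            break
    return reserved

# A nominal 8 GiB node: 1024 + 819.2 = 1843.2 MiB reserved. The kubelet
# sees slightly less capacity than the marketed 8 GiB, which would explain
# a reported reservation of ~1741 MiB rather than this exact figure.
print(tiered_reserved_mib(8192))
```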
We are having a similar problem as described by @vedatappsamurai
After upgrade to the latest AMI, the allocatable memory has dropped significantly to the following:
This caused a side effect on our production clusters: we had some cronjobs that request 6GB of memory, which was fine with the earlier allocatable numbers, but with the new AMI the cronjob pods were stuck in the Pending state. We are essentially reserving 1740Mi on an 8Gi node, which is a tad more than 20%. I don't think the formula mentioned here is a very good guideline. The numbers should be tuned further by looking at historical usage etc. 255Mi per 4Gi of RAM should be good enough IMHO.
@mogren Could we get an update on this issue? Any plans on revisiting this formula? Depending on the size of the cluster, this will incur considerable increases in the bill.
@RTodorov I agree, it seems unnecessarily high. We should update the formula to work well for larger instances.
We were able to mitigate the issue for now by adding an extra kubelet flag.
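The commenter's exact flag isn't shown above; one common way to override the AMI's defaults (a sketch only: the cluster name and the 512Mi/250m values are illustrative assumptions, not the commenter's actual setting) is to pass kubelet arguments through the bootstrap script in the node's user data:

```shell
#!/bin/bash
# Hypothetical EC2 user data for an EKS worker node. "my-cluster" and the
# 512Mi/250m reservation values are illustrative assumptions only.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--kube-reserved=memory=512Mi,cpu=250m'
```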
This is blocking my teams from updating to a newer AMI. Can we get the calculation adjusted, @mogren @natherz97?
It seems as if the calculation is shared with GKE and AKS:
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable
https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#resource-reservations
We're working on updating the formula to reserve a smaller percentage of resources available on each instance type, pending more tests. When the PR is published, I will link it here. |
@natherz97 Any updates on the progress? The PR seems to have been inactive for the last 2 weeks, after an issue was found by a user.
@vincentheet Apologies for the delay, we're working on getting more eyes on the PR from other team members. |
Has anyone managed to apply
on the new managed EKS Node Groups, ones that were not created via custom CloudFormation?
@natherz97 Has this been fixed? I see there was a merged PR to reduce the reserved memory, but I just created a 1.14 EKS cluster with nodes that use AMI amazon-eks-node-1.14-v20200507, and for a c5.large I only have 2580Mi available, which means that around 1400Mi is being reserved. Is that the correct allocation? It still seems a bit excessive.
@avasquez614 I just launched a worker node using amazon-eks-node-1.14-v20200507 (ami-0486134a23d903f10) and confirmed this commit was included in the release, f1ae97b. Using a t3.2xlarge instance:
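For reference, the merged fix changed the memory reservation to scale with the instance's pod limit rather than its total memory. A sketch, under the assumption that the revised formula is a 255 MiB base plus 11 MiB per supported pod, with max-pods values taken from the AMI's eni-max-pods.txt:

```python
# Sketch of the revised kube-reserved memory calculation (an assumption:
# 255 MiB base + 11 MiB per supported pod, with the pod limit taken from
# the AMI's eni-max-pods.txt).
def kube_reserved_memory_mib(max_pods: int) -> int:
    return 11 * max_pods + 255

# t3.2xlarge is listed with a limit of 58 pods, giving 893 MiB reserved,
# far less than the multi-GiB reservations reported earlier in the thread.
print(kube_reserved_memory_mib(58))
```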
@natherz97 I created a ticket in the
This seems to have been resolved in eksctl-io/eksctl#2443. @avasquez614 if you still have issues, see #507 as well. |
Where can one find the kubeReserved memory for all of the different instance sizes? |
We're in eu-central-1 using EKS 1.14 with t3.xlarge worker nodes. After updating to AMI v20191119 the allocatable memory went down from 16132820Ki to 13461148Ki. Also the allocatable CPU went down to 3920m which is less important. Wondering what's going on here.