Eksctl creating worker nodes with too much reserved memory for kubelet #2395

Closed
avasquez614 opened this issue Jul 2, 2020 · 9 comments · Fixed by #2443
Labels: kind/bug, priority/important-soon (ideally to be resolved in time for the next release)
@avasquez614

We've been using eksctl for a while and it has been working wonderfully. However, a couple of months ago we noticed that the nodes eksctl was creating had too much kubeReserved memory. For example, just yesterday I created a 1.14 EKS cluster with a single m5.xlarge worker, and this is what it has for kubeReserved:

"kubeReserved": {
    "cpu": "80m",
    "ephemeral-storage": "1Gi",
    "memory": "2662Mi"
  }

I had done some research before and thought it was because of this issue with the AWS AMIs: awslabs/amazon-eks-ami#387, but according to one of the devs that issue has already been resolved. So now I don't really know what's causing so much memory to be reserved for the kubelet. Any ideas?

avasquez614 added the kind/help (Request for help) label Jul 2, 2020
@avasquez614 (Author)

By the way, this is the info on the node:

Kernel Version:             4.14.181-140.257.amzn2.x86_64  
Kube Proxy Version:         v1.14.9-eks-49202c 
Kubelet Version:            v1.14.9-eks-49202c 
Operating System:           linux
Os Image:                   Amazon Linux 2

martina-if added the priority/important-soon (ideally to be resolved in time for the next release) label Jul 6, 2020
@brianpursley (Contributor) commented Jul 7, 2020

Edit: I understand now, but I'm leaving this comment here for context on where the corresponding change is probably needed in eksctl...

Are you thinking that the memory calculation is wrong, or that the calculation is performing correctly but the amount it returns is higher than you think it should be? (Not judging with that question, just trying to clarify.)

I didn't do the math, but 2662Mi reserved on a 16Gi instance seems to match how the calculation is designed.

See these comments I found explaining how eksctl calculates the value:
https://github.com/weaveworks/eksctl/blob/0afe5f1ff23a87eda004662cd278a1e5d5f6b20a/pkg/nodebootstrap/reserved.go#L94-L104

And these test cases showing the calculated values for a few memory sizes. There isn't a test case for 16Gi, but there is one for 12Gi and it reserves 2252Mi:
https://github.com/weaveworks/eksctl/blob/0afe5f1ff23a87eda004662cd278a1e5d5f6b20a/pkg/nodebootstrap/reserved_test.go#L30-L37

It looks like the algorithm eksctl uses matches what is in amazon-eks-ami.
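
For reference, here is a minimal sketch of that tiered calculation as described in the reserved.go comments linked above (the function name and structure are illustrative, not the actual eksctl code): 25% of the first 4 GiB, 20% of the next 4 GiB, 10% of the next 8 GiB, 6% of the next 112 GiB, and 2% of anything above.

  package main

  import "fmt"

  // memoryToReserveMi applies the tiered reservation percentages from the
  // reserved.go comments to a total memory size given in Mi.
  func memoryToReserveMi(totalMi int64) int64 {
      tiers := []struct {
          sizeMi  int64   // width of this memory tier in Mi
          percent float64 // fraction of the tier that is reserved
      }{
          {4 * 1024, 0.25},   // 25% of the first 4 GiB
          {4 * 1024, 0.20},   // 20% of the next 4 GiB
          {8 * 1024, 0.10},   // 10% of the next 8 GiB
          {112 * 1024, 0.06}, // 6% of the next 112 GiB
          {1 << 50, 0.02},    // 2% of anything above 128 GiB
      }
      var reserved float64
      remaining := totalMi
      for _, t := range tiers {
          if remaining <= 0 {
              break
          }
          inTier := remaining
          if inTier > t.sizeMi {
              inTier = t.sizeMi
          }
          reserved += float64(inTier) * t.percent
          remaining -= inTier
      }
      return int64(reserved)
  }

  func main() {
      fmt.Println(memoryToReserveMi(16 * 1024)) // m5.xlarge (16 GiB): 2662
      fmt.Println(memoryToReserveMi(12 * 1024)) // 12 GiB test case: 2252
  }

For 16 GiB that works out to 1024 + 819.2 + 819.2 ≈ 2662 Mi, matching the value reported above, and for 12 GiB it gives 2252 Mi, matching the test case.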

@brianpursley (Contributor) commented Jul 7, 2020

Oh, I think I see what you are saying.
In the merged PR, it shows that the reservations are reduced quite a bit. I wonder what happened to that PR, because it doesn't seem to be reflected in the code.

t3.small (2 vCPU, 2 GiB, 11 pods): 321 Mi
t3.medium (2 vCPU, 4 GiB, 17 pods): 357 Mi
t3.large (2 vCPU, 8 GiB, 35 pods): 465 Mi
t3.xlarge (4 vCPU, 16 GiB, 58 pods): 603 Mi
t3.2xlarge (8 vCPU, 32 GiB, 58 pods): 603 Mi
m5.4xlarge (16 vCPU, 64 GiB, 234 pods): 1659 Mi
m4.10xlarge (40 vCPU, 160 GiB, 234 pods): 1659 Mi
m5.12xlarge (48 vCPU, 192 GiB, 234 pods): 1659 Mi
r4.8xlarge (32 vCPU, 244 GiB, 234 pods): 1659 Mi
m5.24xlarge (96 vCPU, 384 GiB, 737 pods): 4677 Mi


Edit: OK, now I see that was the PR to update the AMI. So I guess eksctl needs to be updated to match.

Sorry for the confusion on my part, but hopefully this adds more context around the issue.


Edit 2: The calculation done in the AMI is based on the max number of pods for a given instance type, which is looked up from a file keyed by instance type:

  max_num_pods=$(cat /etc/eks/eni-max-pods.txt | grep $instance_type | awk '{print $2;}')
  memory_to_reserve=$((11 * $max_num_pods + 255))
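
Assuming 58 is the eni-max-pods.txt entry for an m5.xlarge (it matches the t3.xlarge entry above, which has the same ENI limits), that works out to 11 * 58 + 255 = 893 MiB for the instance in question, far below the 2662 MiB eksctl currently reserves.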

@avasquez614 (Author) commented Jul 7, 2020

@brianpursley Oh, I think I understand. eksctl sets its own reserved memory instead of using the AMI's, correct? So does eksctl need to change to follow the AMI reservation algorithm, or are you going to keep the reservation the way it is right now? I just think it's a bit too much.

@brianpursley (Contributor)

@avasquez614 Yes, it appears so. I agree with you; it looks like it doesn't match what the AMI does, so eksctl probably needs to be changed.

I'm not an eksctl dev, so I will defer to them, but I wonder if there is a way to avoid duplicating the logic, since it can get out of sync, as it has here.

@avasquez614 (Author)

Thanks for your help @brianpursley. Is it possible to relabel this ticket as a bug?

michaelbeaumont added kind/bug and removed kind/help (Request for help) labels Jul 7, 2020
brianpursley added a commit to brianpursley/eksctl that referenced this issue Jul 15, 2020
…sed in amazon-eks-ami

The reserved memory calculation was recently changed in amazon-eks-ami to be
based on pod density (11*maxPodsPerNode+255) instead of total memory.
See https://github.com/awslabs/amazon-eks-ami/blob/21426e27e3845319dbca92e7df32e5c4b984a1d1/files/bootstrap.sh#L163

This commit introduces a new property on InstanceTypeInfo to store MaxPodsPerNode
and populates it using the already available data that has been generated into
maxpods.go.

Because the instance type specs (cpu, memory) come from an API call and the max
number of pods per node comes from a file in amazon-eks-ami, there is a chance
that an instance type is returned by the API for which there is no corresponding
record in maxpods.go. To handle this, I left the original memory calculation
in place as a fallback that will be used if max pods per node cannot be
determined.

Issue eksctl-io#2395
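
A minimal sketch of the fallback behavior this commit message describes, reusing the memoryToReserveMi sketch from the earlier comment (the function name and signature here are assumptions for illustration, not the merged eksctl code):

  // reservedMemoryMi prefers the AMI's pod-density formula when the max
  // pods per node is known, and falls back to the original tiered,
  // total-memory-based calculation otherwise. maxPodsPerNode == 0 means
  // the instance type has no record in maxpods.go.
  func reservedMemoryMi(totalMi, maxPodsPerNode int64) int64 {
      if maxPodsPerNode > 0 {
          // Same formula as the AMI bootstrap.sh: 11 MiB per pod plus 255 MiB.
          return 11*maxPodsPerNode + 255
      }
      return memoryToReserveMi(totalMi) // fallback for unknown instance types
  }

With this in place, an m5.xlarge (58 max pods) would reserve 893 Mi instead of 2662 Mi, while instance types missing from maxpods.go would keep the old behavior.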
brianpursley added a commit to brianpursley/eksctl that referenced this issue Jul 15, 2020
…ulation used by amazon-eks-ami

The reserved memory calculation was changed in amazon-eks-ami to be based on pod
density (11*maxPodsPerNode+255) instead of total memory.
See https://github.com/awslabs/amazon-eks-ami/blob/21426e27e3845319dbca92e7df32e5c4b984a1d1/files/bootstrap.sh#L163

This commit introduces a new InstanceTypeInfo property to store MaxPodsPerNode
and populates it using the data from maxpods.go so that it will be available to
use when calculating the default memory to reserve.

Because the instance type specs (cpu, memory) come from an API call and the max
number of pods per node comes from a file in amazon-eks-ami, there is a chance
that an instance type is returned by the API for which there is no corresponding
record in maxpods.go. To handle this, the original memory calculation remains
as a fallback that will be used if max pods per node is unknown.

Issue eksctl-io#2395
@brianpursley (Contributor)

😆 wow, I didn’t realize my commit would tag this so much. Sorry for the noise.

I am working on a potential PR to sync up the reserved memory calc.

@brianpursley (Contributor)

@michaelbeaumont were you already working on this?

@michaelbeaumont (Contributor)

@brianpursley No! Keep going, I just want to take care of making sure it gets fixed.

brianpursley added a commit to brianpursley/eksctl that referenced this issue Jul 16, 2020
…ased on pod density

(11*maxPodsPerNode+255) instead of total memory.
See https://github.com/awslabs/amazon-eks-ami/blob/21426e27e3845319dbca92e7df32e5c4b984a1d1/files/bootstrap.sh#L163

This commit introduces a new MaxPodsPerNode property on InstanceTypeInfo and populates it
using the data from maxpods.go so that it will be available for calculating the default memory
to reserve. It removes the Memory property from InstanceTypeInfo because it is no longer needed.

This commit also updates related unit tests: it fixes some tests that reference instance types
which no longer exist, and adds checks that instance types expected to exist are present and
those not expected to exist are absent.

Issue eksctl-io#2395