This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Add memory requests for nodes and controlplane workloads #49

Open
invidian wants to merge 2 commits into master from invidian/memory-requests

Conversation

Member

@invidian invidian commented Feb 25, 2020

This commit adds memory requests for all nodes and controlplane
workloads. The reason behind it is to better show users the available
resources on both worker and controller nodes, e.g. when doing 'kubectl
describe node'. This is important when one scales the controlplane
deployments and may prevent node eviction.

The measurement was done on a freshly created cluster with
prometheus-operator and metrics-server deployed, on a controller node
and on a worker node, so the numbers might be lower than those on a
long-running cluster, but they give at least some initial visibility.

The values were measured using 'systemd-cgtop -m -1 / --depth=1' rather
than 'free', as 'systemd-cgtop' also includes page cache usage, the same
way 'kubelet' measures memory usage.

Before the measurement, the following command was executed to make sure
only actively used memory was captured:
'sync; echo 1 | sudo tee /proc/sys/vm/drop_caches; sleep 10'

system.slice uses ~250Mi and init.scope uses ~200Mi, which sums up to
roughly 500Mi needed for the system.

Kubelet in the /docker slice was using ~100Mi and etcd in the /docker
slice was using ~200Mi, so workers have 100Mi reserved for 'kube' and
controllers have 300Mi.
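
For reference, the reservations above could also be expressed in a KubeletConfiguration file instead of kubelet flags. Below is a minimal, hypothetical sketch for a worker node, assuming the 500Mi system and 100Mi kube figures from this measurement (a controller would use 300Mi for kubeReserved); it is only an illustration, not the form used in this PR, which passes the values as kubelet flags:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: "500Mi"   # system.slice (~250Mi) + init.scope (~200Mi), rounded up
kubeReserved:
  memory: "100Mi"   # kubelet in the /docker slice on a worker node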

Memory usage for self-hosted components was measured using the
following command: 'kubectl top pods --sort-by=memory | sort -h -k3 -r'.

The measured values were then rounded up a bit.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
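
As an illustration of what the workload side of this change looks like, a memory request on a controlplane Deployment would be set roughly as below; the component name and the 200Mi figure are placeholders rather than values taken from this PR:

# Sketch only: selector, labels, image and other required fields omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-controlplane-component   # placeholder, not a real component name
spec:
  template:
    spec:
      containers:
        - name: example-controlplane-component
          resources:
            requests:
              memory: 200Mi   # rounded up from the 'kubectl top pods' reading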

@invidian invidian force-pushed the invidian/memory-requests branch 6 times, most recently from c24096b to 7133e9e Compare March 6, 2020 12:46
@invidian invidian requested review from iaguis, ipochi, rata and surajssd March 6, 2020 13:20
@invidian invidian requested a review from surajssd March 10, 2020 09:50
Member

@rata rata left a comment


I think adding these fields makes sense, but I'm not sure what the reasoning was for picking them. Can you please share the reasoning (and add it to the commit)?

I can't really review if the numbers make sense without knowing how they were calculated :)

@@ -95,7 +95,8 @@ systemd:
     --pod-manifest-path=/etc/kubernetes/manifests \
     --read-only-port=0 \
     --register-with-taints=$${NODE_TAINTS} \
-    --volume-plugin-dir=/var/lib/kubelet/volumeplugins
+    --volume-plugin-dir=/var/lib/kubelet/volumeplugins \
+    --kube-reserved=memory=500Mi
Member


Where does this value come from?

Member Author


The values are a snapshot of kubectl top pods --sort-by=memory | sort -h -k3 -r after deploying a cluster from the example configuration. They should perhaps be configurable, as they will change depending on the cluster size.

Those values are minimal, just to have something in place.

Should I add such comment in commit message?

Member


Yes, please do.

I'd like to get some values from production clusters maybe too... But we can update later, maybe?

Member


Wow, kubelet+docker was using 500mb?

Note that ssh, etc. should go under --system-reserved, according to the docs.

Member Author


Ah right. I'll use system-reserved then.

Member Author


Done. PTAL.

@invidian invidian force-pushed the invidian/memory-requests branch from 7133e9e to 1357be0 Compare March 27, 2020 09:45
@invidian
Member Author

I think adding these fields makes sense, but I'm not sure what the reasoning was for picking them. Can you please share the reasoning (and add it to the commit)?

I can't really review if the numbers make sense without knowing how they were calculated :)

Right, sorry for not providing that straight away. I've added the measurement methodology etc. to the commit message now. Please take a look.

@invidian invidian force-pushed the invidian/memory-requests branch 2 times, most recently from aefa784 to ac21d82 Compare March 30, 2020 09:52
@invidian invidian requested a review from rata March 30, 2020 09:56
@invidian invidian force-pushed the invidian/memory-requests branch from ac21d82 to 5bb18fa Compare March 30, 2020 10:44
This commit adds memory requests for all nodes and controlplane
workloads. The reason behind it is to better show users the available
resources on both worker and controller nodes, e.g. when doing 'kubectl
describe node'. This is important when one scales the controlplane
deployments and may prevent node eviction.

The measurement was done on a freshly created cluster with
prometheus-operator and metrics-server deployed, on a controller node
and on a worker node, so the numbers might be lower than those on a
long-running cluster, but they give at least some initial visibility.

The values were measured using 'systemd-cgtop -m -1 / --depth=1' rather
than 'free', as 'systemd-cgtop' also includes page cache usage, the same
way 'kubelet' measures memory usage.

Before the measurement, the following command was executed to make sure
only actively used memory was captured:
'sync; echo 1 | sudo tee /proc/sys/vm/drop_caches; sleep 10'

system.slice uses ~250Mi and init.scope uses ~200Mi, which sums up to
roughly 500Mi needed for the system.

Kubelet in the /docker slice was using ~100Mi and etcd in the /docker
slice was using ~200Mi, so workers have 100Mi reserved for 'kube' and
controllers have 300Mi.

Memory usage for self-hosted components was measured using the
following command: 'kubectl top pods --sort-by=memory | sort -h -k3 -r'.

The measured values were then rounded up a bit.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
To avoid duplicating the template logic and to make the configuration
more readable, as having a quote (") right before the template logic is
very confusing.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
@vbatts

vbatts commented Aug 6, 2020

This seems like something that would be done inside slices of cgroups on the host, to ensure the system can still operate. I guess it seems arbitrary and inflexible to declare numbers like this. Right?
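
For what it's worth, the host-side alternative hinted at above could look roughly like the sketch below, in the Container Linux Config style already used for the kubelet unit: a drop-in on system.slice that bounds its memory through cgroup settings rather than only reserving it in the kubelet. The drop-in name and value are hypothetical, and MemoryHigh requires the unified (cgroup v2) hierarchy, so this illustrates the idea rather than being part of this PR:

systemd:
  units:
    - name: system.slice
      dropins:
        - name: 10-memory-limits.conf
          contents: |
            [Slice]
            # Throttle system.slice once it grows past roughly the reserved amount.
            MemoryHigh=500M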
