Webhooks server memory usage depends on number of pods in cluster. #888

Closed
amisevsk opened this issue Jul 5, 2022 · 0 comments · Fixed by #889
amisevsk commented Jul 5, 2022

Description

The webhooks server is caching all pods on the cluster in memory, regardless of whether they are DevWorkspace pods or not. This can result in the webhook server reaching its default memory limit (300Mi) and being killed by the cluster, causing all DWO webhooks to stop working. Further, since it's not possible to filter pods/exec requests in webhooks (see kubernetes/kubernetes#91732), all kubectl exec commands in the cluster will then be blocked.
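
To illustrate the mechanism, here is a minimal Go sketch (the function, annotation key, and client wiring are hypothetical, not the actual DWO code) of a pod-metadata read going through the manager's cache-backed client:

```go
package webhooks

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// isRestrictedAccessPod checks pod metadata via the manager's client. With the
// default cache-backed client, the first Get for a Pod starts an informer that
// lists and watches *all* Pods in the cluster, so memory grows with the total
// pod count rather than with the number of DevWorkspace pods.
func isRestrictedAccessPod(ctx context.Context, c client.Client, namespace, name string) (bool, error) {
	pod := &corev1.Pod{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, pod); err != nil {
		return false, fmt.Errorf("failed to get pod %s/%s: %w", namespace, name, err)
	}
	// Only the metadata is needed; the annotation key below is illustrative.
	return pod.Annotations["controller.devfile.io/restricted-access"] == "true", nil
}
```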

This issue hasn't been seen until now because webhook server memory usage only becomes a problem when there is a large number of pods (>6000) on a cluster, which normally does not occur due to CPU/memory constraints in the cluster. However, some cluster tasks (e.g. s2i builds on OpenShift) can leave many completed or errored pods behind on the cluster.

How To Reproduce

Note: This is hard to reproduce without also killing a small test cluster

  1. Create test pods on a cluster -- note these pods have ~100KB of annotations (to hopefully use more space in memory per pod) and complete immediately after starting:
    for i in {0001..0500}; do
      curl https://gist.githubusercontent.com/amisevsk/2ca0a75f2bfcf785e597df37d8a22221/raw/007c6b8ff7780ec3aacfd7881f4ed9e22f809470/big-pod.yaml \
        | yq -y --arg name "test-pod-$i" '.metadata.name = $name' \
        | oc apply -f -
      sleep 1s
    done
  2. (In another terminal) try to kubectl exec into a pod to trigger the webhook server caching pods internally
  3. Observe webhooks server memory usage (e.g. on OpenShift: oc adm top pod <webhooks-server-pod-name>)

In my testing, once I get to around 500 such pods, the webhooks server requires ~230MiB of memory (up from ~30MiB at idle).

Expected behavior

DWO webhooks server memory usage should not depend on the number of non-DWO-related objects on the cluster.

Additional context

The webhooks server needs to read pods from the cluster in order to validate pods/exec requests for restricted-access workspaces. Since this is a read-only operation (we just need to check pod metadata), this is done via the controller-runtime manager, which implements efficient read operations by asynchronously watching all objects of interest in the cluster.
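
One way controller-runtime allows bounding this is to restrict what the manager's cache lists and watches, e.g. with a per-object label selector. Below is a minimal sketch, assuming a controller-runtime v0.12-era API and an illustrative label key; it is not necessarily how the actual fix (#889) is implemented:

```go
package setup

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

// newManager builds a manager whose cache only lists/watches Pods carrying a
// DevWorkspace label, so webhook memory no longer scales with total pod count.
func newManager() (ctrl.Manager, error) {
	// Existence selector: any pod with the (illustrative) DevWorkspace ID label.
	podSelector, err := labels.Parse("controller.devfile.io/devworkspace_id")
	if err != nil {
		return nil, err
	}
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		NewCache: cache.BuilderWithOptions(cache.Options{
			SelectorsByObject: cache.SelectorsByObject{
				&corev1.Pod{}: {Label: podSelector},
			},
		}),
	})
}
```

Reads for unlabeled pods then fall through to whatever non-cached path the code chooses, while DevWorkspace pods stay cheap to look up.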

The controller itself has had a similar problem in the past. See: #652

@amisevsk amisevsk self-assigned this Jul 5, 2022
@amisevsk amisevsk added this to the v0.15.x milestone Jul 25, 2022