Description
The webhooks server caches all pods on the cluster in memory, regardless of whether they are DevWorkspace pods or not. This can result in the webhook server reaching its default memory limit (300Mi) and being killed by the cluster, causing all DWO webhooks to stop working. Further, since it's not possible to filter pods/exec requests in webhooks (see kubernetes/kubernetes#91732), all kubectl exec commands in the cluster will be blocked.
This issue hasn't been seen until now because webhook server memory usage only becomes a problem when there is a large number of pods (>6000) on the cluster, which normally does not occur due to CPU/memory constraints in the cluster. However, some cluster tasks (e.g. s2i on OpenShift) can leave many completed or errored pods on the cluster.
How To Reproduce
Note: This is hard to reproduce without also killing a small test cluster
Create test pods on a cluster -- note these pods have ~100KB of annotations (to hopefully use more space in memory per pod) and complete immediately after starting:
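For illustration, a minimal sketch of such a pod-creation loop using client-go (the namespace, pod names, annotation key, and count of 500 are placeholders, not the exact script used for this report):

```go
package main

import (
	"context"
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// ~100KB of filler data stored as an annotation on every pod, so that each
	// cached pod object occupies noticeably more memory in the webhook server.
	padding := strings.Repeat("x", 100*1024)

	for i := 0; i < 500; i++ {
		pod := &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{
				Name:        fmt.Sprintf("cache-test-%d", i),
				Namespace:   "default",
				Annotations: map[string]string{"test.example.com/padding": padding},
			},
			Spec: corev1.PodSpec{
				RestartPolicy: corev1.RestartPolicyNever,
				Containers: []corev1.Container{{
					Name:  "noop",
					Image: "busybox",
					// The container exits immediately, so the pod completes
					// right after starting but remains on the cluster.
					Command: []string{"true"},
				}},
			},
		}
		if _, err := client.CoreV1().Pods(pod.Namespace).Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}
```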
(In another terminal) try to kubectl exec into a pod to trigger the webhook server caching pods internally
Observe webhooks server memory usage (e.g. on OpenShift: oc adm top pod <webhooks-server-pod-name>)
In my testing, once I get to around 500 pods, the webhooks server requires ~230MiB of memory (up from ~30MiB at idle)
Expected behavior
DWO webhooks server memory usage should not depend on the number of non-DWO-related objects on the cluster.
Additional context
The webhooks server needs to read pods from the cluster in order to validate pods/exec requests for restricted-access workspaces. Since this is a read-only operation (we just need to check pod metadata), this is done via the controller-runtime manager, which implements efficient read operations by asynchronously watching all objects of interest in the cluster.
The controller itself has had a similar problem in the past. See: #652
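For context on the direction a fix could take, here is a minimal sketch of configuring the manager so its cache only watches label-selected pods. This is not the operator's actual code: it assumes the cache.Options/SelectorsByObject API available in controller-runtime around v0.11 (field names differ in other versions), and the pod label key shown is hypothetical.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	cfg := ctrl.GetConfigOrDie()

	// Hypothetical label identifying DevWorkspace-managed pods; the actual
	// label key used by DWO may differ.
	dwoPods, err := labels.Parse("controller.devworkspace.io/devworkspace_name")
	if err != nil {
		panic(err)
	}

	// Only pods matching the selector are watched and held in the informer
	// cache; all other pods on the cluster never enter the server's memory.
	mgr, err := manager.New(cfg, manager.Options{
		NewCache: cache.BuilderWithOptions(cache.Options{
			SelectorsByObject: cache.SelectorsByObject{
				&corev1.Pod{}: {Label: dwoPods},
			},
		}),
	})
	if err != nil {
		panic(err)
	}

	// Webhook handlers would be registered on mgr as usual before starting it.
	_ = mgr
}
```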