Investigate VPA for CoreDNS #3800
Comments
In that cluster, what I see is that we are maxing out the HPA due to memory consumption. Memory grows with cluster size and load, but the CPU metric shows that each coredns pod is not heavily loaded. When the load is low, cluster size accounts for most of the memory consumption, causing the HPA to scale up unnecessarily. We could combine HPA (CPU only) and VPA (memory only); this way we can scale horizontally based on load and allocate more memory to each pod depending on the cluster size. The other issue to solve is that we can't ship the VPA CR in the coredns app.
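To make the HPA (CPU only) plus VPA (memory only) idea concrete, here is a minimal sketch that builds the two manifests as plain Python dicts and prints them as JSON. The object names, namespace, container name, replica bounds, CPU target and `updateMode` are all assumptions for illustration, not values taken from the coredns app; the field names follow the `autoscaling/v2` and `autoscaling.k8s.io/v1` APIs.

```python
import json

# Hypothetical target workload; name and namespace are assumptions.
TARGET = {"apiVersion": "apps/v1", "kind": "Deployment", "name": "coredns"}
METADATA = {"name": "coredns", "namespace": "kube-system"}


def cpu_only_hpa(min_replicas=2, max_replicas=10, cpu_utilization=70):
    """HPA that scales replicas on CPU utilization only (numbers are illustrative)."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": dict(METADATA),
        "spec": {
            "scaleTargetRef": dict(TARGET),
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Only a CPU metric is listed, so memory no longer drives replica count.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_utilization,
                        },
                    },
                }
            ],
        },
    }


def memory_only_vpa(container_name="coredns", update_mode="Auto"):
    """VPA that manages memory requests only, leaving CPU requests untouched."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": dict(METADATA),
        "spec": {
            "targetRef": dict(TARGET),
            "updatePolicy": {"updateMode": update_mode},
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        "containerName": container_name,
                        # Restrict VPA to memory so it does not fight the CPU-based HPA.
                        "controlledResources": ["memory"],
                    }
                ]
            },
        },
    }


if __name__ == "__main__":
    for manifest in (cpu_only_hpa(), memory_only_vpa()):
        print(json.dumps(manifest, indent=2))
```

The point of the split is that the two autoscalers never act on the same resource: the HPA reacts to load through CPU, while the VPA sizes memory per pod.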
I'll try using https://github.com/coredns/perf-tests/blob/master/kubernetes. Basically, it creates a bunch of pods, headless services and services to load coredns the way we need for our case. If that works we can implement it as a test in our CI.
I managed to test VPA for memory only, but pods don't get evicted. If I manually delete a pod, the new one gets the new recommended resource allocation. I'm checking whether this is a bug in VPA.
So, it turns out that when setting I'm including this resource into the https://github.com/giantswarm/coredns-extensions-app
coredns-extensions-app 0.1.0 released into the giantswarm-playground catalog. I'm testing this config on the golem MC for a few days. I already see requested memory going down from 512MB to 250MB. We'll keep an eye on it.
I've been testing the setup on golem and everything looks good. The only problem is that I was not really able to test scaling up. If we wanted to proceed with this we should:
We observed increased latency on clusters with many coredns replicas (100). Right now, coredns uses HPA for scaling replica numbers but we could investigate if using VPA would mitigate such an issue.