
Investigate VPA for CoreDNS #3800

Open
ubergesundheit opened this issue Dec 16, 2024 · 6 comments

@ubergesundheit

We observed increased latency on clusters with many CoreDNS replicas (100). Right now, CoreDNS uses HPA to scale the number of replicas, but we could investigate whether using VPA would mitigate this issue.

@github-project-automation bot moved this to Inbox 📥 in Roadmap Dec 16, 2024
@weatherhog moved this from Inbox 📥 to Backlog 📦 in Roadmap Dec 16, 2024
@mcharriere moved this from Backlog 📦 to In Progress ⛏️ in Roadmap Jan 7, 2025
@mcharriere self-assigned this Jan 7, 2025
@mcharriere commented Jan 7, 2025

In that cluster, what I see is that we are maxing out HPA due to memory consumption.

Memory grows with cluster size and load, but from the CPU metric we can tell that each CoreDNS pod is not heavily loaded. When the load is low, cluster size accounts for a larger share of the memory consumption, causing HPA to scale up unnecessarily.

We could combine HPA (CPU only) and VPA (memory only); this way we can scale horizontally based on load and allocate more memory to each pod depending on the cluster size.
This is possible, but we need to figure out how to properly test it.
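
A minimal sketch of what the HPA half of that split could look like, scaling on CPU only (names and thresholds here are illustrative, not the actual coredns app manifests):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Scale replicas on CPU only; memory is left to VPA.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```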

The other issue to solve is that we can't ship the VPA CR in the coredns app.

@mcharriere

This is possible, but we need to figure out how to properly test it.

I'll try using https://github.com/coredns/perf-tests/blob/master/kubernetes. Basically, it creates a bunch of pods, headless services, and regular services to load CoreDNS the way we need in our case. If that works, we can implement it as a test in our CI.
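
Roughly the kind of objects such a setup creates: many headless services backed by pods, so CoreDNS has to hold and serve a large number of records. The manifests below are an illustrative sketch, not the actual ones from that repo:

```yaml
# One of many headless services pointing at a set of pods;
# each backing pod adds a DNS record to CoreDNS' in-memory state.
apiVersion: v1
kind: Service
metadata:
  name: load-svc-0
  namespace: dns-perf-test
spec:
  clusterIP: None          # headless: one record per backing pod
  selector:
    app: dns-load-pods
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-load-pods
  namespace: dns-perf-test
spec:
  replicas: 50
  selector:
    matchLabels:
      app: dns-load-pods
  template:
    metadata:
      labels:
        app: dns-load-pods
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```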

@teemow moved this from In Progress ⛏️ to Inbox 📥 in Roadmap Jan 10, 2025
@mcharriere moved this from Inbox 📥 to In Progress ⛏️ in Roadmap Jan 13, 2025
@mcharriere changed the title from "Investigate VPA instead of HPA for coreDNS workers" to "Investigate VPA for CoreDNS" Jan 13, 2025
@mcharriere

I managed to test VPA for memory only, but pods don't get evicted. If I manually delete a pod, the new one gets the recommended resource allocation.

I'm checking whether this is a bug in VPA or something else.

@mcharriere

So, it turns out that when setting minAllowed, it must include both CPU and memory values, regardless of which resources are actually managed by VPA.
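
A sketch of the VPA resource with that quirk in mind; the values are illustrative and not necessarily what ends up in coredns-extensions-app:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: coredns
        # Only memory is managed by VPA; CPU stays with HPA.
        controlledResources: ["memory"]
        # minAllowed has to list both CPU and memory, even though
        # only memory is controlled here.
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          memory: 2Gi
```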

I'm adding this resource to https://github.com/giantswarm/coredns-extensions-app

@mcharriere

coredns-extensions-app 0.1.0 has been released to the giantswarm-playground catalog.

I'll be testing this config on the golem MC for a few days. Requested memory has already gone down from 512MB to 250MB. We'll keep an eye on it.

@mcharriere commented Jan 20, 2025

I've been testing the setup on golem and everything looks good. The only problem is that I was not really able to test scaling up:
VPA only scales memory up when a pod reaches its limit and gets OOMKilled, and triggering that would require a massive cluster.

If we want to proceed with this, we should:

  • Start with our MCs, disabling targetMemoryUtilizationPercentage in shared-configs.
  • Install coredns-extensions-app in our MCs. That can be done as a gitops base.
  • Coordinate a test in a customer WC.
  • Add coredns-extensions-app to the cluster chart.
  • Enable coredns-extensions-app and disable targetMemoryUtilizationPercentage in the cluster-<provider> charts (see the values sketch below).
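
Roughly what the values change could look like; the exact key paths in shared-configs and the cluster charts are assumptions here:

```yaml
# Hypothetical Helm values sketch; real key paths may differ.
coredns:
  hpa:
    targetCPUUtilizationPercentage: 70
    # Drop memory-based HPA scaling; memory is handled by VPA instead.
    targetMemoryUtilizationPercentage: null
coredns-extensions-app:
  enabled: true
```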
