
Investigate VPA for CoreDNS #3800

Open
ubergesundheit opened this issue Dec 16, 2024 · 6 comments

@ubergesundheit

We observed increased latency on clusters with many CoreDNS replicas (100). Right now, CoreDNS uses HPA to scale the number of replicas, but we could investigate whether using VPA would mitigate this issue.

@github-project-automation bot moved this to Inbox 📥 in Roadmap Dec 16, 2024
@weatherhog moved this from Inbox 📥 to Backlog 📦 in Roadmap Dec 16, 2024
@mcharriere moved this from Backlog 📦 to In Progress ⛏️ in Roadmap Jan 7, 2025
@mcharriere self-assigned this Jan 7, 2025
@mcharriere commented Jan 7, 2025

In that cluster, what I see is that we are maxing out HPA due to memory consumption.

Memory grows with cluster size and load, but from the CPU metric we can tell that each CoreDNS pod is not heavily loaded. When the load is low, cluster size accounts for a larger share of the memory consumption, causing HPA to scale up unnecessarily.

We could combine HPA (CPU only) and VPA (memory only); this way we can scale horizontally based on load and allocate more memory to each pod depending on the cluster size.
This is possible, but we need to figure out how to properly test it.
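
A minimal sketch of what the HPA half of that split could look like, scaling on CPU only (names and thresholds here are illustrative, not the actual coredns app manifests):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Scale replicas on CPU only; memory is left to VPA.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```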

The other issue to solve is that we can't ship the VPA CR in the coredns app.

@mcharriere

This is possible, but we need to figure out how to properly test it.

I'll try using https://github.com/coredns/perf-tests/blob/master/kubernetes. Basically, it creates a bunch of pods, headless services, and regular services to load CoreDNS the way we need in our case. If that works, we can implement it as a test in our CI.
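
Roughly the kind of objects such a setup creates: many headless services backed by pods, so CoreDNS has to hold and serve a large number of records. The manifests below are an illustrative sketch, not the actual ones from that repo:

```yaml
# One of many headless services pointing at a set of pods;
# each backing pod adds a DNS record to CoreDNS' in-memory state.
apiVersion: v1
kind: Service
metadata:
  name: load-svc-0
  namespace: dns-perf-test
spec:
  clusterIP: None          # headless: one record per backing pod
  selector:
    app: dns-load-pods
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-load-pods
  namespace: dns-perf-test
spec:
  replicas: 50
  selector:
    matchLabels:
      app: dns-load-pods
  template:
    metadata:
      labels:
        app: dns-load-pods
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```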

@teemow moved this from In Progress ⛏️ to Inbox 📥 in Roadmap Jan 10, 2025
@mcharriere moved this from Inbox 📥 to In Progress ⛏️ in Roadmap Jan 13, 2025
@mcharriere changed the title from "Investigate VPA instead of HPA for coreDNS workers" to "Investigate VPA for CoreDNS" Jan 13, 2025
@mcharriere

I managed to test VPA for memory only, but pods don't get evicted. If I manually delete a pod, the new one gets the recommended resource allocation.

I'm checking whether this is a bug in VPA or something else.

@mcharriere

So, it turns out that when setting minAllowed, it must include both CPU and memory values, regardless of which resources are actually managed by VPA.
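
A sketch of the VPA resource with that quirk in mind; the values are illustrative and not necessarily what ends up in coredns-extensions-app:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: coredns
        # Only memory is managed by VPA; CPU stays with HPA.
        controlledResources: ["memory"]
        # minAllowed has to list both CPU and memory, even though
        # only memory is controlled here.
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          memory: 2Gi
```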

I'm adding this resource to https://github.com/giantswarm/coredns-extensions-app

@mcharriere

coredns-extensions-app 0.1.0 has been released to the giantswarm-playground catalog.

I'll be testing this config on the golem MC for a few days. Requested memory has already gone down from 512MB to 250MB. We'll keep an eye on it.

@mcharriere commented Jan 20, 2025

I've been testing the setup on golem and everything looks good. The only problem is that I was not really able to test scaling up:
VPA only scales memory up when a pod reaches its limit and gets OOMKilled, and triggering that would require a massive cluster.

If we want to proceed with this, we should:

  • Start with our MCs, disabling targetMemoryUtilizationPercentage in shared-configs.
  • Install coredns-extensions-app in our MCs. That can be done as a gitops base.
  • Coordinate a test in a customer WC.
  • Add coredns-extensions-app to the cluster chart.
  • Enable coredns-extensions-app and disable targetMemoryUtilizationPercentage in the cluster-<provider> charts (see the values sketch below).
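
Roughly what the values change could look like; the exact key paths in shared-configs and the cluster charts are assumptions here:

```yaml
# Hypothetical Helm values sketch; real key paths may differ.
coredns:
  hpa:
    targetCPUUtilizationPercentage: 70
    # Drop memory-based HPA scaling; memory is handled by VPA instead.
    targetMemoryUtilizationPercentage: null
coredns-extensions-app:
  enabled: true
```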
