Safety net for director Prometheus instance #1311

Closed
haoming29 opened this issue May 20, 2024 · 0 comments · Fixed by #1887
Labels: director, internal

haoming29 commented May 20, 2024

We recently experienced a Prometheus cardinality spike: our gin exporter attaches a path label to every request it serves, which produced half a million time series in less than a week. As a result, the director's memory usage climbed to 5 GiB.
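
To illustrate why a raw path label is a problem, here is a minimal sketch in Python (not the director's actual code, which is not shown in this issue). The route prefixes and the `path_label` helper are hypothetical; the point is only that collapsing each request path to a small fixed set of values bounds the number of series the label can create.

```python
# Hypothetical route prefixes, chosen for illustration only.
ROUTE_PREFIXES = ("/api/objects", "/api/origins", "/metrics")


def path_label(path: str) -> str:
    """Collapse a raw request path into one of a bounded set of label values."""
    for prefix in ROUTE_PREFIXES:
        # Match the prefix itself or anything nested below it.
        if path == prefix or path.startswith(prefix + "/"):
            return prefix
    # Everything unrecognized shares a single bucket.
    return "other"


# One thousand distinct object paths, as a director might serve.
raw_paths = [f"/api/objects/dataset/file-{i}.dat" for i in range(1000)]

raw_cardinality = len(set(raw_paths))                          # 1000 label values
bounded_cardinality = len({path_label(p) for p in raw_paths})  # 1 label value
```

With the raw path as the label, every new object adds a new time series; with the collapsed label, the count cannot exceed the number of prefixes plus one.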

Since the director handles all object access requests and scrapes all other servers, its metric labels are expected to have high cardinality. #1276 already reduces the cardinality of known metrics, but to keep the Prometheus process from accidentally overwhelming the director, we should add limits on the maximum memory, time series, and labels Prometheus can use.
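
As a rough sketch of what such limits do, the Python below mimics the all-or-nothing behavior of Prometheus's per-scrape `sample_limit` and `label_limit` settings, where a scrape that exceeds a limit is treated as failed as a whole and `0` means no limit. The `check_scrape` function, the `ScrapeRejected` exception, and the sample representation are assumptions made for illustration; this is not how Prometheus or the director implements the check, and it does not reproduce exactly which labels Prometheus counts.

```python
from typing import List, Tuple

# A sample is (metric name, tuple of (label name, label value) pairs).
Sample = Tuple[str, Tuple[Tuple[str, str], ...]]


class ScrapeRejected(Exception):
    """Raised when a scrape exceeds a configured limit."""


def check_scrape(samples: List[Sample], sample_limit: int, label_limit: int) -> List[Sample]:
    """Return the samples if within limits, else reject the whole scrape.

    A limit of 0 disables that check.
    """
    if sample_limit and len(samples) > sample_limit:
        raise ScrapeRejected(f"{len(samples)} samples exceeds limit {sample_limit}")
    if label_limit:
        for name, labels in samples:
            if len(labels) > label_limit:
                raise ScrapeRejected(f"{name} has {len(labels)} labels, limit {label_limit}")
    return samples


# Five series that differ only in their path label.
scrape = [("http_requests_total", (("path", f"/p{i}"),)) for i in range(5)]

accepted = check_scrape(scrape, sample_limit=10, label_limit=2)

try:
    check_scrape(scrape, sample_limit=3, label_limit=0)
    rejected = False
except ScrapeRejected:
    rejected = True
```

Rejecting the entire scrape rather than truncating it means a misbehaving target loses its metrics until it is fixed, but it cannot grow the director's memory without bound.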

The Cloudflare blog has a good reference on which knobs to turn: https://blog.cloudflare.com/how-cloudflare-runs-prometheus-at-scale. We should adapt them to our setup.

@haoming29 added the internal and director labels on May 20, 2024
@haoming29 changed the title from "Safety for director Prometheus instance" to "Safety net for director Prometheus instance" on May 20, 2024
@haoming29 self-assigned this on May 22, 2024
@haoming29 added this to the v7.10.0 milestone on May 30, 2024
@jhiemstrawisc moved this from the v7.10.0 milestone to the v7.13.0 milestone on Dec 11, 2024
@patrickbrophy linked a pull request that will close this issue on Jan 13, 2025