We recently experienced a Prometheus cardinality spike: our `gin` exporter attaches a `path` label to the requests it serves, which produced half a million time series in less than a week. As a consequence, the director's memory usage grew to 5 GiB.
Since the director handles all object access requests and scrapes all other servers, high cardinality across various metric labels is expected. Although #1276 reduces the known sources of metric cardinality, we should also add limits to Prometheus that cap the memory, time series, and labels it can use, so that the Prometheus process cannot accidentally blow up the director (see the sketch below).
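As a starting point, here is a minimal sketch of the per-scrape limits Prometheus already exposes in its scrape configuration. The job name and all numeric values are placeholders for illustration only; since the director embeds Prometheus, these options would need to be surfaced through however the director builds its scrape configs.

```yaml
scrape_configs:
  - job_name: origins            # hypothetical job name, for illustration
    scrape_interval: 15s
    # Reject a scrape entirely if the target exposes more than this many
    # samples after metric relabeling (0 means unlimited).
    sample_limit: 10000
    # Cap the number of labels per sample and their name/value lengths.
    label_limit: 30
    label_name_length_limit: 200
    label_value_length_limit: 200
    # Cap how many targets this scrape job may have.
    target_limit: 500
    # Cap the uncompressed response body size per scrape.
    body_size_limit: 10MB
```

When `sample_limit` or one of the label limits is exceeded, Prometheus drops the whole scrape and marks the target as failed, so the limits act as a circuit breaker rather than silently truncating data.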
The Cloudflare blog has a good reference on which knobs to turn: https://blog.cloudflare.com/how-cloudflare-runs-prometheus-at-scale. We should adapt its recommendations accordingly.
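That post also recommends alerting on cardinality before limits are hit. A hedged sketch of such a rule is below; the alert name, threshold, and duration are placeholders, while `prometheus_tsdb_head_series` is the built-in metric for in-memory series count.

```yaml
groups:
  - name: director-cardinality            # hypothetical rule group name
    rules:
      - alert: DirectorTooManyTimeSeries  # hypothetical alert name
        # Placeholder threshold: warn when the number of in-memory series
        # approaches a level we consider unsafe for the director host.
        expr: prometheus_tsdb_head_series > 500000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus head series count on the director is unexpectedly high"
```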