default agent config causes metric cardinality explosion in Prometheus #229
Note: some other metrics (i.e. |
No, this is being discussed upstream in vectordotdev/vector#11995. Dropping the tags (with a transform) will not reduce the number of metrics being collected by `internal_metrics`; it only reduces the data sent out at the sink. No fix in the chart itself can help in this case.
Sorry, this clearly escaped my notice! @tuananhnguyen-ct is correct: dropping the tags (or using the `tag_cardinality_limit` transform) should protect your downstream Prometheus from cardinality issues. However, Vector will still be tracking these series internally, and we'll need to solve this in a more complete fashion. I'll close this as a duplicate of vectordotdev/vector#11995.
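As a rough illustration of the `tag_cardinality_limit` option mentioned above, a transform placed between `internal_metrics` and the Prometheus sink might look like the sketch below. It is a fragment of `customConfig` under stated assumptions: the component names (`limit_tags`, `prom_exporter`), the sink address, and the limit value are illustrative, and the option names should be checked against the Vector documentation for your version. As noted in this thread, this bounds only what reaches Prometheus, not what Vector tracks internally.

```yaml
customConfig:
  sources:
    internal_metrics:
      type: internal_metrics
  transforms:
    # Hypothetical component name; caps the distinct values seen per tag key.
    limit_tags:
      type: tag_cardinality_limit
      inputs: [internal_metrics]
      mode: exact                      # track seen tag values exactly
      value_limit: 500                 # illustrative cap on distinct values per tag key
      limit_exceeded_action: drop_tag  # drop the offending tag, keep the metric
  sinks:
    # Hypothetical sink name and address.
    prom_exporter:
      type: prometheus_exporter
      inputs: [limit_tags]
      address: 0.0.0.0:9090
```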
Thanks for the replies and pointers! I’ve subscribed to the upstream issue.
We recently deployed a Vector agent based on the default configuration to a relatively busy Kubernetes cluster (~300 nodes, ~8000 pods). Some of the metrics have unbounded cardinality on some of their tags.

In particular, the file-based metrics (`vector_files_added_total`, `vector_files_unwatched_total`) have a `file` tag, which caused their cardinality to reach millions of time series over a couple of days. This had a noticeable performance impact on our overall observability infrastructure (based on Prometheus/Thanos).

As a workaround, we're including the following remap transform in our `customConfig`:

Would it make sense to include this in the default configuration?
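A remap transform along these lines might look like the following sketch. It is an assumption, not necessarily the exact snippet used here: the component names (`drop_file_tag`, `prom_exporter`) and the sink address are illustrative, and only the `internal_metrics` path is shown. The VRL expression `del(.tags.file)` removes the `file` tag from each metric event before it reaches the Prometheus sink.

```yaml
customConfig:
  sources:
    internal_metrics:
      type: internal_metrics
  transforms:
    # Hypothetical component name.
    drop_file_tag:
      type: remap
      inputs: [internal_metrics]
      source: |
        # Remove the high-cardinality `file` tag; deleting a missing
        # tag is a no-op, so metrics without it pass through unchanged.
        del(.tags.file)
  sinks:
    # Hypothetical sink name and address.
    prom_exporter:
      type: prometheus_exporter
      inputs: [drop_file_tag]
      address: 0.0.0.0:9090
```

Because the transform sits after the `internal_metrics` source, it trims what Prometheus scrapes without changing what Vector tracks internally.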