Drop synapse_storage_transaction_time_bucket #11124
LGTM! One minor suggestion about the wording of the changelog.
This metric has far too high a cardinality because the desc label can have (at present) 248 values. This results in over 3k series per Synapse. If you have a Prometheus instance that monitors multiple Synapse instances, it results in a huge number of additional series to ingest.

The metric in question is also not used in the Synapse dashboard and the core team has indicated they're happy to drop this metric entirely.

Fixes #11081

Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
Forgive me, for I am not a Prometheus expert, but are
They are 🤦. They're created implicitly by the I'd still really like to drop this metric because it's problematic, but I guess we'll have to ship a small update to the dashboard to go with it?
Good spot!
The alternative would be to find a way to reduce the cardinality of this metric to something more manageable. Right now 248 values multiplied by 15 buckets results in 3720 series per Synapse. Is there some other way we could slice
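The arithmetic in the comment above can be checked with a short sketch. Only the 248 label values and 15 buckets come from the discussion; the function name and the fleet sizes are illustrative assumptions, not anything defined in Synapse.

```python
def bucket_series(label_values: int, buckets: int, instances: int = 1) -> int:
    """Number of `_bucket` time series a labelled histogram produces.

    Each distinct label value gets one series per bucket, and every
    monitored instance repeats the full set.
    """
    return label_values * buckets * instances


# Figures quoted in the discussion: 248 `desc` values, 15 buckets.
per_synapse = bucket_series(label_values=248, buckets=15)
print(per_synapse)  # 3720

# Hypothetical fleet sizes, to show how the cost scales for one Prometheus.
for instances in (1, 10, 100):
    print(instances, bucket_series(248, 15, instances))
```

Because the count is a plain product, reducing either the number of `desc` values or the number of buckets shrinks the total proportionally.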
I think we might be able to replace it with a
I think we're certainly happy for Prometheus to be configured to drop these, I'm a little less sure about dropping it for everyone (though probably still fine). I am vaguely wondering if we want to have a config option to drop some of the less useful high cardinality metrics? In the same way you might have
That sounds like a good middle ground. Looking at the dashboard, we do indeed only use the
It's certainly a possibility. I've never seen debug levels affect the emitted metrics in other software. At that point it's starting to feel more like tracing data to me which might be a good angle for this if we do want to collect this type of data at a higher granularity?
Any thoughts on what's happening with this PR? (is it still wanted? is it going to be a configuration flag? ...)
Yeah, we still very much need this. The timeseries aren't all that useful and they take up a decent amount of space. We're simply dropping them at ingestion time on the EMS side, but it would be nicer to not have to. Haven't had the time yet to check if I can change it to a summary type without consequence.
Though I suppose we can close this one, since we'll take a different approach to solving it.
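As a rough sketch of why a summary type might help: a Prometheus histogram exposes one `_bucket` series per bucket plus `_sum` and `_count` for every label value, whereas a summary exposes one series per configured quantile plus `_sum` and `_count`. The comparison below assumes a summary with no quantiles (which is what the Python client produces) and ignores any `_created` series. The helper names are hypothetical, and whether the change is safe for Synapse is exactly the open question raised above.

```python
def histogram_series(label_values: int, buckets: int) -> int:
    # One _bucket series per bucket, plus _sum and _count, per label value.
    return label_values * (buckets + 2)


def summary_series(label_values: int, quantiles: int = 0) -> int:
    # One series per configured quantile, plus _sum and _count, per label value.
    return label_values * (quantiles + 2)


# Figures from the discussion: 248 `desc` values, 15 buckets.
hist = histogram_series(248, 15)
summ = summary_series(248)
print(hist, summ, hist - summ)  # 4216 496 3720
```

Under these assumptions the saving is exactly the 3720 bucket series, while the `_sum` and `_count` series remain available.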