Measure distributor push duration #1027
Conversation
Looking good. Added a few requests.
@jpkrohling We are adding this metric as a way to get write latency on Tempo. We cannot measure this more directly because we rely on the otelcol receivers.
Do the receivers expose this metric directly in any way?
Alternatively, do the receivers allow a custom gRPC or HTTP server to be used?
modules/distributor/receiver/shim.go
Outdated
Namespace: "tempo", | ||
Name: "distributor_push_duration", | ||
Help: "Records the amount of time to push a batch to the ingester.", | ||
Buckets: prom_client.ExponentialBuckets(2, 2, 10), |
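For context, a minimal sketch of how such a histogram could be registered and observed around the push; the package name, pushDuration variable, and observePush helper are illustrative, not the actual shim code:

package receiver // illustrative package name

import (
	"time"

	prom_client "github.com/prometheus/client_golang/prometheus"
)

// Histogram matching the options above; registered once at startup.
var pushDuration = prom_client.NewHistogram(prom_client.HistogramOpts{
	Namespace: "tempo",
	Name:      "distributor_push_duration",
	Help:      "Records the amount of time to push a batch to the ingester.",
	Buckets:   prom_client.ExponentialBuckets(2, 2, 10),
})

func init() {
	prom_client.MustRegister(pushDuration)
}

// observePush times an arbitrary push call and records its duration in seconds.
func observePush(push func() error) error {
	start := time.Now()
	err := push()
	pushDuration.Observe(time.Since(start).Seconds())
	return err
}

In the shim, a wrapper like observePush would go around the call into the distributor.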
unsure on these buckets. 2s is a long time for the smallest one. Our p50 is 50ms on this endpoint in ops. Maybe review what the buckets are in the weaveworks common library since we use those on our other endpoints?
I'll have a look. I wasn't quite sure about the bucket counting.
Not much in weaveworks, but lots in Cortex. After thinking it over and reviewing the defaults, the defaults seem reasonable: low enough and high enough, with reasonable steps in between.
DefBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}
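If those defaults were adopted, only the Buckets field above would change; a sketch assuming the same prom_client alias (leaving Buckets unset has the same effect, since client_golang falls back to DefBuckets):

Namespace: "tempo",
Name:      "distributor_push_duration",
Help:      "Records the amount of time to push a batch to the ingester.",
// prom_client.DefBuckets is {.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10} seconds.
Buckets:   prom_client.DefBuckets,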
I was sure that we had histograms for every component, including receivers. I just ran a test here and it's not there, even with the metrics level set to "detailed". I'll dig for more information, perhaps my recollection is wrong.
For the record: some components have custom metrics and are instrumented using OpenCensus. There's a task going on to replace it with the OpenTelemetry Metrics SDK.
References: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/dd44390a4a3b64b3a9c9a1ca57e95903282765ac/exporter/loadbalancingexporter/factory.go#L34
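For reference, collector components that carry custom metrics typically register an OpenCensus measure and view along these lines; this is a generic sketch with hypothetical metric and function names, not the loadbalancingexporter's actual code:

package exporter // illustrative package name

import (
	"context"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
)

// Hypothetical measure for a component-level push latency.
var batchPushLatency = stats.Int64(
	"component_push_latency_ms",
	"Latency of pushing a batch downstream.",
	stats.UnitMilliseconds,
)

// registerViews exposes the measure as a distribution view; typically called
// from the component's factory at startup.
func registerViews() error {
	return view.Register(&view.View{
		Name:        batchPushLatency.Name(),
		Description: batchPushLatency.Description(),
		Measure:     batchPushLatency,
		Aggregation: view.Distribution(5, 10, 25, 50, 100, 250, 500, 1000),
	})
}

// recordPush records one observation, e.g. from the export/consume path.
func recordPush(ctx context.Context, elapsedMs int64) {
	stats.Record(ctx, batchPushLatency.M(elapsedMs))
}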
Definitely want the dashboard, but I thought I was to do that in the deployment repo. Is the mixin here imported?
Alright, I've included an update to the dashboard, but not sure how to test it yet.
What this PR does:
Adds a distributor_push_duration histogram that records the amount of time to push a batch to the ingester.
Which issue(s) this PR fixes:
Fixes #614
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]