
Measure distributor push duration #1027

Merged: 4 commits into grafana:main from the distributorLatency branch on Oct 14, 2021

Conversation

@zalegrala (Contributor) commented Oct 12, 2021

What this PR does:

  • Add metric distributor_push_duration

Which issue(s) this PR fixes:
Fixes #614

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@joe-elliott (Member) left a comment

Looking good. Added a few requests.

@jpkrohling We are adding this metric as a way to get write latency on Tempo. We cannot measure this more directly because we rely on the otelcol receivers.

Do the receivers expose this metric directly in any way?
Alternatively, do the receivers allow for a custom gRPC or HTTP server to be used?

CHANGELOG.md (outdated review thread, resolved)
modules/distributor/receiver/shim.go (outdated review thread, resolved):
Namespace: "tempo",
Name: "distributor_push_duration",
Help: "Records the amount of time to push a batch to the ingester.",
Buckets: prom_client.ExponentialBuckets(2, 2, 10),

Member:

Unsure on these buckets: 2s is a long time for the smallest one. Our p50 is 50ms on this endpoint in ops. Maybe review what the buckets are in the weaveworks common library, since we use those on our other endpoints?
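
For reference, a minimal sketch of what the proposed buckets expand to, assuming prom_client aliases the standard Prometheus Go client (github.com/prometheus/client_golang/prometheus):

package main

import (
	"fmt"

	prom_client "github.com/prometheus/client_golang/prometheus"
)

func main() {
	// ExponentialBuckets(start, factor, count) returns `count` upper bounds,
	// starting at `start` and multiplying by `factor` each step, so the
	// smallest bucket here is 2s and the largest is 1024s.
	fmt.Println(prom_client.ExponentialBuckets(2, 2, 10))
	// Output: [2 4 8 16 32 64 128 256 512 1024]
}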

Contributor Author:

I'll have a look. I wasn't quite sure about the bucket counting.

Contributor Author:

Not much in weaveworks, but lots in Cortex. After thinking it over and reviewing the defaults, the Prometheus client defaults seem reasonable: low enough and high enough, with reasonable steps in between.

DefBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}
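
A minimal sketch of the histogram with those defaults, assuming prom_client aliases github.com/prometheus/client_golang/prometheus; the timePush helper below is hypothetical, and the actual wiring lives in modules/distributor/receiver/shim.go:

package receiver

import (
	"time"

	prom_client "github.com/prometheus/client_golang/prometheus"
)

// metricPushDuration mirrors the snippet quoted above, but swaps the
// exponential buckets for the client defaults (.005s through 10s).
var metricPushDuration = prom_client.NewHistogram(prom_client.HistogramOpts{
	Namespace: "tempo",
	Name:      "distributor_push_duration",
	Help:      "Records the amount of time to push a batch to the ingester.",
	Buckets:   prom_client.DefBuckets,
})

func init() {
	prom_client.MustRegister(metricPushDuration)
}

// timePush is a hypothetical helper showing where the histogram would be
// observed: it times a single push to the ingester in seconds.
func timePush(push func() error) error {
	start := time.Now()
	err := push()
	metricPushDuration.Observe(time.Since(start).Seconds())
	return err
}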

@jpkrohling

> @jpkrohling We are adding this metric as a way to get write latency on Tempo. We cannot measure this more directly because we rely on the otelcol receivers.

I was sure that we had histograms for every component, including receivers. I just ran a test here and it's not there, even with the metrics level set to "detailed". I'll dig for more information, perhaps my recollection is wrong.

For the record: some components have custom metrics and are instrumented using OpenCensus. There's an ongoing task to replace it with the OpenTelemetry Metrics SDK. References:

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/dd44390a4a3b64b3a9c9a1ca57e95903282765ac/exporter/loadbalancingexporter/factory.go#L34
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/loadbalancingexporter/metrics.go
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/dd44390a4a3b64b3a9c9a1ca57e95903282765ac/exporter/loadbalancingexporter/resolver_dns.go#L125

@joe-elliott (Member) left a comment

Oops, one requested change. Can we get an addition to the writes dashboard as well with this metric?

[screenshot: writes dashboard]

@zalegrala (Contributor Author):

Definitely want the dashboard, but I thought I was supposed to do that in the deployment repo. Is the mixin here imported?

@zalegrala (Contributor Author):

Alright, I've included an update to the dashboard, but I'm not sure how to test it yet.

@joe-elliott joe-elliott merged commit 1f0c06a into grafana:main Oct 14, 2021
@zalegrala zalegrala deleted the distributorLatency branch October 14, 2021 13:03
Merging this pull request closes: Add Distributor Latency Metrics (#614)