-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[connector/spanmetrics] Add maximum span duration metric #31885
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Span duration metrics are currently exported from the connector in the form of histograms with buckets that can be modified as users see fit. Is there a reason this is insufficient for your use case? |
Hi @crobert-1, we are running the gateway collector set-up where one span metric configuration is producing span metrics for dozens of applications. That makes it harder to pick a set of histogram buckets that work for all use cases. For example, a 30 second bucket may be sufficient for identifying long-running API call spans but not for a batch job's spans that take several minutes. We also don't get the exactness of the latency that a max metric would give, only that the latency is between two buckets' values. Although we're switching from using an observability vendor that calculated max automatically, so we may be used to relying on max when the imprecision of histograms works in practice |
Would the idea be to then have a datapoint within the metric representing each application that's sending data in? That generally makes sense to me, but one question I have is how much effort is required to allow the connector to export metrics that aren't histograms. It looks like metrics are all output as histograms from the README, so I'm wondering what this change would entail from that perspective. (Correct me if I'm wrong here and it's already exporting gauges.) |
I was picturing it working like the existing
The control flow seems mostly the same as the The tasks coming to mind are:
|
I suppose it would be too cumbersome to support specific service (and/or span type) explicit histogram bucket configuration? |
That's a workaround but creates more effort for an observability team to maintain and explain how it works to other teams. In our case, we're using the centralized/gateway collector set-up instead of sidecars. I suppose each custom request from other teams would need a new filter, span metric component, and pipeline set-up |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
connector/spanmetrics
Is your feature request related to a problem? Please describe.
Hello, adding a
duration_max
span metric would be helpful for finding anomalies in latencyDescribe the solution you'd like
The connector flushes metrics at a configurable interval, so maybe it could emit a gauge with the highest latency seen since the last flush
For cumulative span metrics, I suppose it could flush a value of zero if no spans are seen in-between flushes? For delta span metrics it would not emit anything in that case?
I'm also assuming new configuration would be good for disabling this metric
Describe alternatives you've considered
Lmk if there's an alternative
Additional context
I can implement this change
The text was updated successfully, but these errors were encountered: