-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inherit instrumentation library name #1364
Inherit instrumentation library name #1364
Conversation
…sian-forks/opentelemetry-collector-contrib into inherit-instrumentation-library-name
LGTM, only a few minor change suggestions. I'm thinking we should add an abstraction layer to replace It's not required to be done at this moment. But worth considering to be done when we push the stuff to OTel upstream later. |
@@ -74,6 +74,8 @@ The following settings can be optionally configured: | |||
- `attach_span_and_trace_id` attaches span id and trace id as attributes on metrics generated from spans if set to `true`. If not provided, | |||
will use default value of `false`. | |||
|
|||
- `inherit_instrumentation_library_name`: defines whether the metrics generated from spans should inherit the same instrumentation library name as the span. If not provided, will use default value of `false` which will define the instrumentation library name as `spanmetricsprocessor` on metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to inherit the instrumentation library name of the span? Is it strictly necessary? Is it solely for the one attribute from the span being reflected on the metric?
In my view the instrumentation Library being spanmetricsprocessor
is valid, because that's where the metric is generated. If someone really needs to know the original instrumentation library, they can look at the linked span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with setting everything as spanmetricsprocessor
as the instrumentation library is that it can incorrectly aggregate. For example, in our use case, we have HTTP metrics sent from Service Proxy and HTTP metrics sent from the application. These will share the same attributes and as a result they'll all be aggregated together into one blob by the span metrics processor, hence incorrectly showing doubled up data. The only way to differentiate between these two is by checking the instrumentation library name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yeah it makes sense. Wouldn't they be aggregated in some other way though? Like I would think service.name would be different in the the case you mentioend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, cos the service.name
for both would be the application (not Service Proxy). Otherwise, all the Service Proxy metrics for all services would collapse into one giant metric if they had the same service name (and even if we set up another variable to mark the actual service name, this would have issues in the pipeline since we now can't rate limit/throttle per service since everything the service would be set to SP). Using the instrumentation library name seemed like the in-built way in OTel to differentiate, instead of spinning up our own solution. (We had a conversation about this as a team some time last year, I'm not sure if you were there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fair enough! sounds good
latencyCount map[resourceKey]map[metricKey]uint64 | ||
latencySum map[resourceKey]map[metricKey]float64 | ||
latencyBucketCounts map[resourceKey]map[metricKey][]uint64 | ||
latencyCount map[resourceKey]map[instrLibKey]map[metricKey]uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why you've done it this way, but it does make it more complex. It seems the more we add to this the more we are just replicating what the metrics SDK is designed to do, except in a makeshift way.
Probably not necessary for now, but I would like to investigate if it's possible to simply use an SDK instrument and aggregator to create these metrics rather than re-inventing the wheel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any links to the stuff you mentioned, I'm not very familiar with it. Is there an in-built aggregator in the SDK? :curious:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider removing the aggregation from the processor. Instead, it should only focus on extraction. See open-telemetry#403 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the other idea is connect it up to other metrics processor so it can handle the funky stuff there. I don't have any links, it's just in my head. But there's definitely a better way, like you mentioned in the linked comment.
@chenzhihao @jamesmoessis this should be ready for review again, I think I've addressed the PR comments 🥺 |
LGTM |
Description:
Adding the feature for generated metrics to inherit instrumentation library name from the span if specified so via config.
There is one caveat to the method I used which is that it'll duplicate the OTel metrics structure when declaring instrumentation library details and have each instrumentation library section twice (one for "calls_total" metrics and one for "latency" metrics) which then have the metrics / data points attached to them. E.g for one span with the same resource attributes and library name, the generated metric structure will be pseudocode like:
--> Metrics
.......--> Resource Metrics
.............--> Instrumentation Library Name A
....................--> Metric Calls Total
...........................--> Data Point Calls Total
.............--> Instrumentation Library Name A
....................--> Metric Latency
...........................--> Data Point Latency
I chose to do it this way because it's a "small-ish" trade-off with the benefit of not having to completely rework the code and the data structures use. Very open to suggestions if you can see a simpler way to do it or if you feel strongly we should do the larger change to have it not duplicate like described above.
Testing:
true
false
Documentation:
README / Config Examples for usage