
Inherit instrumentation library name #1364

Merged: 10 commits from inherit-instrumentation-library-name into collections-main, Jan 21, 2022

Conversation


@Tenaria Tenaria commented Jan 14, 2022

Description:
Adds the ability for generated metrics to inherit the instrumentation library name from the span, when enabled via config.

There is one caveat to the method I used: it duplicates the OTel metrics structure when declaring instrumentation library details, producing each instrumentation library section twice (once for the "calls_total" metrics and once for the "latency" metrics), with the metrics / data points attached to each. E.g. for one span with the same resource attributes and library name, the generated metric structure will look like this in pseudocode:
Metrics
  --> Resource Metrics
        --> Instrumentation Library Name A
              --> Metric: calls_total
                    --> Data Point: calls_total
        --> Instrumentation Library Name A
              --> Metric: latency
                    --> Data Point: latency

I chose to do it this way because it's a small-ish trade-off, with the benefit of not having to completely rework the code and the data structures used. Very open to suggestions if you can see a simpler way to do it, or if you feel strongly that we should make the larger change so it doesn't duplicate as described above.
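For concreteness, here is a minimal sketch of the duplicated shape using the collector's pdata API of that era (`go.opentelemetry.io/collector/model/pdata`); the metric names and values are illustrative, not the processor's actual output:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/model/pdata"
)

// buildDuplicated mirrors the tree above: one ResourceMetrics holding two
// InstrumentationLibraryMetrics sections with the same library name, one
// per metric family.
func buildDuplicated(libName string) pdata.Metrics {
	m := pdata.NewMetrics()
	rm := m.ResourceMetrics().AppendEmpty()

	// First section: calls_total as a sum.
	ilmCalls := rm.InstrumentationLibraryMetrics().AppendEmpty()
	ilmCalls.InstrumentationLibrary().SetName(libName)
	calls := ilmCalls.Metrics().AppendEmpty()
	calls.SetName("calls_total")
	calls.SetDataType(pdata.MetricDataTypeSum)
	calls.Sum().DataPoints().AppendEmpty().SetIntVal(1)

	// Second section: latency as a histogram, repeating the same library name.
	ilmLatency := rm.InstrumentationLibraryMetrics().AppendEmpty()
	ilmLatency.InstrumentationLibrary().SetName(libName)
	latency := ilmLatency.Metrics().AppendEmpty()
	latency.SetName("latency")
	latency.SetDataType(pdata.MetricDataTypeHistogram)
	latency.Histogram().DataPoints().AppendEmpty().SetCount(1)

	return m
}

func main() {
	m := buildDuplicated("Instrumentation Library Name A")
	fmt.Println(m.MetricCount()) // 2: one metric in each duplicated section
}
```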

Testing:

  • Tests added to check that the instrumentation library name is correctly inherited when the config option is set to `true`
  • Tests added to check that the default instrumentation library name ("spanmetricsprocessor") is used when the config option is set to `false`
  • Existing tests modified to suit

Documentation:
README / Config Examples for usage


chenzhihao commented Jan 19, 2022

LGTM, just a few minor suggestions.

I'm thinking we should add an abstraction layer to replace map[resourceKey]map[instrLibKey]map[metricKey]val. Most of this change could be simplified with an abstraction layer over the index data structure, and it would also let us follow the open-closed principle if we need to change the data structure in the future.

It doesn't have to be done right now, but it's worth considering when we push this upstream to OTel later.
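As a sketch of what such an abstraction layer might look like (hypothetical; key types simplified to strings, and only the calls counter shown, since the real processor keys carry more information):

```go
// Simplified key types for the sketch.
type (
	resourceKey string
	instrLibKey string
	metricKey   string
)

// callsIndex hides the three-level nesting behind two methods, so callers
// never manipulate the intermediate maps directly.
type callsIndex struct {
	data map[resourceKey]map[instrLibKey]map[metricKey]uint64
}

func newCallsIndex() *callsIndex {
	return &callsIndex{data: make(map[resourceKey]map[instrLibKey]map[metricKey]uint64)}
}

// Increment bumps a counter, creating the intermediate maps on first use.
func (c *callsIndex) Increment(r resourceKey, il instrLibKey, m metricKey) {
	byLib, ok := c.data[r]
	if !ok {
		byLib = make(map[instrLibKey]map[metricKey]uint64)
		c.data[r] = byLib
	}
	byMetric, ok := byLib[il]
	if !ok {
		byMetric = make(map[metricKey]uint64)
		byLib[il] = byMetric
	}
	byMetric[m]++
}

// Get returns the current count; reads of missing keys yield zero because
// indexing a nil map is safe in Go.
func (c *callsIndex) Get(r resourceKey, il instrLibKey, m metricKey) uint64 {
	return c.data[r][il][m]
}
```

Swapping the backing structure later (e.g. for an SDK-based aggregator) would then only touch this type, which is the open-closed benefit mentioned above.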

@@ -74,6 +74,8 @@ The following settings can be optionally configured:
- `attach_span_and_trace_id` attaches span id and trace id as attributes on metrics generated from spans if set to `true`. If not provided,
will use default value of `false`.

- `inherit_instrumentation_library_name`: defines whether metrics generated from spans should inherit the span's instrumentation library name. If not provided, defaults to `false`, in which case the instrumentation library name on generated metrics is set to `spanmetricsprocessor`.
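For reference, enabling the option in a collector config might look like this (a sketch; the `metrics_exporter` wiring and exporter name are illustrative of typical spanmetrics setup, not part of this change):

```yaml
processors:
  spanmetrics:
    # Exporter the generated metrics are sent to (illustrative wiring).
    metrics_exporter: prometheus
    # New option from this PR; defaults to false.
    inherit_instrumentation_library_name: true
```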
Collaborator:
Why do we want to inherit the instrumentation library name of the span? Is it strictly necessary? Is it solely for the one attribute from the span being reflected on the metric?

In my view the instrumentation library being `spanmetricsprocessor` is valid, because that's where the metric is generated. If someone really needs to know the original instrumentation library, they can look at the linked span.

Author:
The issue with setting `spanmetricsprocessor` as the instrumentation library on everything is that it can aggregate incorrectly. For example, in our use case we have HTTP metrics sent from Service Proxy and HTTP metrics sent from the application. These share the same attributes, so the span metrics processor aggregates them all together into one blob, incorrectly showing doubled-up data. The only way to differentiate between the two is by checking the instrumentation library name.

Collaborator:
I see, yeah, that makes sense. Wouldn't they be aggregated in some other way though? I would think service.name would be different in the case you mentioned.

Author (@Tenaria), Jan 20, 2022:
Nah, because the service.name for both would be the application (not Service Proxy). Otherwise, all the Service Proxy metrics for all services would collapse into one giant metric if they had the same service name (and even if we set up another attribute to mark the actual service name, this would cause issues in the pipeline, since we could no longer rate limit/throttle per service when everything was set to SP). Using the instrumentation library name seemed like the built-in OTel way to differentiate, instead of spinning up our own solution. (We had a conversation about this as a team some time last year; I'm not sure if you were there.)

Collaborator:
Yeah, fair enough! Sounds good.

-	latencyCount        map[resourceKey]map[metricKey]uint64
-	latencySum          map[resourceKey]map[metricKey]float64
-	latencyBucketCounts map[resourceKey]map[metricKey][]uint64
+	latencyCount        map[resourceKey]map[instrLibKey]map[metricKey]uint64
Collaborator:
I understand why you've done it this way, but it does make it more complex. It seems the more we add to this the more we are just replicating what the metrics SDK is designed to do, except in a makeshift way.

Probably not necessary for now, but I would like to investigate if it's possible to simply use an SDK instrument and aggregator to create these metrics rather than re-inventing the wheel.
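As a rough illustration of that idea, here is a sketch using the otel-go metrics SDK; note this uses today's stable metric API, which differs from what was available in early 2022, and the instrument and attribute names are illustrative:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// The SDK provider owns aggregation, replacing the hand-rolled nested maps.
	provider := sdkmetric.NewMeterProvider()
	defer provider.Shutdown(ctx)

	// The scope name plays the role of the instrumentation library name.
	meter := provider.Meter("spanmetricsprocessor")

	hist, err := meter.Float64Histogram("latency", metric.WithUnit("ms"))
	if err != nil {
		panic(err)
	}

	// Record one latency observation; the SDK aggregates per attribute set.
	hist.Record(ctx, 42.0,
		metric.WithAttributes(attribute.String("operation", "GET /users")))
}
```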

Author:
Do you have any links to the stuff you mentioned? I'm not very familiar with it. Is there an in-built aggregator in the SDK? :curious:

We should consider removing the aggregation from the processor. Instead, it should only focus on extraction. See open-telemetry#403 (comment)

Collaborator:
Yeah, the other idea is to connect it up to another metrics processor so it can handle the funky stuff there. I don't have any links, it's just in my head. But there's definitely a better way, like you mentioned in the linked comment.

@Tenaria (Author) commented Jan 20, 2022

@chenzhihao @jamesmoessis this should be ready for review again, I think I've addressed the PR comments 🥺

@chenzhihao:
LGTM

@Tenaria Tenaria merged commit d4863aa into collections-main Jan 21, 2022
@Tenaria Tenaria deleted the inherit-instrumentation-library-name branch January 21, 2022 00:26