[connector/servicegraph] Coalesce different attr sets into single ScopeMetrics metric entry #34070

evantorrie · 2024-07-15T01:07:01Z

Tests in connector/servicegraph were failing because the servicegraph buildMetrics() code was creating multiple metric entries with the same metric name within a single ScopeMetrics. Although this may not be forbidden by the OpenTelemetry specification, I think there is a general assumption that a metric name does not appear more than once within the same ScopeMetrics.
Instead, different values (e.g. with different sets of attribute values) should be created as separate datapoints within the same metric.

The pmetrictest.CompareScopeMetrics test functionality is not designed to handle multiple metric entries with the same Name(). Instead, it is assumed that in cases where Order is ignored, the first entry found in the actual metrics which matches the name of the expected metrics must be the metric to compare.
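To illustrate why duplicate names are a problem for name-based matching, here is a minimal sketch. It is a simplified model of the rule described above, not the pmetrictest code: the `metric` type, the `pairByName` helper, and the `request_total` name are all hypothetical, and the attribute set is reduced to a plain string.

```go
package main

import "fmt"

// metric is a hypothetical stand-in for a metric entry: a name plus a
// string that represents the attribute set of its single datapoint.
type metric struct {
	name  string
	attrs string
}

// pairByName pairs each expected metric with the first not-yet-used actual
// metric that has the same name, then counts the pairs whose contents differ.
// Matching looks at the name only, which is the assumption being modelled.
func pairByName(expected, actual []metric) (mismatches int) {
	used := make([]bool, len(actual))
	for _, e := range expected {
		for i, a := range actual {
			if used[i] || a.name != e.name {
				continue
			}
			used[i] = true
			if a.attrs != e.attrs {
				mismatches++
			}
			break
		}
	}
	return mismatches
}

func main() {
	expected := []metric{{"request_total", "client=a"}, {"request_total", "client=b"}}
	sameOrder := []metric{{"request_total", "client=a"}, {"request_total", "client=b"}}
	swapped := []metric{{"request_total", "client=b"}, {"request_total", "client=a"}}

	fmt.Println(pairByName(expected, sameOrder)) // 0
	fmt.Println(pairByName(expected, swapped))   // 2
}
```

If the actual entries are produced in a non-deterministic order, a comparison of this kind passes on some runs and fails on others, which is consistent with the flaky behaviour seen here.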

This fix changes the buildMetrics() code to create one metric per name within a scope, and to create one datapoint within that metric for each entry whose datapoint attribute set is unique (i.e. all entries in the internal maps serviceGraphConnector.req{Total,FailedTotal,ServerDurationSecondsCount} are coalesced into a single named metric as appropriate).
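The shape of the change can be sketched roughly as follows. This is not the connector code: `dataPoint`, `metric`, and `coalesce` are hypothetical simplified types and helpers, the map key stands in for an attribute set, and the sorting is only there to make the illustration deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// dataPoint and metric are hypothetical, simplified stand-ins for the
// pdata types; attrs represents an attribute set as a plain string.
type dataPoint struct {
	attrs string
	value int64
}

type metric struct {
	name   string
	points []dataPoint
}

// coalesce builds ONE metric with the given name and adds one datapoint per
// entry in counts, instead of emitting one metric per entry.
func coalesce(name string, counts map[string]int64) metric {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order, for this illustration only

	m := metric{name: name}
	for _, k := range keys {
		m.points = append(m.points, dataPoint{attrs: k, value: counts[k]})
	}
	return m
}

func main() {
	reqTotal := map[string]int64{
		"client=a,server=b": 3,
		"client=a,server=c": 1,
	}
	m := coalesce("request_total", reqTotal)
	fmt.Println(m.name, len(m.points)) // request_total 2
}
```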

Note: I don't have any past experience working with servicegraphconnector, but I did notice that collectClientLatencyMetrics() and collectServerLatencyMetrics() both range over the same map, p.reqServerDurationSecondsCount, although the actual values collected in collectClientLatencyMetrics() are from p.reqServerDurationSeconds{Count,Sum,BucketCounts} and the values collected in collectServerLatencyMetrics() are from p.reqClientDurationSeconds{Count,Sum,BucketCounts}.

This seems a little asymmetrical, but I don't have enough experience to say whether this is an error or not.

Description: Fixes #33998

Testing: All other unit tests now complete, and the previously failing unit test now works reliably.

Documentation: No documentation added. This is a unit test fix.

Tests were failing because the `buildMetrics()` code was creating multiple metric entries with the same name inside a single ScopeMetrics. The `pdatatest` test functionality is not designed to handle this. The code now creates one metric per name within a scope, with one datapoint for each unique set of attributes.

evantorrie · 2024-07-15T03:04:53Z

No changelog entry as this is a fix for a flaky unit-test.

t00mas

This makes sense 👍🏼

jpkrohling

I believe the following PR has the same purpose as this, but might be a better solution. If this PR here has further improvements, we can consider merging it as well. I'm blocking this, so that we can take that into account before merging.

#34076

This reverts commit ed3a56a.

evantorrie · 2024-07-17T03:50:30Z

Restore skipped test introduced in #34120

evantorrie · 2024-07-17T04:00:20Z

> I believe the following PR has the same purpose as this, but might be a better solution. If this PR here has further improvements, we can consider merging it as well. I'm blocking this, so that we can take that into account before merging.
>
> #34076

#34076 was not successful in fixing the servicegraph connector flaky test. The root cause of the error is at a higher level than the order of datapoint attributes within a metric. As mentioned earlier:

> The pmetrictest.CompareScopeMetrics test functionality is not designed to handle multiple metric entries with the same Name(). Instead, it is assumed that in cases where Order is ignored, the first entry found in the actual metrics which matches the name of the expected metrics must be the metric to compare.

jpkrohling · 2024-07-17T08:28:29Z

Would it make sense to undo #34076 as part of this PR?

evantorrie · 2024-07-17T17:31:35Z

As far as I can tell from #34076, it attempts to sort all datapoint slices based on a canonicalized order of the attributes within each datapoint.

The existing compare{Number,Histogram,Summary}DataPointSlices functions in pmetrictest check for each "expected" datapoint whether there exists an "actual" datapoint using the following test to determine if the datapoints match.

if reflect.DeepEqual(edp.Attributes().AsRaw(), adp.Attributes().AsRaw())

If there is a match but the two datapoints are not at the same index (i.e. e != a), it appends an error reporting the out-of-order datapoints.

The only way to avoid an OutOfOrderErr and get to the point of comparing the actual values is to ensure that the order of the metric datapoints within a DataPointSlice is exactly the same in the expected and the actual metrics, which is what #34076 does by sorting both expected and actual in the same way.

So no, I don't think #34076 needs to be reverted.

It's a useful option, particularly if there are multiple datapoints within a single Resource::Scope::Metric structure, and it's non-deterministic as to which order they will be added.
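A rough sketch of the check and of the sorting approach described above, using only the standard library. The `dataPoint` type and the `outOfOrder`, `canonical`, and `sortByAttrs` helpers are hypothetical; the attribute set is modelled as a `map[string]string`, and none of this is the pmetrictest or #34076 code.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// dataPoint is a hypothetical stand-in for a metric datapoint.
type dataPoint struct {
	attrs map[string]string
	value int64
}

// outOfOrder counts expected datapoints whose attribute set matches an
// actual datapoint that sits at a different index.
func outOfOrder(expected, actual []dataPoint) int {
	n := 0
	for e, edp := range expected {
		for a, adp := range actual {
			if reflect.DeepEqual(edp.attrs, adp.attrs) {
				if e != a {
					n++
				}
				break
			}
		}
	}
	return n
}

// canonical renders an attribute set as a string with keys in sorted order.
func canonical(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	s := ""
	for _, k := range keys {
		s += fmt.Sprintf("%s=%s;", k, attrs[k])
	}
	return s
}

// sortByAttrs orders datapoints by the canonical form of their attributes.
func sortByAttrs(dps []dataPoint) {
	sort.Slice(dps, func(i, j int) bool {
		return canonical(dps[i].attrs) < canonical(dps[j].attrs)
	})
}

func main() {
	expected := []dataPoint{
		{map[string]string{"client": "a"}, 1},
		{map[string]string{"client": "b"}, 2},
	}
	actual := []dataPoint{
		{map[string]string{"client": "b"}, 2},
		{map[string]string{"client": "a"}, 1},
	}

	fmt.Println(outOfOrder(expected, actual)) // 2

	// Sorting both sides the same way removes the index mismatch.
	sortByAttrs(expected)
	sortByAttrs(actual)
	fmt.Println(outOfOrder(expected, actual)) // 0
}
```

Sorting helps only when the datapoints live in the same metric; it does not address several metric entries sharing one name.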

I do see that the merged commit didn't fix the copy/paste error on the function comment for IgnoreDatapointAttributesOrder() though (it still refers to IgnoreMetricAttributeValue).

Fixing that comment is not really part of this PR, but I can put it in as an additional commit to avoid opening/merging another PR just for that comment change.

jpkrohling · 2024-07-17T18:54:04Z

Thanks for the comments. I'll merge this as is; the comment can be fixed in a follow-up PR.

github-actions bot added the connector/servicegraph label Jul 15, 2024

github-actions bot requested review from JaredTan95, jpkrohling and mapno July 15, 2024 01:07

evantorrie marked this pull request as ready for review July 15, 2024 03:04

evantorrie requested a review from a team July 15, 2024 03:04

github-actions bot assigned songy23 Jul 15, 2024

jpkrohling added the Skip Changelog label (PRs that do not require a CHANGELOG.md entry) Jul 15, 2024

t00mas approved these changes Jul 15, 2024

jpkrohling mentioned this pull request Jul 16, 2024

[chore][pkg/pdataset/pmetrictest] introduce IgnoreDatapointAttributesOrder option to CompareMetricsOption #34076

Merged

jpkrohling requested changes Jul 16, 2024

evantorrie added 3 commits July 16, 2024 13:27

Merge branch 'main' into fix-servicegraph-tests

ae93e41

Merge branch 'main' into fix-servicegraph-tests

1d303cc

Revert "[chore] skip flaky test (open-telemetry#34120)"

153924e

This reverts commit ed3a56a.

jpkrohling approved these changes Jul 17, 2024

jpkrohling merged commit f2cfc2d into open-telemetry:main Jul 17, 2024
154 checks passed

github-actions bot added this to the next release milestone Jul 17, 2024
