[processor/spanmetrics] Resource attributes support #7075
Conversation
- Use LRU cache for the "resource_attributes".
- Add document/usage for the "resource_attributes" configuration.
defaultDimensionsCacheSize = 1000
serviceNameKey = conventions.AttributeServiceName
instrumentationLibraryName = "spanmetricsprocessor"
operationKey = "operation" // OpenTelemetry non-standard constant.
The attributes used in this processor aren't standard; that's something we might address at some point. For example, "status_code" should probably be "http.status_code", if I'm not mistaken.
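For illustration, if the conventions package already imported in this file defines a constant for that key (an assumption on my part), the swap could look roughly like:

// Hypothetical sketch: prefer a semantic-convention key over an ad-hoc one.
// Assumes the imported conventions package defines AttributeHTTPStatusCode ("http.status_code").
statusCodeKey = conventions.AttributeHTTPStatusCode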
latencyCount map[metricKey]uint64
latencySum map[metricKey]float64
latencyBucketCounts map[metricKey][]uint64
latencyCount map[resourceKey]map[metricKey]uint64
Just a thought: I can't help but think this processor is slowly reinventing a lot of what the metrics SDK is already supposed to be doing. I wonder if we can simplify it by using metric instruments to record new values and let the SDK do the rest? It would simplify this processor massively. I haven't looked into it enough to see whether it would be possible, though.
That is a valid point :)
Not sure on the limitations of this but it is worth exploring.
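Purely as an illustration of that idea, a sketch using the current otel-go metric API (the package paths, instrument name, and attribute keys here are assumptions, not what this processor does today):

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/metric"
)

var latencyHistogram metric.Float64Histogram

// setupInstruments creates the histogram once; the SDK then owns bucketing,
// summing, and counting, which this processor currently does by hand.
func setupInstruments() error {
    var err error
    latencyHistogram, err = otel.Meter("spanmetricsprocessor").Float64Histogram("latency")
    return err
}

// recordLatency records one span's latency; aggregation is left to the SDK.
func recordLatency(ctx context.Context, latencyMs float64, service, operation string) {
    latencyHistogram.Record(ctx, latencyMs, metric.WithAttributes(
        attribute.String("service.name", service),
        attribute.String("operation", operation),
    ))
}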
@@ -227,9 +243,17 @@ func (p *processorImp) Capabilities() consumer.Capabilities {
// It aggregates the trace data to generate metrics, forwarding these metrics to the discovered metrics exporter.
// The original input trace data will be forwarded to the next consumer, unmodified.
func (p *processorImp) ConsumeTraces(ctx context.Context, traces pdata.Traces) error {
p.lock.Lock()
// use defer to pass the output to downstream components as quick as possible.
To my understanding, this would synchronize the entire processor, not allowing two calls of ConsumeTraces to run at once. I think it makes sense, since there is a lot of internal state that shouldn't be messed with. It seems that the same instance of a processor isn't executed concurrently anyway, though someone with more expertise on the collector could verify that.
In this case I think deferring the unlocking is fine.
p.lock.Unlock()
return nil, err
}
// If the service name doesn't exist, we treat it as invalid and do not generate a metric
why?
We are changing this field to be mandatory. See the PR description:
Service Name has been moved to become a default resource attribute instead of the attribute as per the resource semantic convention.
This should be captured in some way, either via logging or an additional metric, to allow for easy debugging.
At this point in time, I do not believe this PR should be merged as it is currently presented.
There are too many behavioural changes for me to consider this safe to release on its own.
I would prefer this PR to be broken down into smaller change sets so that iterating on the behaviour changes doesn't block the accepted changes.
A few things that I noticed, but I would still like this split:
defaultDimensionsCacheSize = 1000
serviceNameKey = conventions.AttributeServiceName
instrumentationLibraryName = "spanmetricsprocessor"
operationKey = "operation" // OpenTelemetry non-standard constant.
I would encourage you to use the semconv where possible.
// metricKey is used to carry the stringified metric attributes
type metricKey string

// resourceKey is used to carry the stringified resource attributes
type resourceKey string
Since there are no method receivers on these types, they are superfluous and provide no additional value.
They're not superfluous; they help make sense of the structures that use them to construct multi-dimensional maps. Rather than map[string]map[string]int, there is map[resourceKey]map[metricKey]int.
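A tiny, self-contained illustration of that readability argument (the key values below are made up):

// Named string types document what each level of the nested map is keyed by.
type resourceKey string
type metricKey string

func example() {
    // Reads as "per resource, per metric series" rather than map[string]map[string]uint64.
    counts := map[resourceKey]map[metricKey]uint64{
        "service.name=frontend": {
            "operation=GET /users": 42,
        },
    }
    _ = counts
}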
@@ -227,9 +243,17 @@ func (p *processorImp) Capabilities() consumer.Capabilities {
// It aggregates the trace data to generate metrics, forwarding these metrics to the discovered metrics exporter.
// The original input trace data will be forwarded to the next consumer, unmodified.
func (p *processorImp) ConsumeTraces(ctx context.Context, traces pdata.Traces) error {
p.lock.Lock()
// use defer to pass the output to downstream components as quick as possible.
This is a rather expensive way to do it.
You're basically blocking processing (even if not intended) for a secondary call to ConsumeTraces, with no timeout.
You would be better off using something like atomic or a channel as a semaphore to allow a fast exit instead of queueing until the holder is done.
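A minimal sketch of the channel-as-semaphore idea (what to do with a call that loses the race is an open design question, so the early return below is purely illustrative; ctx and pdata.Traces are the same types as in the snippet above):

type processorImp struct {
    // sem is a buffered channel of capacity 1 used as a try-lock around the
    // processor's internal aggregation state.
    sem chan struct{}
}

func newProcessorImp() *processorImp {
    return &processorImp{sem: make(chan struct{}, 1)}
}

func (p *processorImp) ConsumeTraces(ctx context.Context, traces pdata.Traces) error {
    select {
    case p.sem <- struct{}{}:
        defer func() { <-p.sem }()
        // ... aggregate spans and emit metrics while holding the slot ...
        return nil
    default:
        // Another ConsumeTraces call holds the state: exit fast instead of blocking.
        // Whether to drop, retry, or queue the batch is not settled in this thread.
        return nil
    }
}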
for _, traces := range tc.traces {
// Test
traces := traces
// create an excessive concurrent usage. The processor will not be used in this way practically.
I am worried that, since we are not joining and waiting on the validation thread of this test, we are walking into creating a flaky test, since the result can arrive after the "test" has completed.
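One way to avoid that, sketched below (tc, p, and the surrounding test setup are assumed from the existing test), is to join every goroutine before the validation runs:

var wg sync.WaitGroup
for _, traces := range tc.traces {
    traces := traces
    wg.Add(1)
    go func() {
        defer wg.Done()
        assert.NoError(t, p.ConsumeTraces(context.Background(), traces))
    }()
}
// Join before validating so no result can land after the test has finished.
wg.Wait()
// ... run the metric assertions here ...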
}
return true
})
assert.Empty(t, wantResourceAttrServiceA, "Did not see all expected dimensions in metric. Missing: ", wantResourceAttrServiceA)
Would it not be faster to check if the attributes len matches the expected len?
switch k {
case notInSpanResourceAttr1:
assert.Fail(t, notInSpanResourceAttr1+" should not be in this metric")
default:
assert.Equal(t, wantResourceAttrServiceA[k], value)
delete(wantResourceAttrServiceA, k)
Why not use assert.Equal here?
It could be like:
assert.NotEqual(t, k, notInSpanResource, "Must not be resource defined...")
assert.Equal(t, wantResourceAttrService[k], v.StringVal())
delete(wantResourceAttrServiceA, v.StringVal())
@@ -227,9 +243,17 @@ func (p *processorImp) Capabilities() consumer.Capabilities {
// It aggregates the trace data to generate metrics, forwarding these metrics to the discovered metrics exporter.
// The original input trace data will be forwarded to the next consumer, unmodified.
func (p *processorImp) ConsumeTraces(ctx context.Context, traces pdata.Traces) error {
p.lock.Lock()
// use defer to pass the output to downstream components as quick as possible.
Another approach that might be considered is to lean into concurrency. Instead of emitting metrics on every invocation of ConsumeTraces(), perhaps start a goroutine with a ticker that periodically takes the lock, builds the metrics, emits them, and resets the exemplars. I think the whole of ConsumeTraces() would probably still need to be under the lock, but it would do less work on each invocation and metrics emission would be more regular.
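Roughly, that could look like the following (the flush interval and the field and method names are assumptions made for the sake of the sketch):

// startFlushLoop emits aggregated metrics on a timer instead of on every
// ConsumeTraces call; ConsumeTraces would then only record under the lock.
func (p *processorImp) startFlushLoop(ctx context.Context) {
    ticker := time.NewTicker(15 * time.Second) // flush interval: an assumption
    go func() {
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                p.lock.Lock()
                m := p.buildMetrics()   // build from the accumulated sums/counts/buckets
                p.resetExemplarData()   // reset exemplars for the next interval
                p.lock.Unlock()
                _ = p.metricsExporter.ConsumeMetrics(ctx, m)
            }
        }
    }()
}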
@MovieStoreGuy I think there are two user-facing behaviour changes, as we put in the PR's description:
I agree we should aim to create as small a changeset as we can. But the feature change is fairly atomic and hard to break up. If the tests give us enough confidence to cover the behaviour changes, we should consider them safe. Feel free to let me know what tests should be added.
@Aneurysm9 @albertteoh Can I ask your opinion on the above comment from @MovieStoreGuy? I replied before as:
I'm open to breaking it down (probably later) if we are leaning towards that.
In general, I agree with @MovieStoreGuy on reducing the diff size, and I think there are some opportunities to do this. I also appreciate the difficulty of keeping these PRs separate, particularly when introducing the new resource attr key to internal maps. Moreover, I think GitHub handles stacked diffs quite nicely, so you can create PRs for various layers of functionality. In this PR's case, at least the
Is the former a pre-requisite for the latter? If so, perhaps there's another opportunity to break the PR up.
I think I agree that there may be too many things happening here to have a high degree of confidence in the correctness of the changes. Is it possible to structure this as a series of PRs that each take independent steps in this direction? If not, because an intermediate state would be broken or something like that, I would really like to see it rebased into a series of commits that can be viewed as a sequence of refactorings and additions.
Please see the discussion about removing the aggregation from spanmetricsprocessor: I think we should hold off on adding the new feature to this processor until the aggregation functionality is removed.
Description:
Currently, there is a bug: there is no logic to differentiate between resource attributes. This PR adds a feature that allows users to optionally specify resource attributes to append, similar to the existing dimensions mechanism.
Service Name has been moved to become a default resource attribute instead of the attribute as per the resource semantic convention.
Implementation details:
Some of the critical implementation changes:
The core data structures are changed from map[metricKey]valType to map[resourceKey]map[metricKey]valType. This ensures accurate aggregation of data based on the specified resource attributes and dimensions.
This processor is stateful. Due to the nature of its logic, concurrent executions of ConsumeTraces() would output incorrect data, so a lock forces ConsumeTraces() to execute serially. Meanwhile, the lock is not taken inside the internal functions of this processor; those internal functions are not concurrency-safe.
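As a compressed illustration of the new shape and the locking contract (the field and helper names are simplified and not the exact code):

type processorImp struct {
    lock sync.Mutex

    // Aggregated state, keyed first by resource, then by metric series.
    latencyCount map[resourceKey]map[metricKey]uint64
    latencySum   map[resourceKey]map[metricKey]float64
}

// aggregate assumes the caller (ConsumeTraces) holds p.lock; the internal
// functions themselves are not concurrency-safe.
func (p *processorImp) aggregate(rk resourceKey, mk metricKey, latency float64) {
    if _, ok := p.latencyCount[rk]; !ok {
        p.latencyCount[rk] = make(map[metricKey]uint64)
        p.latencySum[rk] = make(map[metricKey]float64)
    }
    p.latencyCount[rk][mk]++
    p.latencySum[rk][mk] += latency
}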
Potential todos:
We should consider these issues for the following improvements:
There is a bit of a discrepancy between the use of the terms "Dimensions" and "Attributes". It seems like "Attributes" is more commonly used, so I have used that term in this PR when adding in resource attributes. I propose that we rename dimensions to attributes in the future, although some backwards compatibility will need to be added to the config to support this.
Should we still fall back to searching for the attribute value from resources for dimensions? See todo.
Link to tracking Issue:
#6486
Testing:
Testing added to ensure the new structure/hierarchy of metrics, under the instrumentationLibraryMetrics structure under Resource, is generated correctly.
TestProcessorConsumeTracesConcurrentSafe is added to make sure the public function ConsumeTraces() does not cause race conditions (the tests are executed with the -race flag).
Documentation:
Usage of the new config option resource_attributes added to README.md