-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/otlp/metrics: fix otel.* namespaced metrics #142
Conversation
This change fixes a problem where sytem.* and process.* metrics were correctly namespaced with otel.*, but the original metric was not kept.
@@ -70,7 +70,9 @@ func remapSystemMetrics(all pmetric.MetricSlice, m pmetric.Metric) { | |||
copyMetric(all, m, "system.disk.in_use", 1) | |||
} | |||
// process.* and system.* metrics need to be prepended with the otel.* namespace | |||
m.SetName("otel." + m.Name()) | |||
newm := all.AppendEmpty() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted with @liustanley , only the metrics from the previous extractSystemMetrics
should be kept: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/23445/files#diff-59a3a2cae4395f15cad4800806f9dc7680aed0f303dd896c0ca8663e0fd0ac35L56
For process.*
metrics, only the ones with otel.*
prefix are expected, original ones are expected to be overriden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I don't see the difference between this implementation and the one in the linked code. All metrics that start with system.* and process.* seem to get the prefix. I don't see any special case for process.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of system.* and process.* metrics get the otel. prefix - that's right. In addition to that, the original system.* metrics (no otel. prefix) should be preserved / copied, while process.* metrics don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but I don't see that in the old code. The behaviour seems the same for both process and system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, in my latest commit, I think that I have applied what you meant. PTAL
@@ -69,7 +69,12 @@ func remapSystemMetrics(all pmetric.MetricSlice, m pmetric.Metric) { | |||
case "system.filesystem.utilization": | |||
copyMetric(all, m, "system.disk.in_use", 1) | |||
} | |||
// process.* and system.* metrics need to be prepended with the otel.* namespace | |||
if strings.HasPrefix(name, "system.") { | |||
// we keep the original metric for system.* metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked again at the old code, I think the behavior is still different: we prepend otel.
to the original names, but we keep the remapped names with no otel.
prefix. https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/23445/files#diff-59a3a2cae4395f15cad4800806f9dc7680aed0f303dd896c0ca8663e0fd0ac35L115-L129
For example, in the old code:
- if we have
"system.filesystem.utilization"
, first we remap it to"system.disk.in_use"
and append the new one to slice - then we prepend
otel.
to"system.filesystem.utilization"
but not
"system.disk.in_use"
- in the end we got
"otel.system.filesystem.utilization"
and"system.disk.in_use"
vs.
- Before this PR, in the end we got
"otel.system.filesystem.utilization"
and"otel.system.disk.in_use"
- With the current PR, in the end we got
"system.filesystem.utilization"
,"system.disk.in_use"
,"otel.system.filesystem.utilization"
and"otel.system.disk.in_use"
Both of ^ are wrong. In short, we prepend otel.
to the original metrics, but don't prepend it to the remapped metrics.
Sorry for the confusion, I was also trying to understand the old behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are misunderstanding the current PR. It works exactly like you expect (in your "old code" example above). I have simplified the tests for you to make it easier for you to understand. Feel free to set up a call if it's still not clear.
Now, in the tests:
- The
in
value contains the original metric - The
out
value contains any new (!) metrics added. In the case ofsystem.*
metrics you will notice there is a copy of the original. - The test asserts that the original metric (from
in
, notout
) is prepended withotel.*
when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken: using the same example, now after remapMetrics
we're getting 3 metrics: "otel.system.filesystem.utilization"
, "system.disk.in_use"
and "system.filesystem.utilization"
?
If so, the last one "system.filesystem.utilization"
shouldn't be there
in: metric("system.cpu.load_average.1m", point{f: 1}), | ||
out: []pmetric.Metric{ | ||
metric("system.load.1", point{f: 1}), | ||
metric("system.cpu.load_average.1m", point{f: 1}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be otel.system.cpu.load_average.1m
?
require.Equal(t, out, dest.At(dest.Len()-len(tt.out)+i)) | ||
} | ||
t.Run("", func(t *testing.T) { | ||
dest := pmetric.NewMetricSlice() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more clear if in
is in dest
which is consistent with:
md := metricsArray.At(k) |
remapMetrics(metricsArray, md) |
@@ -69,7 +69,12 @@ func remapSystemMetrics(all pmetric.MetricSlice, m pmetric.Metric) { | |||
case "system.filesystem.utilization": | |||
copyMetric(all, m, "system.disk.in_use", 1) | |||
} | |||
// process.* and system.* metrics need to be prepended with the otel.* namespace | |||
if strings.HasPrefix(name, "system.") { | |||
// we keep the original metric for system.* metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken: using the same example, now after remapMetrics
we're getting 3 metrics: "otel.system.filesystem.utilization"
, "system.disk.in_use"
and "system.filesystem.utilization"
?
If so, the last one "system.filesystem.utilization"
shouldn't be there
}, | ||
}, | ||
{ | ||
in: metric("system.filesystem.utilization", point{f: 15}), | ||
out: []pmetric.Metric{metric("system.disk.in_use", point{f: 15})}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I feel I'm getting more confused now - looks like the previous test is doing the the right thing, but how comes the resulting metric in datadog exporter got an otel.
prefix? open-telemetry/opentelemetry-collector-contrib#23445 (comment)
I finally found the root cause - the bug isn't in this file opentelemetry-mapping-go/pkg/otlp/metrics/metrics_translator.go Lines 660 to 683 in 3d07b09
The for loop above is unfortunately an anti-pattern: you are appending more elements to
Sent out #148 to fix. |
This change fixes a problem where sytem.* and process.* metrics were correctly namespaced with otel.*, but the original metric was not kept.