-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics semantic conventions for timed operations #657
Add metrics semantic conventions for timed operations #657
Conversation
|
How about a standard set of labels to distinguish metrics of the same category/kind, but different target? For example, my service may make many http calls as a client. There should be some sort of label to distinguish what the target of the http calls was. Same would go for databases, as my service may talk to multiple database systems. |
@jkwatson, I considered this idea, but I was struggling to come up with a set of labels that I thought could be broadly applied. I had punted on defining labels until we make category-specific semantic conventions. I'd be happy to write them up and add them to this PR if y'all think that makes sense. |
This, along with open-telemetry/oteps#108, form the basis of a new section on metric naming for the OpenTelemetry spec. The comments discuss several ways to reflect these statistics, with one, two, or three instruments. I am in favor of defining these statistics using a single |
…lueRecorder instrument
1968873
to
9b5446e
Compare
|
||
#### Status | ||
|
||
Recordings with the duration instrument must include a `status` label. The value of this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this is connected with open-telemetry/oteps#123, which has called the canonical status code into question (again). As discussed in today's SIG, there's a need for systems to easily find the status code that describes an error condition, so having a bunch of specific conventions (e.g., http.status, posix.status, ...) is not a complete solution.
The current Span Status structure notwithstanding, we could represent a variable status code space in string-valued format by prefixing the status code space name to the code name, e.g.,
status=http:404
status=posix:EINTR
status=grpc:CANCELLED
This would require a breaking code change for spans, but if Span Status included: (1) a conventional code number, (2) a conventional code space name, (3) a message, the above could be generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I do not mean this comment to be blocking. The outcome of open-telemetry/oteps#123 may suggest a change in this part of the specification either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requirement here is problematic in the light of #706
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what is going to happen to the Span's status attribute, but this is still a requirement we want for metrics. We'd like a low-cardinality but highly informative label, with a known enumeration of possible values, that can be further collapsed downstream by a metric view, exporter, or a vendor's back-end.
With the current state of the trace spec, I feel the StatusCanonicalCode is the correct value type for this label. If the span's status attributes actually is removed, we'll need to refactor this spec, but I don't expect its intent will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed ^^^.
@bogdandrutu please merge? |
Recordings with the duration instrument must include a `status` label. The value of this | ||
label must be a valid Span [`StatusCanonicalCode`](../../trace/api.md#statuscanonicalcode). | ||
|
||
#### Labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we need to record timing, status, and attributes of a Span
to support the metrics derived from spans. Does this mean that the unsampled span becomes just a non-exported Span but otherwise stops being a no-op structure? This might add quite a bit of overhead in the unsampled case, but it does also make sense to me to use Span
as a fallback generic structure for a request across all instrumentation. I wonder what about libraries that do natively record metric information in their own representation of a request, for example
For these, would we recommend unsampled spans to continue to be pure no-op and use this native functionality for metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are raising good points. To export metrics from spans requires a real span object that can remember attributes from start and wait for the status and duration to be known. If there is a good (future) way to encode metrics in spans we might choose that instead. It wonder if the Span Sampler API could be used to return a reduced-size Span which would act like a no-op span but store enough information to report a duration metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a good (future) way
Ah I didn't realize that's future work since this doc talks a lot about Spans. But I wonder is it just referring to the concept of a span (window of time) and not the data model we use in tracing?
For sampling, yeah I think it would be possible to have three types of spans, exported real span, non-exported real span, and no-op span if we needed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmacd I missed the message to merge this, but I have some questions/concerns :)
|
||
When creating duration metrics from Spans, the duration **in milliseconds** of all | ||
_recorded_ Spans of a given category and kind should be aggregated together in a single | ||
[`ValueRecorder`](../api.md#valuerecorder). A Span's duration should be recorded onto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not add this as requirement yet "A Span's duration should be recorded onto this instrument regardless of whether the Span is sampled.". Can we leave this to the future work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason is because this is a tracing API change not a semantic convention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure this is an API change? I imagine this can be done with the sampler API, but haven't done the exercise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious to learn how achieve this with sampler api? If SamplingDecision is NOT_RECORD
, then we dont start/stop the span, and hence we won't track the duration, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this spec change is about auto-generated metrics by the SDK. I believe this applies to library or manual instrumentation that wants to record timing metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove that line. I don't think it's relevant for defining the semantic conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this relates to my other point - agree that if this is about metrics and not necessarily spans we should clarify that. It's unclear whether this doc is talking about Span
, the data model we use in tracing, hence these points about sampling, or just any window of time, for which we want to gather metrics and the implementation is left for the future. I'm getting the sense that it's the latter, but the wording in the entire doc seems to be tied to tracing, not just this line, below we have talk of "Labels applied to these metric recordings should follow the attribute semantic conventions
of the spans from which they are derived." which has the same issue as this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention with this PR was to define semantic conventions for metrics based on Spans. That is, based on the Span object from the tracing data model.
I like this because I feel we get some simple semantic conventions for metrics sort of by default. If you're making timing metrics, and you already have spans timing those same operations, making the metrics will be easy: just follow this naming convention, use the same semantic conventions as the Span did (adjusting for cardinality), and you're good to go.
Perhaps this is causing more confusion than it's worth. I'm happy to reframe it to be much more independent of our Span semantic conventions, but this gives me a concern that there would be a lot of duplication between our semantic conventions for duration metrics and spans, and that they would diverge over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All that said, I support John's suggestion of just removing this line:
A Span's duration should be recorded onto this instrument regardless of whether the Span is sampled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with wanting to reuse the semantic conventions defined for spans, I made a suggestion to try to decouple it from the implementation (particularly sampling) of tracing at least for now.
For Spans that follow one of the common semantic-conventional _areas_, the category | ||
should be the label prefix used in that semantic convention. | ||
|
||
For example: | ||
|
||
* `http.server.duration` | ||
* `http.client.duration` | ||
* `db.client.duration` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit hard to implement with this requirement of having different name for the metric. Can we make {category}
a label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. We should be able to generate a metric name from a span name, that's the goal, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to follow the guidance from OTEP 108:
Associated metrics SHOULD be nested together in a hierarchy based on their usage.
"As a rule of thumb, aggregations over all the dimensions of a given metric SHOULD be meaningful," as Prometheus recommends.
I don't believe that an aggregation of HTTP client duration and database client duration would be meaningful, so I've suggested that we separate them with this top-level category. I certainly may be wrong; I may not have thought of all use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmacd the problem that I see is that the {category} is unbounded, and we will end up with a lot of instruments. Not saying it is impossible, just harder than a label.
But if all feel strong about that we can keep it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it makes more sense to make the categories bounded? We could include a list of known categories in this semantic convention. Each category likely has its own semantic conventions anyway; we'll be adding those over a series of upcoming PRs.
|
||
### Convention | ||
|
||
When creating duration metrics from Spans, the duration **in milliseconds** of all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify that this is not a requirement to be implemented by all the libraries? I would suggest to start with implementing this in the collector from the exported spans and learn from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR is to have a semantic convention that allows the collector to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? I don't believe that's the goal at all. Where did you get this from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to say this is just a semantic convention. We should be able to use it for creating metrics from non-sampled spans, but that's an implementation question, not a semantic question. It's the implementation and sampler API question that I think is out of scope here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR is to have a semantic convention that allows the collector to do this.
I am confused on this statement. The 'justification' section in this PR talks about inaccuracies while doing aggregation in downstream (collector), so this must be about getting metrics before sampling?
Because these Spans may be sampled, aggregating information about them downstream is subject to inaccuracies. Computing count, duration, and error rate from all Spans (including sampled=true and sampled=false) results in more accurate aggregates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to clarify the goal of this PR, as people seem to be confused by this and make sure we don't misunderstand what are the action items, where this should be implemented, who should implement this, etc.
|
||
#### Labels | ||
|
||
Labels applied to these metric recordings should follow the attribute semantic conventions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be written like this?
Labels applied to these metric recordings should follow the attribute semantic conventions of the corresponding span for the operation. In other words, the labels should match as best as possible the attributes the Span for the operation would have if it had been sampled.
I think this along with the duration tweak suggested above decouples this PR from the sampling aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the corresponding Span for the operation wording. I think I'll leave off the second sentence.
It's important that this guidance is followed by the writers of the category-specific semantic conventions. And I think it'll be clear enough for them.
|
||
When creating duration metrics from Spans, the duration **in milliseconds** of all | ||
_recorded_ Spans of a given category and kind should be aggregated together in a single | ||
[`ValueRecorder`](../api.md#valuerecorder). A Span's duration should be recorded onto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with wanting to reuse the semantic conventions defined for spans, I made a suggestion to try to decouple it from the implementation (particularly sampling) of tracing at least for now.
|
||
### Convention | ||
|
||
When creating duration metrics from Spans, the duration **in milliseconds** of all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use milliseconds over base units of seconds? For open-telemetry/oteps#119, I've gone with base units, fwiw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that I hadn't read OTep 119. 😂
Fixed in c92d8f5.
I have to be missing something. Is the goal of OT metrics to support collection of aggregated metrics only within the context of active/traced/sampled spans? I can't reconcile the metrics section of this API with some of the detailed application/framework-level metrics that are more often collected. If this spec is only intended for auto-instrumentation agents, it still conflicts with other easy-to-enable app framework level metrics. While I think standardizing the format of metrics for http server requests or http client requests has value, I am having trouble resolving the language used because it seems inextricably linked to tracing spans, and that just doesn't resolve against broader SRE practices. If the OT metrics API is specifically related to gathering additional metrics about sampled spans, then we don't have an issue. |
Use seconds to record duration, rather than milliseconds
I am closing this PR in favor of my OTEP PR and @gfuller1's HTTP metrics PR. |
This PR adds semantic conventions for metrics that should be broadly applicable to many timed operations. This PR attempts to implement #654.
This PR addresses naming of these metrics, but does not attempt to specify labeling. I believe that belongs in category-specific semantic conventions, as the labels used will vary across http, database, messaging, etc...
I have attempted to follow the guidelines here: open-telemetry/oteps#108