Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow all metric conventions to be either synchronous or asynchronous. #2458

Merged
merged 9 commits into from
Apr 8, 2022

Conversation

fstab
Copy link
Member

@fstab fstab commented Mar 30, 2022

Changes

It should be an implementation detail whether metric instruments are synchronous or asynchronous.

As an example, for many Java runtime metrics there will be two ways to implement them: Using JFR events, or using JMX. JFR-based implementations will likely be synchronous, while JMX-based implementations will likely be asynchronous. The semantic conventions spec should be defined in a way to allow compatible implementations both ways.

This is generally true, not just for Java runtime metrics.

This PR removes "synchronous" and "asynchronous" from the specification of semantic conventions for metrics.

@fstab fstab requested review from a team March 30, 2022 09:01
@fstab fstab force-pushed the jvm-metrics-can-be-synchronous branch from 922a501 to 9be3f8e Compare March 30, 2022 09:03
@reyang reyang added spec:metrics Related to the specification/metrics directory area:semantic-conventions Related to semantic conventions labels Mar 30, 2022
@jmacd
Copy link
Contributor

jmacd commented Mar 30, 2022

I would suggest that whether metric instruments are synchronous or asynchronous is always implementation detail. You should be able to switch between these instrument forms for performance reasons, and temporality controls are available for exporters to control what the consumer receives.

Moreover, for the OTLP exporter specifically, the supported temporality preferences will give consistent temporality output, even when the underlying implementation changes between synchronous and asynchronous instruments.

@trask
Copy link
Member

trask commented Mar 30, 2022

@fstab I sent fstab#1 to your PR to expand this to all metric conventions (and consolidate #2451)

@fstab
Copy link
Member Author

fstab commented Mar 30, 2022

Thanks Trask, I merged it, and rebased to the latest origin/main.

@fstab fstab force-pushed the jvm-metrics-can-be-synchronous branch 2 times, most recently from 4faebf4 to c28ba84 Compare March 30, 2022 18:57
@trask
Copy link
Member

trask commented Mar 30, 2022

@fstab can you update the PR title and description to reflect the expanded scope?

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to clarify the "semantic convention" rational to use the "instrument" type instead of the Aggregation type in the table. For OTLP consumers it is irrelevant the instrument type (since someone can use a view to change this), so what is the correct way to define the "semantic convention"? Who is the user/consumer of these? Etc.

@fstab fstab force-pushed the jvm-metrics-can-be-synchronous branch from 67eae26 to 5727162 Compare March 31, 2022 14:25
@fstab fstab changed the title Don't specify whether JVM metrics are synchronous or asynchronous Allow all metric conventions to be either synchronous or asynchronous. Mar 31, 2022
@fstab
Copy link
Member Author

fstab commented Mar 31, 2022

@trask I updated the title and description, and I rebased on main again. Should I squash the commits or will this be done during merge?

@jmacd
Copy link
Contributor

jmacd commented Mar 31, 2022

I would like to clarify the "semantic convention" rational to use the "instrument" type instead of the Aggregation type in the table. For OTLP consumers it is irrelevant the instrument type (since someone can use a view to change this), so what is the correct way to define the "semantic convention"? Who is the user/consumer of these? Etc.

Opinions here: The "user" is the one who decides to link in a standards-conventional library of code; the standards conventions tell them what kind of view they should expect to be able to write. The Aggregation is user-controllable from that point onward, with the conventional instrument telling you which Aggregation to expect by default (but not which temporality).

I'm so pleased that this issue has come to the head, because the point of temporality is that it is non-semantic in nature. In other words--its a performance and cost trade. I loved hearing this:

JFR-based implementations will likely be synchronous, while JMX-based implementations will likely be asynchronous

Sweet!!

@trask
Copy link
Member

trask commented Mar 31, 2022

@trask I updated the title and description, and I rebased on main again. Should I squash the commits or will this be done during merge?

@fstab thx! it will be squashed during merge

I would like to clarify the "semantic convention" rational to use the "instrument" type instead of the Aggregation type in the table. For OTLP consumers it is irrelevant the instrument type (since someone can use a view to change this), so what is the correct way to define the "semantic convention"? Who is the user/consumer of these? Etc.

My basic view (biased as an instrumenter I'm sure) is that the semantic conventions are a set of requirements for compliant instrumentation. This also then defines guarantees for what OTLP consumers / backends can expect from compliant instrumentation under default views / span processors / log processors.

If we want to make a stronger statement about metric aggregation type, we could list both instrument type and aggregation type in the semantic conventions to make it clear that if a user alters the aggregation type of a "standard" metric, it will no longer be considered a "standard" metric.

@fstab
Copy link
Member Author

fstab commented Mar 31, 2022

I would like to clarify the "semantic convention" rational to use the "instrument" type instead of the Aggregation type in the table. For OTLP consumers it is irrelevant the instrument type (since someone can use a view to change this), so what is the correct way to define the "semantic convention"? Who is the user/consumer of these? Etc.

This is a good point.

I'm still struggling with the terminology.

We are currently using Instrument as defined in metrics/api.md. Unfortunately there is no terminology to express "a Counter that can either be synchronous or asynchronous", because according to the spec the term Counter is by definition synchronous. We are trying to use Counter without requiring it to be synchronous, but that's not the terminology defined in metrics/api.md.

What we could do is use Point Kind from metrics/datamodel.md instead of Instrument:

Counter -> Monotonic Sum
UpDownCounter -> Gauge
Histogram -> Histogram

The Point Kind has the advantage that it covers both, the synchronous and the asynchronous Instrument.

What do you think?

@reyang
Copy link
Member

reyang commented Apr 1, 2022

I'm still struggling with the terminology.

@fstab these are the available Aggregation Types (essentially what the SDK could potentially produce):

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#aggregation

Data Model is a super set of what the SDK could produce because it contains extra stuff (e.g. Summary) for compat/interop with other systems (e.g. Prometheus).

Hope this helps!

@fstab
Copy link
Member Author

fstab commented Apr 1, 2022

Thanks a lot @reyang.

My understanding is if we specify the aggregation (left column) in the semantic conventions, then implementations could choose any of the compatible instrument types (right column).

Aggregation Compatible Instrument Types
Monotonic Sum Counter
Asynchronous Counter
Non-Monotonic Sum UpDownCounter
Asynchronous UpDownCounter
Asynchronous Gauge
Explicit Bucket Histogram Histogram

Sounds like for monitoring backends it should be fine to specify aggregations only, because that defines what monitoring backends can expect to be sent over the wire (except for temporality).

For implementers that should be fine too, because they are free to choose any of the compatible Instrument types, as the instrument type would be an implementation detail.

I'm happy to replace Instrument with Aggregation in the semantic conventions.

What do you think?

@jmacd
Copy link
Contributor

jmacd commented Apr 1, 2022

Correction: @fstab, Asynchronous Gauge instruments are not compatible with Non-monotonic Sum--that's a semantic change. Otherwise, I support the direction you're heading.

Unfortunately there is no terminology to express "a Counter that can either be synchronous or asynchronous", because according to the spec the term Counter is by definition synchronous.

I agree this needs to be fixed. One solution would be e.g. to list "Counter or Asynchronous Counter" everywhere. Maybe "(Asynchronous) Counter" conveys this without new terminology? We might want way to refer to instruments and semantic-equivalents. Counters and Asynchronous Counters are semantic equivalents, and same for UpDownCounter and Asynchronous UpDownCounter. Otherwise, there are no other semantic equivalencies between instruments.

Instrument Semantically-equivalent instruments
Counter Asynchronous Counter
UpDownCounter Asynchronous UpDownCounter
Histogram
Asynchronous Counter Counter
Asynchronous UpDownCounter UpDownCounter
Asynchronous Gauge

Would a general guideline that "semantic conventions are written to use "Counter" and "UpDownCounter" but for each of these the asynchronous equivalent may be used" work?

@fstab fstab force-pushed the jvm-metrics-can-be-synchronous branch 2 times, most recently from b82d948 to 87f4e80 Compare April 1, 2022 21:08
@fstab fstab force-pushed the jvm-metrics-can-be-synchronous branch from b59993f to 948906e Compare April 5, 2022 11:28
@fstab
Copy link
Member Author

fstab commented Apr 5, 2022

I fixed the Markdownlint error by adding a \n to the end of README.md.

@dyladan
Copy link
Member

dyladan commented Apr 5, 2022

First, to get it out of the way I support this PR as JS maintainer. I think this allows quite a bit more freedom for instrumentation authors and different libraries/runtimes may expose metrics in different ways so allowing this flexibility seems like a good idea for the long haul.

Now I'd like to address the disagreement I see here. It seems like there is a fundamental disagreement here on who the semantic conventions are for. Is it meant to describe what the instrumentation authors should do, or is it meant to describe what the backend processors should expect to see? This question hasn't been that important until now because in tracing there was no mechanism like views which could cause those two to be in disagreement. Now with views, it is possible for the user to configure metrics such that they disagree with the semantic conventions. I think this is a question that we need to resolve and clearly document, because without it we are likely to see these disagreements again in the future, and even worse than that a backend may incorrectly process or display data based on the semantic conventions in particular configurations. I think it can be handled outside this PR, but I thought it was important to address.

@jmacd
Copy link
Contributor

jmacd commented Apr 5, 2022

semantic conventions are useless for backends

Are you concerned about a potential change of aggregation or change of temporality?

The SDK has temporality controls so I think this is not the concern you're having.

If the concern is about change of aggregation, isn't that the whole point of a Views feature in our SDKs? Viewing a Histogram instrument using the SUM, COUNT, MIN, or MAX aggregation, for example--the user does this on purpose, so I don't see a problem.

@jmacd
Copy link
Contributor

jmacd commented Apr 5, 2022

meant to describe what the instrumentation authors should do, or is it meant to describe what the backend processors should expect to see?

The way I understand it, the semantic conventions are meant to indicate 1 of 4 semantic categories.

Any aggregation that is semantically compatible with the instrument category is meaningful. I believe our job as the authors of semantic conventions and a data model is to define what is meaningful, not what is useful. A simple backend that faithfully displays each kind of aggregation can make a useful presentation out of any data, which should convey meaning to a user provided that the aggregation was semantically compatible with the instrument dictated by semantic conventions. The user controls the View, and the user receives the intended meaning for whatever use they have. Does this answer your concerns @dyladan and @bogdandrutu?

@dyladan
Copy link
Member

dyladan commented Apr 5, 2022

Does this answer your concerns @dyladan and @bogdandrutu?

I think we are in agreement, but @bogdandrutu's concern seems different than what mine was. It seems to me that @bogdandrutu feels that the semconv should be able to be interpreted exactly by a backend to decide what data to display and how to display it. You seem to feel that the semconv describes what the data means, but not necessarily what its exact representation needs to be and requires the backend to have some flexibility. FWIW I think you and I are interpreting the meaning of semantic conventions the same way. My point is simply that this needs to be agreed on by everyone or we're going to see these arguments again.

@reyang
Copy link
Member

reyang commented Apr 7, 2022

Does this answer your concerns @dyladan and @bogdandrutu?

I think we are in agreement, but @bogdandrutu's concern seems different than what mine was. It seems to me that @bogdandrutu feels that the semconv should be able to be interpreted exactly by a backend to decide what data to display and how to display it. You seem to feel that the semconv describes what the data means, but not necessarily what its exact representation needs to be and requires the backend to have some flexibility. FWIW I think you and I are interpreting the meaning of semantic conventions the same way. My point is simply that this needs to be agreed on by everyone or we're going to see these arguments again.

+1, I've created #2475 to track this issue and will raise it to the semantic convention SIG. This is a
wider issue which doesn't have to block this PR.

@fstab there is one CI job failing? Would you fix it? Thanks!

fstab and others added 7 commits April 7, 2022 22:13
Signed-off-by: Fabian Stäber <fabian@fstab.de>
Signed-off-by: Fabian Stäber <fabian@fstab.de>
Signed-off-by: Fabian Stäber <fabian@fstab.de>
Signed-off-by: Fabian Stäber <fabian@fstab.de>
Signed-off-by: Fabian Stäber <fabian@fstab.de>
Signed-off-by: Fabian Stäber <fabian@fstab.de>
@fstab fstab force-pushed the jvm-metrics-can-be-synchronous branch from af98fae to 8b5d6a5 Compare April 7, 2022 20:22
@fstab
Copy link
Member Author

fstab commented Apr 7, 2022

Fixed the make table-check error and rebased to main.

Copy link
Contributor

@jamesmoessis jamesmoessis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this and agree that sync vs async instruments are an implementation detail and the semantic convention doesn't need to specify it. One example that comes to mind is http.server.active_requests. This was listed as async in the semantic convention, but I can imagine many instrumentations use a synchronous instrument to record it depending on implementation.

@reyang
Copy link
Member

reyang commented Apr 8, 2022

There is a transient failure which was caused by the web.archive.org maintenance ...

@fstab I think you've done your part, let's wait and I'll trigger the CI again once web.archive.org has recovered.

image

@reyang reyang closed this Apr 8, 2022
@reyang reyang reopened this Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants