Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define how providers handle identical creation calls #4146

Closed
wants to merge 20 commits into from

Conversation

pellared
Copy link
Member

@pellared pellared commented Jul 15, 2024

Towards open-telemetry/opentelemetry-go#3368

Why why

Most of the SDKs ignore instrumentation scope attributes (C++ and PHP are exceptions). This is a functional blocker for following issues:

Why

The specification does not explicitly stay how the SDK should handle tracer/meter/logger creation when the user calls it with the same identifying parameters (name, version, scheme_url), but when other parameters are different (currently only attributes).

Because of this the Go SDK ignores instrument scope attributes (see: open-telemetry/opentelemetry-go#3368).
I think the same is true for Java SDK, .NET SDK, Python SDK, JS SDK.

C++ SDK creates only a single tracer for many "identical" creation calls (the instrumentation scope attributes value comes from the first "identical" tracer creation call).

PHP SDK creates always a new instance of a tracer for many "identical" creation calls (with potentially different instrumentation scope attributes).

Right now, the specification does not require the SDK to handle scope attributes. Currently, having them ignored is specification compliant.

What

  • SDK MUST emit instrumentation scope attributes that were passed during first "identical" instrumentation creation (which is a correct way of using the API)
  • SDK handles next "identical" instrumentation creations in the way they want (e.g. return always a new instrument instance like in PHP, JS or return the previously created instrument like in C++, Go). However, it MUST be documented that this behavior may change in future releases (which is currently seen as incorrect usage of the API - still, in future we may see it as a valid usage of the API).

I think that we do NOT need to emit warnings when user creates instruments with the same identifying parameters (name, version, scheme_url), but with different attributes until we feedback that such issue occurs frequently enough.

The most important functional addition to SDK spec is

The first identical INSTRUMENT MUST be associated with all (both identifying and non-identifying) passed parameters.

@pellared pellared changed the title Scope attrs Define how providers handle identical creation calls Jul 15, 2024
@pellared pellared marked this pull request as ready for review July 15, 2024 19:51
@pellared pellared requested review from a team July 15, 2024 19:51
@reyang
Copy link
Member

reyang commented Jul 15, 2024

@pellared
Copy link
Member Author

pellared commented Jul 15, 2024

@reyang, @jmacd, can you double-check the changelog changes?

could you update the compliance matrix?

I see that there are entries for it.

Get a Tracer with scope attributes
get_meter accepts attributes.
LoggerProvider.Get Logger accepts attributes

I see that according to the compliance matrix PHP and C++ SDKs are supposed to support this scenario.

@open-telemetry/cpp-maintainers, @open-telemetry/php-maintainers, maybe you can take a look if my proposal is complaint with your SDKs existing behavior?

EDIT:
I think that C++ SDK would be compliant with my proposal and PHP SDK would NOT.

@yurishkuro
Copy link
Member

when the user calls it with the same identifying parameters (name, version, scheme_url), but when other parameters are different (currently only attributes).

Sorry, I am not following the premise of the problem. The API for getting the instrument does not accept attributes, only the scope & schema. What other attributes are you referring to? And why is it necessary to require that the returned object is "the same" - it sounds like a non-functional, performance-related requirement but it's theoretically possible to provide identical performance even by returning different objects (e.g. unique object per thread).

@pellared
Copy link
Member Author

pellared commented Jul 16, 2024

The API for getting the instrument does not accept attributes, only the scope & schema

They do accept instrument scope attributes.

attributes (optional): Specifies the instrumentation scope attributes to associate with emitted telemetry. This API MUST be structured to accept a variable number of attributes, including none.

why is it necessary to require that the returned object is "the same"

Consistent behavior for handling parameters which are described as "identifying" the instruments.

The term identical applied to Tracer/Meter/Loggers describes instances where all
identifying fields are equal. The term distinct applied to Tracer/Meter/Loggers describes
instances where at least one identifying field has a different value.

@pellared
Copy link
Member Author

SIG meeting notes:

  • There is a confusion why instrumentation scope attributes cannot be part of "identifying parameters ".
  • It should be a SHOULD (not MUST). It think SDKs should document that this is a not supported use case and behavior may change. Also it once mentioned that some SDKs may want to partition the identity to e.g. threads or do other things to reduce the synchronization.
  • (separate PR) The SDK SHOULD warn when the user calls with the same identifying parameters (name, version, scheme_url), but when other parameters are different. The JS would not like it, because they do not want to "track" the created instruments. They always create a new one.

@tigrannajaryan, do you recall some previous conversations and decisions why instrumentation scope attributes cannot be part of "identifying parameters "?

@pellared
Copy link
Member Author

pellared commented Jul 16, 2024

After the SIG meeting, I think it would be better to make smaller steps to make some progress. My proposal is that:

  • SDK MUST emit instrumentation scope attributes that were passed during first "identical" instrumentation creation (which is a correct way of using the API)
  • SDK handles next "identical" instrumentation creations in the way they want (return new unique instance, return the previously created instrument). However, it MUST be documentee that this behavior may change in future releases (which is currently seen as incorrect usage of the API - still, in future we may see it as a valid usage of the API).

I think that we do NOT need to emit warnings when user creates instruments with the same identifying parameters (name, version, scheme_url), but with different attributes until we feedback that such issue occurs frequently enough.

Does it look reasonable?

CC @jack-berg, @jmacd, @reyang, @jsuereth

@pellared pellared marked this pull request as draft July 16, 2024 16:30
@pellared
Copy link
Member Author

I changed the PR (together with its description) based on feedback from the SIG meeting.

specification/logs/sdk.md Outdated Show resolved Hide resolved
specification/metrics/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
@jmacd
Copy link
Contributor

jmacd commented Jul 23, 2024

Picking up a fragment of an argument that was presented around this topic in last week's OTel Spec SIG:

I had stated a desire to see Scope attributes used, in particular I expect Scope attributes to reduce the cost of repetitive attributes. Where we expect to see the same attribute on a span/metric/log over and over again, we can optimize the cost of telemetry by placing attributes into the scope.

The counter-argument was that it would cause trouble for Prometheus? What should Prometheus do with scope attributes?, goes the question. My opinion is that scope attributes should be elevated to metric attributes, since their purpose is to lower the cost of telemetry. A Prometheus user with an instrumentation library having several instruments may observe themselves repeatedly applying the same attributes to every instrument. In this situation, the Prometheus user can lower the cost of recording metrics and the cost of OTLP transport by adopting a scope attribute instead of repeated metric attributes. If you have an attribute like A=B applied to every metric instrument in an instrumentation library, it would be semantically no change to replace the repeated attribute with the same attribute on the scope.

Working backwards from this position, it seems that SDKs simply need to record scope attributes--whether or not they recognize "identical" calls or not.

@tsloughter
Copy link
Member

Bumping what @pellared asked in an earlier comment, does anyone know why attributes weren't included as identifying if its declared that:

It is a user error to create Tracers with different attributes but the same identity.

That is pretty contradictory, "they are non-identifying, but if you use them as non-identifying it is your fault for doing it wrong and we won't tell you what will happen" :)

@tsloughter
Copy link
Member

Also wanted to give a bigger 👍 than just on @jmacd's comment to scope attributes. I get asked about that a good bit for Erlang/Elixir.

Just realized we meant different scopes, though I get asked about both. "Scope" meaning both Instrumentation Scope attributes and the current context having attributes that get attached -- the latter meaning attributes would go on signal data generated by any Instrumentation Scope that is created using the same context.

@pellared
Copy link
Member Author

pellared commented Jul 24, 2024

why attributes weren't included as identifying

AFAIK because scope attributes were introduced after instrumentation scope name, version, and scheme URL were specified as identifying. As far as I understood from the SIG meeting, adding a new parameter to the identifying parameters list was considered as a breaking change.

Personally, I do not find it as a breaking change. At least from my database experience adding a column to a primary key it is an acceptable change/migration.

However, in order to include instrumentation scope attributes to the identifying parameters list we would need to define the equality of attribute collections (I do not think it is specified). In my opinion, the order or attributes should not affect the equality of attribute ([k1: v1, k2: v2] should be equal to [k2: v2, k1: v1]) (especially that some languages model the attributes as map/hashset).

Still, I was thinking of proposing a "baby-step" first that will not be against the proposal of adding instrumentation scope attributes as identifying later.

CC @tigrannajaryan

@yurishkuro
Copy link
Member

yurishkuro commented Jul 24, 2024

I would prefer to split the discussion away from the concept of "identifying". If a user calls provider.Get("foo", "1.0", {"bar": "baz"}) they expect the attributes to be captured in the telemetry. It has nothing to do with "identifying". If SDK has some magical way of returning the same handle and yet still capturing user data (scope attributes), they can roll with it. They can also return different handles for identical inputs if they have reasons to do so (perhaps the buffering associated with the original handle has expired and was deallocated). I don't see why the spec needs to dictate the behavior in terms of "identity" of the handles, it should only describe the resulting behavior: the user-provided data must be preserved.

Now, there is a special case here for metrics as some of these implementation choices might affect Prometheus aggregations, or the provider could be configured to not include scope attributes in the metrics (technically this is also possible with other signals). But again, it's up to the SDK to decide which handle to return. If it already knows from its internal configuration that scope attributes will not make it to the exported metrics, it can return the same handle and ignore the scope attributes.

@pellared
Copy link
Member Author

@yurishkuro, anything needed to be changed in the PR? I put the "identity" wording as it already exists in the API specs and I tried my best to capture what you described. I am open to suggestions. So far I see no actionable feedback based on comments. I think that some of the feedback is not directly related to proposed changes but the way how "get instrument APIs" are specified.

@yurishkuro
Copy link
Member

yurishkuro commented Jul 24, 2024

@pellared specifically, I would remove, instead of add more, references to "identifying". I think this concept is irrelevant and only creates more confusion, trying to dictate implementation instead of specifying behavior relevant to the user.

@pellared
Copy link
Member Author

pellared commented Jul 25, 2024

I would remove, instead of add more, references to "identifying". I think this concept is irrelevant and only creates more confusion, trying to dictate implementation instead of specifying behavior relevant to the user.

I want to know that the "identifying fields" term is (currently) relevant in duplicate instrument registration section.

In order, to remove "identifying fields", we would need to define equality for Attributes Collection first (so that duplicate instrument registration happens when all passed input is equal).

For now, I am closing this PR as I think the issue needs more discussions. I also think that resolving the issue may involve having more then one PR. I have created #4160 to have discussions in an issue rather than a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants