
Clarify that attribute keys are unique in collections #2248

Merged
merged 2 commits into open-telemetry:main from unique-attr-keys
Jan 26, 2022

Conversation

tigrannajaryan
Member

Attribute keys must be unique. The key/value pair collections in the specification were always intended to model a map.

There was recent confusion about this. This change clarifies the spec. It is an editorial change and does not change any semantics.

Resolves #2245

@tigrannajaryan tigrannajaryan requested review from a team January 6, 2022 16:31
@yurishkuro
Member

Who is supposed to be responsible for the uniqueness? We cannot impose requirements on the users of the API who can call setAttribute twice with the same key. And streaming implementations of the API cannot enforce key uniqueness (e.g. via last-wins) because they do not hold state.

@tigrannajaryan
Member Author

Who is supposed to be responsible for the uniqueness? We cannot impose requirements on the users of the API who can call setAttribute twice with the same key. And streaming implementations of the API cannot enforce key uniqueness (e.g. via last-wins) because they do not hold state.

I agree, users shouldn't be required to do that. The Trace API already enforces the behavior:

Setting an attribute with the same key as an existing attribute SHOULD overwrite the existing attribute's value.

My understanding of this clause is that uniqueness is enforced (perhaps eventually, but nevertheless). Streaming implementations don't necessarily need to hold state on the sender side to enforce this. They can define the receiving side's semantics in a manner that results in uniqueness of attributes from the receiver's perspective (the "overwrite if exists" rule can be applied at the receiver end).
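To make this concrete, here is a minimal sketch (not real OTel SDK code; the function name is illustrative) of how a receiver could apply the "overwrite if exists" rule to a stream of set-attribute events, so that attributes end up unique even though the stateless sender emitted duplicates:

```python
def fold_attribute_events(events):
    """Fold a stream of (key, value) set-attribute events into a map.

    Later events overwrite earlier ones (last-wins), so the result has
    unique keys even though the sender kept no state.
    """
    attributes = {}
    for key, value in events:
        attributes[key] = value  # overwrite if exists
    return attributes

# The sender may emit the same key twice; the receiver still sees a map.
events = [("http.method", "GET"), ("retry", 1), ("retry", 2)]
assert fold_attribute_events(events) == {"http.method": "GET", "retry": 2}
```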

I was not able to find an equivalent clause in the Metrics API. It should probably be added.

@carlosalberto
Contributor

@yurishkuro Please take a look again.

@yurishkuro
Member

As I already commented, I do not understand what this description refers to. Something needs to enforce this uniqueness. It can be users of the API, the implementation of the API (i.e. SDK), or the backend. OTel cannot impose requirements on the users or the backends, and SDKs may not be able to enforce uniqueness (e.g. streaming API). So what is this clause referring to?

@tigrannajaryan
Member Author

As I already commented, I do not understand what this description refers to. Something needs to enforce this uniqueness. It can be users of the API, the implementation of the API (i.e. SDK), or the backend.

@yurishkuro My reading of the current requirement is that it is the responsibility of the SDK or of the exporter to enforce uniqueness.

If it is not clear we should clarify it, but I think it can be a separate PR. This PR does not contradict that, this PR merely sets the broader expectation from the semantics perspective.

@yurishkuro
Member

The current req says "SHOULD overwrite", so uniqueness is not guaranteed. My concern with the wording in this PR is that it gives the misleading impression that uniqueness is guaranteed / enforced.

@tigrannajaryan
Member Author

The current req says "SHOULD overwrite", so uniqueness is not guaranteed. My concern with the wording in this PR is that it gives the misleading impression that uniqueness is guaranteed / enforced.

That's a good point. I think the only reason it is a SHOULD and not a MUST is that an alternate strategy is possible (drop duplicates), not that duplicates are considered acceptable. I am fine with changing it to MUST, or calling out the uniqueness requirement more clearly (or just linking to the wording introduced by this PR).

Again, I think this PR is more fundamental and it reflects our intent. If the existing spec wording contradicts this PR, then that wording needs to change, not our guarantee of uniqueness, which this PR merely states explicitly.

@yurishkuro
Member

yurishkuro commented Jan 12, 2022

Again, I think this PR is more fundamental and it reflects our intent.

I guess this is where we disagree. First, there is no fundamental problem for backends (at least tracing backends) in supporting multiple attributes with the same key (Jaeger has always allowed that), and I see it as a function of you-get-back-what-you-put-in, i.e. the users are in control of uniqueness. Second, I think there's definitely room for strict maps, but only in situations with atomic writes to the API. E.g. adding an event to a span is atomic, you can't amend the event afterwards, and that gives a clear choke point to enforce uniqueness. A Logging API would be similar (if we had one), and the metrics API probably is as well. However, the Span API allows multiple writes, so as long as the user can call span.setAttribute(k, v) multiple times, they should not expect some magic deduping by default (although it can be opt-in at some point in the pipeline).

Concrete example: Canopy at Meta supports block.annotate(props) API (block == span). Props is a map so uniqueness is enforced in this atomic write, but annotate can be called multiple times, possibly with conflicting keys. The SDK implementation is streaming, i.e. these writes are flushed independently, no state is stored in the block/span. The storage accepts all writes as is, without fiddling with the data. So the only point to enforce uniqueness (if desired) is in the trace reader when higher-level data model is returned inflated from the raw events.
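A rough model of the pipeline described above (the names `annotate` and `read_block` are illustrative stand-ins, not the real Meta API): each `annotate(props)` call is an atomic write with unique keys, storage accepts the raw writes as-is, and uniqueness across writes is only enforced by the reader when it inflates the higher-level model:

```python
raw_writes = []  # stand-in for storage that accepts all writes as-is

def annotate(props):
    """Atomic write: props is itself a map, so its keys are unique."""
    raw_writes.append(dict(props))  # flushed independently, no span state

def read_block():
    """Trace reader: merge the raw writes; for conflicting keys across
    writes, the later write wins."""
    merged = {}
    for write in raw_writes:
        merged.update(write)
    return merged

annotate({"user": "alice", "region": "us"})
annotate({"region": "eu"})  # conflicting key across two atomic writes
assert read_block() == {"user": "alice", "region": "eu"}
```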

jmacd
jmacd previously requested changes Jan 13, 2022
Review comment on specification/common/common.md (outdated)
@jmacd
Contributor

jmacd commented Jan 13, 2022

@yurishkuro I think you can have your interpretation inside a streaming SDK without changing the spec from Tigran's position.

OTel added an array-of-strings attribute so that you could have a multi-valued attribute. If the application intends to report a multi-valued attribute, it should build the values in memory and set them as one attribute. If you want to update the attribute to a new set of values through the span's lifetime, that also works, but the set of attributes remains a map and your attribute has multiple string values. The application is required to keep its working set in memory if it wants to change the list of values for an attribute, since it has to update them all in a single call.
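The array-of-strings pattern can be sketched like this (a toy `Span` class, not the real API): the multi-valued attribute is a single key whose value is a list, updated atomically in one set call, so keys stay unique:

```python
class Span:
    """Toy span: attributes are a plain map with last-wins semantics."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        # Overwrite if exists: the map keeps exactly one entry per key.
        self.attributes[key] = value

span = Span()
span.set_attribute("peer.addresses", ["10.0.0.1"])
# To add a value, the application rebuilds the full list and sets it again:
span.set_attribute("peer.addresses", ["10.0.0.1", "10.0.0.2"])
assert span.attributes["peer.addresses"] == ["10.0.0.1", "10.0.0.2"]
assert len(span.attributes) == 1  # still a single key
```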

The Metrics data model defines more than one way to model data, and there is an analogy to draw with tracing. Metrics defines an event model, wherein each call to a synchronous instrument is an independently observable event. A streaming metric SDK could report each event as an individual record (hey, let's use column compression for this!) and if you're so inclined you could see every change to a counter this way. The data model also defines a "stream" model, which eliminates the individual events and aggregates them together. You can no longer see the individual updates.

For a streaming trace SDK, I would recommend you consider each SetAttribute() call as an immutable event. Consider a string-valued attribute being set multiple times: you will see one event for every modification to the attribute. When all the events are collected, you will see an attribute that was modified several times. But at each point in time, semantically, the attribute has a single value. The consumer of your streaming SDK may see a timeline of updates to the attribute, and you can show all the values that were set, but I would say it's off-spec to consider the set of values that were set to be a multi-valued map.
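An illustrative sketch of that view (assumed event shapes, not a real streaming SDK): each SetAttribute call is an immutable, timestamped event, and although the full timeline of modifications stays visible, resolving the attribute at any instant yields exactly one value:

```python
def value_at(events, key, at_time):
    """Return the single value `key` had at `at_time`, or None if unset.

    `events` is a time-ordered list of (timestamp, key, value) tuples,
    one per SetAttribute call.
    """
    result = None
    for ts, k, v in events:
        if ts > at_time:
            break
        if k == key:
            result = v  # the later event overwrites the earlier one
    return result

# The attribute was modified three times, but at t=6 it has ONE value:
timeline = [(1, "state", "init"), (5, "state", "running"), (9, "state", "done")]
assert value_at(timeline, "state", 6) == "running"
assert value_at(timeline, "state", 0) is None
```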

@tigrannajaryan
Member Author

But at each point in time, semantically, the attribute has a single value.

I think this is the key observation. I intentionally avoid specifying who enforces these semantics and how, because there are many different places and ways to achieve it. As long as we explicitly state that it is a requirement, how exactly it is achieved can and should be left undefined, to give implementations enough leeway to do the best they can.

I will follow up with a PR for OTLP in the proto repo, specifically requiring that senders must not have duplicates in the attributes list.

Other implementations using other protocols (e.g. a streaming SDK with an exporter that uses an event-based protocol) can move the uniqueness enforcement requirement to the receiver/backend.

@yurishkuro
Member

I mostly agree with what @jmacd is saying. What I see as problematic is that this is clearly a very nuanced topic, as evidenced by this discussion, but the spec change makes a blunt statement that pretends these nuances do not exist. If I were reading this spec for the first time, I would still have the exact same questions I raised here, because the wording does not answer any of them.

@tigrannajaryan
Member Author

I mostly agree with what @jmacd is saying. What I see as problematic is that this is clearly a very nuanced topic, as evidenced by this discussion, but the spec change makes a blunt statement that pretends these nuances do not exist. If I were reading this spec for the first time, I would still have the exact same questions I raised here, because the wording does not answer any of them.

This is a great feedback. Let me do another round and see if I can add more clarity and capture these nuances.

@tigrannajaryan
Member Author

@yurishkuro @jmacd I added some wording to address your comments. Please have another look and see if it helps.

@tigrannajaryan tigrannajaryan requested a review from jmacd January 20, 2022 01:26
@tigrannajaryan
Member Author

CI failure doesn't seem to be related to my change in any way.

@tigrannajaryan tigrannajaryan dismissed jmacd’s stale review January 24, 2022 18:06

Reworded, should be OK now. Please review again.

@tigrannajaryan
Member Author

@jmacd PTAL.

@carlosalberto
Contributor

Tried to merge but it won't let me, as there's one outdated unresolved conversation, which cannot be found now and hence cannot be resolved.

Wondering if anybody has run into this before?

Attribute keys must be unique. The key/value pair collections in the specification
were always intended to model a map.

There was recent confusion about this. This change clarifies the spec.

Resolves open-telemetry#2245
@tigrannajaryan tigrannajaryan merged commit abc1f97 into open-telemetry:main Jan 26, 2022
@tigrannajaryan tigrannajaryan deleted the unique-attr-keys branch January 26, 2022 21:47
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024
Successfully merging this pull request may close these issues.

Specify whether keys in attribute lists are expected to be unique
9 participants