Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B3 consistent handling of debug flags and parent span id #1004

Closed
mwear opened this issue Sep 25, 2020 · 10 comments · Fixed by #1045
Closed

B3 consistent handling of debug flags and parent span id #1004

mwear opened this issue Sep 25, 2020 · 10 comments · Fixed by #1045
Labels
release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:context Related to the specification/context directory

Comments

@mwear
Copy link
Member

mwear commented Sep 25, 2020

What are you trying to achieve?

Consistency in OTel around handling B3 debug flags and parent span id.

Additional context.

B3 context propagation brings some zipkin concepts that do not map cleanly onto OpenTelemetry, namely, the debug trace flags, and propagation of parent span id (in addition to the current span id). There are reasonable solutions to both of these issues, we just need to agree on an approach.

Trace Flags

There is a single trace flag, which signals that a debug trace should be collected. This does not match the trace flags that we store on a span context. I'd propose we workaround this as follows:

  • On extract if a debug trace flag is found, convert it to a sampled flag
  • On inject propagate a sampled flag instead of the debug trace flag

This means an OTel app would not continue to propagate the debug trace flags.

Alternative:
It would be possible to preserve the debug trace flag by storing additional B3 metadata in an extracted context. It would allow us to propagate what was received. Since OTel doesn't support a debug flag natively, we'd probably still have to convert it to a sampled flag internally.

Parent Span ID

B3 propagates a parent span id in addition to the span id of the current operation. This effectively makes it a grand parent id when it's extracted. This doesn't fit cleanly on the span context.

According to the B3 spec, it's optional, so we could omit altogether.

Alternatively we could:

  • On extract we could store it as additional b3 metadata on the context. We would need to preserve it for the rare case where a span was not started in process, but b3 was still injected into an outgoing request.
  • On inject propagate the current in process span id if one is present, otherwise, propagate the extracted parent id.

Summary

It seems like there are two pretty clear paths forward, we just need to pick one. We can

  • Convert and omit concepts that do not map over cleanly to OTel
  • Preserve B3 semantics by storing additional metadata in an extracted context.

Does anyone have any opinions, or other approaches we should consider?

@mwear mwear added the spec:context Related to the specification/context directory label Sep 25, 2020
@andrewhsu andrewhsu added the release:after-ga Not required before GA release, and not going to work on before GA label Sep 25, 2020
@andrewhsu
Copy link
Member

from the issue triage mtg today with TC, moving this to after ga. @mwear if you think this should be an editorial change before ga, please let us know.

@mwear
Copy link
Member Author

mwear commented Sep 25, 2020

Thinking about this a little more. I don't think that we can propagate a parent span id with the spec in its current form, as there is not a parent span id reader on anything other than span data. If we decide that we are going to omit parent span id from B3 context propagation, then we can push this out. If we want to propagate it, it will require an API change.

@mwear
Copy link
Member Author

mwear commented Sep 25, 2020

The python SIG had a lengthy discussion about propagating parent span id: open-telemetry/opentelemetry-python#236 and ultimately decided not to propagate it.

@Oberon00
Copy link
Member

This seems to be closely related to #1007 (and #359, #367)

@mwear
Copy link
Member Author

mwear commented Sep 28, 2020

The issues point at the reason for why zipkin supports propagating a parent span id. The tl;dr is that Zipkin allows the client and server spans of a request to share the same span id. OpenTelemetry explicitly requires each span to have a unique id, so this is not functionality that we need to support. As a result, we can probably safely ignore propagating parent span id.

The second point is forcing a debug trace, which is another Zipkin feature that OpenTelemetry does not support. I think we can probably safely convert a debug flag into a sampled trace flag. Do people agree with this?

@MrAlias
Copy link
Contributor

MrAlias commented Sep 29, 2020

Alternative:
It would be possible to preserve the debug trace flag by storing additional B3 metadata in an extracted context. It would allow us to propagate what was received. Since OTel doesn't support a debug flag natively, we'd probably still have to convert it to a sampled flag internally.

This is what the Go implementation does:

https://github.com/open-telemetry/opentelemetry-go-contrib/blob/71b6d7fc42c235ea005fde85c23fb84df36d3e60/propagators/b3/b3_propagator.go#L337-L338

https://github.com/open-telemetry/opentelemetry-go-contrib/blob/71b6d7fc42c235ea005fde85c23fb84df36d3e60/propagators/b3/b3_propagator.go#L211-L219

On inject propagate a sampled flag instead of the debug trace flag

I'm not sure we could say we implement the B3 specification if we choose to do this. It states ...

Debug implies an accept decision, so don't also send the X-B3-Sampled header.

Which means we would be doing the opposite. Additionally, systems often use this propagation scheme as a troubleshooting aid. Meaning that spans can take different forms based on this flag and usually report Span.debug = true for each span in the trace.

@mwear
Copy link
Member Author

mwear commented Sep 29, 2020

Based on the feedback here and from the spec SIG we should:

  • Omit parent span id when propagating B3.
  • Preserve and propagate the debug flag as received. Internally, convert debug to a sampled trace flag so that we record the related spans.

I'll update the spec the reflect these decisions.

@anuraaga
Copy link
Contributor

anuraaga commented Oct 1, 2020

Sorry for the late comment - just curious why does "OpenTelemetry explicitly requires each span to have a unique id"? There doesn't seem to be anything in the API, propagation, exporter concepts where this wouldn't work and could be left up to backends. Is there a reason for explicitly disallowing it?

@mwear
Copy link
Member Author

mwear commented Oct 2, 2020

I should clarify, that I believe the spec indicates a spans should have unique ids within a given trace.

The spec mentions that a span has:

An immutable SpanContext that uniquely identifies the Span

I interpret this to mean if two spans share a trace id, then they must have separate span ids. It's possible I'm reading into this too deeply. Although, I could see this being a requirement for some tracing backends. It wouldn't be easy to work around duplicate span ids for the same trace unless a backend has explicit support for it. It also complicates context propagation, as mentioned in this issue, supporting shared span ids means that a parent id needs to be propagated in addition to the current span id. While this is allowed in the B3 spec, it's not part of W3C trace context.

@carlosalberto carlosalberto added release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs and removed release:after-ga Not required before GA release, and not going to work on before GA labels Oct 27, 2020
@carlosalberto
Copy link
Contributor

Marking this as Allowed-for-GA was we will clarify this part as part of the existing related #1045 PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:context Related to the specification/context directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants