Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B3 propagation does not handle ParentSpanId #236

Closed
toumorokoshi opened this issue Oct 24, 2019 · 33 comments · Fixed by #286
Closed

B3 propagation does not handle ParentSpanId #236

toumorokoshi opened this issue Oct 24, 2019 · 33 comments · Fixed by #286
Assignees
Labels
bug Something isn't working sdk Affects the SDK package.
Milestone

Comments

@toumorokoshi
Copy link
Member

Currently the b3 propagator does not handle the ParentSpanId. this is a field that does not exist in w3c tracecontext:

https://github.com/openzipkin/b3-propagation#parentspanid

I believe to do this properly, we need to:

  1. generate a single string version of the headers and add it into tracestate under the "b3" key.
  2. inject the values by extracting them from the tracestate.

Thoughts?

@toumorokoshi toumorokoshi added bug Something isn't working sdk Affects the SDK package. labels Oct 24, 2019
@c24t c24t added this to the Alpha v0.3 milestone Oct 31, 2019
@ocelotl
Copy link
Contributor

ocelotl commented Nov 9, 2019

Would it make sense to have something like B3SpanContext, who would inherit from SpanContext and would have an additional attribute named parentspanid? The same approach could be applied to support future implementations of propagators.

@toumorokoshi
Copy link
Member Author

@ocelotl what's the motivation for that vs including it in the tracestate? I believe the purpose of tracestate is specifically to support vendor or other implementations:

https://www.w3.org/TR/trace-context/#tracestate-header

The con I see for extending the class is multiple different SpanContext implementations that expose different fields. Then whatever exporter or system that consumes that data would have to be aware of the existing of that field.

But I'd love to hear someone more involve in tracecontext's perspective. Maybe @reyang?

@ocelotl
Copy link
Contributor

ocelotl commented Nov 9, 2019

@toumorokoshi Oh, there is no specific motivation, I'm just understanding this 😅

Ok, so you suggest something like this?

span_context = some_b3_format_object.extract(...)
span_context.tracestate == {
    "x-b3-parentid": "someparentid"
}

Maybe is better to use a more specific key in tracestate (like x-b3-parentid, instead of just b3) in case the B3 specification adds another field later and we would want to follow the same approach for it.

@ocelotl
Copy link
Contributor

ocelotl commented Nov 9, 2019

... or maybe x-b3-parentid@b3 would be a even better key?

@toumorokoshi
Copy link
Member Author

I'd probably suggest just writing out a single key for b3 in general, so maybe just call it "b3"?

Maybe worth talking to opentelemetry-js folks. A quick scan and I think only they've implemented b3 header propagation: https://github.com/open-telemetry/opentelemetry-js/blob/19756471a629a6b9f083ab8c3c5c95495c3fbffb/packages/opentelemetry-core/src/context/propagation/B3Format.ts

I don't think it's necessarily explicit, but it is implied a single vendor should use a single key here: https://www.w3.org/TR/trace-context/#combined-header-value

@toumorokoshi
Copy link
Member Author

to be clear: I was suggesting calling it "b3" and then encoding the parent-id. But I think as this is just an implementation detail under the hood and clashes are rare, you could probably just call it x-b3-parentid or b3-parentid.

I don't see a lot in the tracecontext spec as how to actually encode values. I might suggest grabbing someone on the community gitter to voice their opinion.

@Oberon00
Copy link
Member

Would this be solved if the injector got a Span instead of a SpanContext as input?

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 12, 2019
@ocelotl
Copy link
Contributor

ocelotl commented Nov 12, 2019

@Oberon00 thanks for the suggestion, but how would that solve the issue? If I understand this correctly, we would still need to use the span's context in inject to get the necessary attributes. Please correct me if I am not understanding this right.

I added an implementation in #286, it is just my attempt to code the original idea of @toumorokoshi 😄

@Oberon00
Copy link
Member

Your PR handles extraction only. The more interesting problem is injection IMHO. But tbh, from the "spec" at https://github.com/openzipkin/b3-propagation#parentspanid I don't really get what the parent span ID is supposed to be.

Ah, upon reading https://github.com/openzipkin/b3-propagation#why-is-parentspanid-propagated, it seems that actually we currently handle the b3-span-id wrong: Span ID is supposed to become the ID of the child span, if I'm reading this correctly.

@Oberon00
Copy link
Member

Oberon00 commented Nov 12, 2019

From reading this, it seems that the OpenTelemetry trace propagation model is incompatible with the B3 model.

@ocelotl
Copy link
Contributor

ocelotl commented Nov 12, 2019

Hello

@Oberon00 #286 includes this. Is something else needed to make the PR handle inject too?

On the other hand, if you think the B3 model is incompatible with OpenTelemetry, should this issue be closed?

@Oberon00
Copy link
Member

Sorry, I missed that line. But still, you can't just copy that header through.
If I understand the B3 spec correctly, the B3 parent id is just like the W3c https://www.w3.org/TR/trace-context/#parent-id. The B3 span ID, on the other hand is a request to give the child span this ID. If we wanted to implement this, we would need to change

to check the parent span context for a B3 span ID and use that instead of generating a new one if present.

Please read https://github.com/openzipkin/b3-propagation#why-is-parentspanid-propagated! I'd like a second opinion if I'm understanding this correctly.

@toumorokoshi
Copy link
Member Author

Looking through the specs a bit, I don't think the b3/parent-span-id is equivalent to tracecontext.parent-id. Since the ParentSpanId represents the parent of the SpanID that was propagated, we don't really need to do anything with it: in the context of our request, the ParentSpanId is actually our grandparent. SpanID is our parent, so we should set the SpanID as the ParentSpanID header.

So effectively the behavioral change we should perform is: the extracted SpanID should be injected in as the ParentSpanID for client requests. I believe we can ignore the extracted ParentSpanId header in it's entirety.

As an approach there, I'd suggest:

  1. extract span_id and set it in tracestate as the parentspanid. As tracestate gets propagated through to children, it'll end up in client spans.
  2. inject the parentspanid stored in tracestate as "ParentSpanId".

That all said I think there's a bit of ambiguity in the b3 spec around it's purpose. We probably should still wrangle someone closer to the specification to clarify, I'll try to find someone too.

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 12, 2019
@ocelotl
Copy link
Contributor

ocelotl commented Nov 12, 2019

Ok, I also think X-B3-ParentSpanId is not equivalent to W3CTraceContext/parent-id. I think X-B3-SpanId is equivalent to W3CTraceContext/parent-id. It is mentioned here:

This is the ID of this request as known by the caller (in some tracing systems, this is known as the span-id, where a span is the execution of a client request). (emphasis is mine).

Now, I think @Oberon00 raises a valid point by warning that this:

would need to be changed to avoid generating a new span-id (is this why you think B3 is not compatible with OpenTelemetry, @Oberon00?). This is explicitly mentioned here: The difference is that B3 uses the same span ID for the client and server side of an RPC... (emphasis is mine).

@toumorokoshi suggests extracting span-id and storing it in tracestate as x-b3-parentspanid. I think that is an approach consistent with OpenTelemetry, but it would make it necessary to generate a new span-id (in the child) to take place of the span-id that was extracted (from the parent). That seems to be precisely the behavior of a non-B3 propagation format as it is mentioned here:

Some propagation formats look similar to B3, but don't propagate a field named parent. Instead, they propagate a span ID field which serves the same purpose as ParentSpanId. Unlike B3, these systems use a different span ID for the client and server side of an RPC. When a server reads headers like this, it is expected to provision a new span ID for itself, and use the one it extracted as its parent.

Here's an example of an alternate library composing a trace context with incoming http request headers and an ID generator:

                           ┌───────────────────┐
 Incoming Headers          │   TraceContext    │
┌───────────────────┐      │ ┌───────────────┐ │
│ XXXX-TraceId      │──────┼─┼> TraceId      │ │
│                   │      │ │               │ │
│ XXXX-SpanId       │──────┼─┼> ParentSpanId │ │
└───────────────────┘      │ │               │ │      ┌──────────────┐
                           │ │  SpanId      <┼─┼──────│ ID Generator │
                           │ └───────────────┘ │      └──────────────┘
                           └───────────────────┘

So, I think @toumorokoshi approach is considered by B3 to be "non-B3" (since it would require to generate a span-id). Maybe this is fine, as it is being implemented into OpenTelemetry, but I'd rather ask first for your thoughts before implementing @Oberon00, @toumorokoshi.

Thanks!

@SergeyKanzhelev
Copy link
Member

I'd also suggest to get some inspiration from https://github.com/openzipkin/b3-propagation/blob/master/RATIONALE.md

@Oberon00
Copy link
Member

Oberon00 commented Nov 13, 2019

@ocelotl

would need to be changed to avoid generating a new span-id (is this why you think B3 is not compatible with OpenTelemetry, @Oberon00?)

Exactly: The B3 spec has this example that I think cannot be reproduced with just a custom B3 TextFormat in OpenTelemetry:

Here's an example of a B3 library extracting a trace context from incoming http
request headers:

                           ┌───────────────────┐
 Incoming Headers          │   TraceContext    │
┌───────────────────┐      │ ┌───────────────┐ │
│ X-B3-TraceId      │──────┼─┼> TraceId      │ │
│                   │      │ │               │ │
│ X-B3-ParentSpanId │──────┼─┼> ParentSpanId │ │
│                   │      │ │               │ │
│ X-B3-SpanId       │──────┼─┼> SpanId       │ │
│                   │      │ └───────────────┘ │
└───────────────────┘      └───────────────────┘

@Oberon00
Copy link
Member

Come to think of it, the OpenTelemetry SpanContext doesn't even have a parent ID.

@toumorokoshi
Copy link
Member Author

Discussing in the sig briefly with @Oberon00 and @carlosalberto, we came up with the following:

  1. The ParentSpanId does not need to be extracted. opentelemetry does not use it at the moment, and it does not need to be propagated or otherwise operated on afterward.
  2. The HTTPTextFormat and Propagator API needs to consume not a SpanContext, but a Span. This is to give the HTTPTextFormant and Propagator access to the parent span (if one exists).
  3. The injector SHOULD inject the ParentSpanId. Once Update README.md #2 is complete, injectors will be able to access the parent, and ParentSpanId should be that parent (if it exists)

@toumorokoshi
Copy link
Member Author

Okay sorry, I just read through the new updates, which makes me think that my reasoning here is wrong. I was unaware of the choice os B3 to use the SAME SpanId for both client and server spans.

I think you're right that this is fundamentally incompatible with OpenTelemetry, which I think could be problematic. At least today, all of our integrations proceed to immediately create a new span with the extracted span as the parent. I guess I need to understand B3 a little bit better, but it's not clear to me whether propagators should do the same.

@Oberon00
Copy link
Member

If I understood @carlosalberto correctly, the Span-Id will always be one of a span that was already started on the other side (the client span). So if we just use that as a parent, we will not create the trace that we would get with a proper B3 implementation, but a sensible trace nevertheless.

@Oberon00
Copy link
Member

Oberon00 commented Nov 15, 2019

By the way, before implementing

The HTTPTextFormat and Propagator API needs to consume not a SpanContext, but a Span.

there is a case where we would need to extract the B3 span ID: Namely when extracting a B3 span context from the wire and immediatelly injecting it again without creating any intermediate span (transparent proxy). I think we might want to have a forward API in addition to inject and extract that behaves like the old inject, i.e. takes a SpanContext (or it could actually take an incoming request, a getter, an outgoing request and a setter and do all at once without actually creating a SpanContext).

@ocelotl
Copy link
Contributor

ocelotl commented Nov 18, 2019

Hello everybody

I think it is better to make a decision before making an implementation here. As I understand this right now, we have 2 choices: either we don't handle parentspanid at all or we handle it in a way that is not B3 compliant. I'm not sure if it is a good idea to do the latter, because having a wrong implementation can be worse than having no implementation at all.

It should also be considered if it is wise to keep an implementation of B3 that does not handle parentspanid because that can also be considered a wrong implementation.

Please share any comments or ideas that you have about this issue that can help make a decision on what to do with it.

Thanks!

@toumorokoshi
Copy link
Member Author

I think the issue is not so much in the handling of parentid as it is the issue that currently, we always create new spans after we extract. I was actually looking through tracecontext spec and I don't believe anything actually requires we create a new spanid for the server span. Although I guess I'm not sure how consumers are supposed to deal with that.

The test cases (https://github.com/w3c/trace-context/blob/master/test/test.py) actually lead me to believe that we could actually not create new span ids and the tracecontext propagation would work as intended. Is there any documentation that states we do need to create a new spanid for the server span? If not we could remove it and enable b3 propagation compatibility that way.

@Oberon00
Copy link
Member

When we create a new span-ID (= a new Span) maybe goes beyond the W3C-spec, but my understanding is that in OpenTelemetry, client and server should have separate spans. Existing semantic conventions assume that at least.

@toumorokoshi
Copy link
Member Author

toumorokoshi commented Nov 19, 2019 via email

@Oberon00
Copy link
Member

Well, the existence of the SpanKind field pretty much implies that OpenTelemetry is designed to create different spans for client and server.

@Oberon00
Copy link
Member

Oberon00 commented Nov 19, 2019

One example where it is explicitly spelled out is gRPC (https://github.com/open-telemetry/opentelemetry-specification/blob/27b738b74eeb10560dc0308554a1d626cb93df79/specification/data-rpc.md#grpc):

Implementations MUST create a span, when the gRPC call starts, one for client-side and one for server-side.

To be honest, the B3 semantics seem quite exotic to me. It would have never occurred to me that client and server would share a span, so I also operated under that assumption when I updated the HTTP semantic conventions.

@codeboten
Copy link
Contributor

Has this conversation started outside of the Python SIG? It definitely impacts all other languages.

@toumorokoshi
Copy link
Member Author

not yet but agreed, I'll author the ticket now.

@toumorokoshi
Copy link
Member Author

^ 💥 . It'll be confusing, but I think we need to continue this particular conversation there. I'll mention people in.

@toumorokoshi
Copy link
Member Author

So the consensus of open-telemetry/opentelemetry-specification#359 is to not honor the B3 standard of consuming client span ids as server span ids. So that is not an issue that needs to be resolved.

In that situation, I don't feel like there's an appropriate value of ParentSpanID to propagate. Enumerating options:

  1. set ParentSpanID as the id of the parent span of the current span. Integrations create new spans on the client, so the parent is probably either an intermediate span, or the newly created server span. I'm not sure what a consumer will do with that information.
  2. set the spanid of the extracted value as the parentspanid during injection. This would ensure that the relationship between the client span and server span is similar to how it would be if B3 headers were implemented properly, but with a caveat that it is not entirely accurate, as a span was created in the middle that is not referenced.

I'll personally vote for #1, as it would be better than an ultimately incorrect representation that skips a span or two in the middle potentially.

@codeboten
Copy link
Contributor

Option #1 is following the alternate approach offered by the B3 spec, I think it makes sense to follow it.

                           ┌───────────────────┐
 Incoming Headers          │   TraceContext    │
┌───────────────────┐      │ ┌───────────────┐ │
│ XXXX-TraceId      │──────┼─┼> TraceId      │ │
│                   │      │ │               │ │
│ XXXX-SpanId       │──────┼─┼> ParentSpanId │ │
└───────────────────┘      │ │               │ │      ┌──────────────┐
                           │ │  SpanId      <┼─┼──────│ ID Generator │
                           │ └───────────────┘ │      └──────────────┘
                           └───────────────────┘

@ocelotl
Copy link
Contributor

ocelotl commented Dec 2, 2019

First option is not to be confused with #1 😄

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 3, 2019
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 6, 2019
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 7, 2019
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 11, 2019
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 12, 2019
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 12, 2019
@c24t c24t modified the milestones: Alpha v0.3, Alpha v0.4 Dec 16, 2019
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 16, 2019
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Jan 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working sdk Affects the SDK package.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants