Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Composite propagators: Clarify the behavior of multiple extractor #496

Closed
rghetia opened this issue Feb 28, 2020 · 26 comments
Closed

Composite propagators: Clarify the behavior of multiple extractor #496

rghetia opened this issue Feb 28, 2020 · 26 comments
Labels
area:api Cross language API specification issue release:after-ga Not required before GA release, and not going to work on before GA spec:context Related to the specification/context directory triage:rejected:declined

Comments

@rghetia
Copy link
Contributor

rghetia commented Feb 28, 2020

In a composite propagator there can be more than one Extractor for trace context. The specification doe not indicate how to handle extraction from two different propagators if both of them extracts valid trace context. In current implementation in Go the second extractor overwrites the first. So the last one wins. There are two problems with this.

  1. The Last trace context is invalid and the first trace context is valid - Even though the first context is valid it will not use it.
  2. Both of them are valid - The last context will be used but that may not be counter intuitive to the user. Also it involves unnecessary processing. Extracting is not cheap.
@dyladan
Copy link
Member

dyladan commented Feb 28, 2020

IMO propagators should be validating extracted contexts to ensure they are valid before setting them on the context. I could see this being important for use cases where the propagation format spec changes however. For instance, if Trace Context releases a version 2, you may want to have a v1 propagator as a fallback if the v2 propagator fails or vice versa.

@Flarna
Copy link
Member

Flarna commented Mar 2, 2020

Maybe we should pass any extracted SpanContext down to the next propagators in chain to allow them to decide if they should extract and overwrite,...?

@Oberon00
Copy link
Member

Oberon00 commented Mar 2, 2020

I think this already happens implicitly because each propagator gets as input the context returned by the previous one. However, there is no way to know for the propagator whether any value left there was set by a another composite propagator component or is the value from the parent context, which should be overwritten.

I think you should only add orthogonal components to an aggregate propagator. A new special ChainedPropagator API is probably required to coordinate multiple propagators competing for the same Context key.

@carlosalberto
Copy link
Contributor

So in theory there should only be a propagator for each cross-cutting concern (tracing, correlation context), so they work on only one 'slot' of the Context.

What you mention has been known as 'fallback' propagators, as a stack-alike Propagator that contains different propagators for the same concern. An example would be a fallback Propagator that contains both a TraceContext and a B3 propagator, acting as a single propagator.

In such case, it would work like:

extract(ctx, carrier)
  for propagator in contained_propagators
    new_ctx = propagator.extract(ctx, carrier)
    if tracing_api.get_span_context(new_ctx) is not null
      return new_ctx
...

tracing_propagator = create_fallback_propagator(trace_context_propagator, b3propagator)

I'm wondering if we should add a section on this case (probably yes).

@Flarna
Copy link
Member

Flarna commented Mar 2, 2020

As setting of propagators is possible via API which may have a lot independent clients it's maybe hard to ensure that there are no duplicates. e.g. HttpTextPropagator seem to be reused in grpc (and maybe other protocols having text header support) so each OTel plugin may add one instance of such a processor.

@rghetia
Copy link
Contributor Author

rghetia commented Mar 2, 2020

A composite extractor with a policy would work to satisfy all use cases. For example,

type ExtractPolicy int

const {
    StopOnValid
    ExtractAll
}

type struct CompositeExtractorWithPolicy {
    extractors []*extractor
    policy ExtractPolicy
    extractValdiator func(context.Context) bool
}
func (cewp *CompositeExtractorPolicy) Extract(ctx context.Context, supplier HTTPSupplier) (context.Context) {
    for extractor := range cewp.extractors {
       newCtx := extractor.Extract(ctx, supplier)
       valid := cewp.extractValidator(newCtx)
       if cewp.policy == StopOnValid {
          if valid {
            return newCtx
          }
       }
    }
}

@Flarna
Copy link
Member

Flarna commented Mar 2, 2020

@rghetia But wouldn't this result in a propagator for one concern may stops extraction of another, unrelated concern?

@rghetia
Copy link
Contributor Author

rghetia commented Mar 2, 2020

@rghetia But wouldn't this result in a propagator for one concern may stops extraction of another, unrelated concern?

you just supply two sets for propagator with policy, one for each concern.

@pavolloffay
Copy link
Member

We know that only a single context/key can be extracted. The precedence order should be configurable by the user of the system/SDK. Then if the extraction fails for any reason the SDK could "try again" with the next extractor.

@carlosalberto carlosalberto self-assigned this May 26, 2020
@carlosalberto
Copy link
Contributor

I'm doing related work around this, so will take it from here.

@carlosalberto
Copy link
Contributor

Please see open-telemetry/opentelemetry-java#1339 as a prototype for this.

A new StackPropagator is added (name to be refined) as a custom propagator that takes a list of Trace propagators and, upon extraction, stops when the first successful operation happens, else trying with the next propagator.

Two questions arise:

  1. Do we want this as a general purpose Propagator? I see us only needing this for the many existing tracing formats, but if we need to have a general purpose one, we should provide a policy/function (as the one Rahul mentions) to let the propagator know when to stop iterating over the propagators.

  2. To simplify things, upon failed extraction the propagators won't set an invalid/nil/empty value in Context - this saves us a few allocations happening under failed extraction, but also may allow 'leaking' of lingering, old active Span instances - but this is an error in itself anyway.

Thoughts? cc @dyladan @pavolloffay @Oberon00

@carlosalberto carlosalberto added area:api Cross language API specification issue spec:context Related to the specification/context directory labels Jun 30, 2020
@carlosalberto
Copy link
Contributor

Now that #671 is merged, wondering whether we should close this issue, or should we add such MultiPropagator to the Specification? @Oberon00 @dyladan

@carlosalberto carlosalberto added the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Jul 2, 2020
@Oberon00
Copy link
Member

Oberon00 commented Jul 7, 2020

I think #671 solved the issue I pointed out in my comment #496 (comment). If we think a MultiPropagator is something that is generally useful, I think we should add it to the spec. Trying to specify it in detail might surface more issues.

@carlosalberto
Copy link
Contributor

@Oberon00 For the record: I think specifing a MultiPropagator for tracing should be the way to start, given that this multiple-format support seems to be a tracing-only issue (for now). Any opinion on that?

@Oberon00
Copy link
Member

Oberon00 commented Jul 10, 2020

I'm not sure how the concrete concern plays a role when defining a multi-propagator? I would think of something like

class MultiHttpTextPropagator implements HttpTextPropagator {
  public MultiHttpTextPropagator(Collection<HttpTextPropagator> extractors, Collection<HttpTextPropagator> injectors) { ... };

  @Override
  public <Carrier> void extract(Context target, Carrier c, Getter<Carrier> g) {...} // Call all extractors (unconditionally)
  @Override
  public <Carrier> void inject(Context target, Carrier c, Setter<Carrier> s) {...} // Call all injectors (unconditionally)
}

If we want to not try all extractors everytime we'd just need to pass some Predicate<Context> (or function from old context + new context -> boolean) to the constructor that tells us whether to continue.

@carlosalberto carlosalberto added the priority:p1 Highest priority level label Jul 24, 2020
@andrewhsu
Copy link
Member

Having a look at this P1 issue with @carlosalberto, should this really be a P2 because it looks like an addition to the API, not a breaking change that should go in earlier? Thinks java is the only one that has this atm.

@Oberon00
Copy link
Member

Oberon00 commented Sep 3, 2020

See my comment above #496 (comment):

If we think a MultiPropagator is something that is generally useful, I think we should add it to the spec. Trying to specify it in detail might surface more issues.

Those "more issues" might require breaking changes to fix them.

@andrewhsu
Copy link
Member

From the issue triage mtg today, talked with @carlosalberto and looks like with the PR #671 this issue can now be bumped down to P2 which means it is not necessarily a blocking change for freezing trace api.

@andrewhsu andrewhsu added priority:p2 Medium priority level and removed priority:p1 Highest priority level labels Sep 9, 2020
@jkwatson
Copy link
Contributor

I'm confused if a MultiPropagator is different than the Composite Propagator already in the spec. Is this an additional thing, or a clarification of the existing one?

@Oberon00
Copy link
Member

I think it's about multiple propagators for the same concern. E.g. B3+W3C trace propagators at once.

@jkwatson
Copy link
Contributor

Ah, so Composite is multiple-concerns and Multi is multiple for a single concern. Got it.

@aabmass
Copy link
Member

aabmass commented May 13, 2021

Looking at the Java TraceMultiPropagator, I think there is still an issue.

Upon extraction, this propagator invokes HttpTextFormat#extract() for every registered
trace propagator, returning immediately when a successful extraction happened.

Not all span context propagators encode the full W3C TraceContext, for example Jaeger format doesn't encode tracestate. If the Jaeger propagator extracted first and you immediately return, the tracestate won't be propagated on.

Also, what if the trace flags are not matching e.g. a load balancer before your service decides to set the sampled flag, but only sets it on the Jaeger header. Then if the Jaeger propagator is configured in TraceMultiPropagator to run after the W3C propagator, the sampled flag is lost.

Maybe the TraceMultiPropagator should merge all the resulting span contexts if they have the same trace ID + span ID?

@jkwatson
Copy link
Contributor

Looking at the Java TraceMultiPropagator, I think there is still an issue.

Upon extraction, this propagator invokes HttpTextFormat#extract() for every registered
trace propagator, returning immediately when a successful extraction happened.

Not all span context propagators encode the full W3C TraceContext, for example Jaeger format doesn't encode tracestate. If the Jaeger propagator extracted first and you immediately return, the tracestate won't be propagated on.

Also, what if the trace flags are not matching e.g. a load balancer before your service decides to set the sampled flag, but only sets it on the Jaeger header. Then if the Jaeger propagator is configured in TraceMultiPropagator to run after the W3C propagator, the sampled flag is lost.

Maybe the TraceMultiPropagator should merge all the resulting span contexts if they have the same trace ID + span ID?

TraceMultiPropagator doesn't exist in the 1.x run of Otel java. You want to look at this: https://github.com/open-telemetry/opentelemetry-java/blob/main/context/src/main/java/io/opentelemetry/context/propagation/MultiTextMapPropagator.java#L57-L67

@aabmass
Copy link
Member

aabmass commented May 13, 2021

That just looks like the regular Composite Propagator mentioned in the spec. That has the issue of overwriting still, right?

@jkwatson
Copy link
Contributor

That just looks like the regular Composite Propagator mentioned in the spec. That has the issue of overwriting still, right?

Yes. I just wanted to make sure that the old thing you pointed at wasn't a relevant part of the conversation, since it was deleted before 1.0.0 was released.

@austinlparker
Copy link
Member

If this issue is still relevant, please open a new issue with more detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue release:after-ga Not required before GA release, and not going to work on before GA spec:context Related to the specification/context directory triage:rejected:declined
Projects
None yet
Development

No branches or pull requests

10 participants