This repository has been archived by the owner on May 23, 2023. It is now read-only.

introduce and take advantage of SpanContext #82

Merged
merged 2 commits into from
Jul 6, 2016

Contributor

@bhs bhs commented Jun 11, 2016

No description provided.

@@ -7,23 +7,17 @@ type contextKey struct{}
var activeSpanKey = contextKey{}

// ContextWithSpan returns a new `context.Context` that holds a reference to
-// the given `Span`.
+// the given `Span`'s `SpanContext`.
func ContextWithSpan(ctx context.Context, span Span) context.Context {
Contributor Author

@bhs bhs Jun 11, 2016

Since the proposed semantics allow for Span.SpanContext() to be called after Span.Finish(), I think it's fine to store the Span in the context.Context. If others disagree, we could store the SpanContext here instead, but that limits the users of the API in certain ways.
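
For readers following the thread, here is a minimal sketch of what storing the `Span` itself in a `context.Context` can look like. It is an illustration under assumptions, not the code in this PR: the cut-down `Span` interface, `namedSpan`, and `lookup` are hypothetical, and the standard-library `context` package is used in place of whatever import the repository used at the time.

```go
package main

import (
	"context"
	"fmt"
)

// Span is a hypothetical, cut-down stand-in for the real Span interface.
type Span interface {
	OperationName() string
}

// namedSpan is a toy Span used only in this sketch.
type namedSpan struct{ name string }

func (s namedSpan) OperationName() string { return s.name }

type contextKey struct{}

var activeSpanKey = contextKey{}

// ContextWithSpan returns a new context.Context that holds a reference to span.
func ContextWithSpan(ctx context.Context, span Span) context.Context {
	return context.WithValue(ctx, activeSpanKey, span)
}

// SpanFromContext returns the Span stored in ctx, or nil if there is none.
func SpanFromContext(ctx context.Context) Span {
	if s, ok := ctx.Value(activeSpanKey).(Span); ok {
		return s
	}
	return nil
}

// lookup stores a span with the given name in a context and reads it back.
func lookup(name string) string {
	ctx := ContextWithSpan(context.Background(), namedSpan{name: name})
	return SpanFromContext(ctx).OperationName()
}

func main() {
	fmt.Println(lookup("get_user"))
	fmt.Println(SpanFromContext(context.Background()) == nil)
}
```

Because the unexported `contextKey` type is the map key, no other package can collide with or overwrite the stored span.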

@yurishkuro
Member

Overall, I like the change. However, one of the motivations for re-introducing SpanContext, among many, was that it would provide the datatype we could use to solve opentracing/opentracing.io#28. I don't think the current change does that, because there is no API to deserialize just the SpanContext, without starting a new span.

@lookfwd

lookfwd commented Jun 12, 2016

I like everything that weakens the Finish() semantics. I think I would like to see some form of diagram that demonstrates this change. The idea is that you have part of the Span API that outlives a Span? The part of the Span API that moves to SpanContext regards baggage and parent-ship. Is there any point in time after which changes to a SpanContext become undefined behaviour? e.g. if I Inject a context and then add extra Baggage to it, what happens?

@bhs
Contributor Author

bhs commented Jun 12, 2016

@yurishkuro re:

one of the motivations for re-introducing SpanContext, among many, was that it would provide the datatype we could use to solve opentracing/opentracing.io#28. I don't think the current change does that, because there is no API to deserialize just the SpanContext, without starting a new span.

This change leaves the door open to some simplifications for the API. The data in a SpanContext is 1:1 with the state represented by Inject carriers. As such, we could do something like this:

type SpanContext interface {
    SetBaggageItem(key, val string)
    BaggageItem(key string) string
}

type Tracer interface {
    StartSpan(operationName string, opts ...StartSpanOption) Span
    Inject(sc SpanContext, format interface{}, carrier interface{}) error
}

type BuiltinFormat byte

const (
    // NOTE: an in-memory SpanContext interface descending from the appropriate
    // Tracer type becomes a format, much like the InMemoryCarrier we've
    // discussed previously.
    LocalSpanContext BuiltinFormat = iota
    Binary
    TextMap
)

type SpanReferenceType int

const (
    ParentRef SpanReferenceType = iota
    StartedBeforeRef
    EndedBeforeRef
    // NOTE: other sorts of span references go here per
    // https://github.com/opentracing/opentracing.github.io/issues/28
)

type SpanReference struct {
    RefType SpanReferenceType
    Format  interface{}
    Carrier interface{}
}

type StartSpanOptions struct {
    StartTime     time.Time
    InitialTags   map[string]interface{}
    References    []SpanReference
}

type StartSpanOption func(*StartSpanOptions)

func ReferenceParentSpanContext(sc SpanContext) StartSpanOption {
    return ReferenceParent(LocalSpanContext, sc)
}

// TODO: etc, etc, for other SpanReferenceTypes
func ReferenceParent(format, carrier interface{}) StartSpanOption {
    return func(o *StartSpanOptions) {
        o.References = append(o.References, SpanReference{
            RefType: ParentRef,
            Format:  format,
            Carrier: carrier,
        })
    }
}

// TODO: etc, etc, for the other StartSpanOption functions.

I.e., there's no more Join! Just StartSpan(opName, ReferenceParent(format, carrier)).

I would like to avoid an explicit carrier -> SpanContext API, and doing something like the above would accomplish the same thing.

@yurishkuro
Member

@bensigelman span context as a carrier format - ingenious!

One problem I see with the above is that by losing a distinction between Start and Join we'd be breaking Zipkin's "one span per RPC" model. Unless we say that a combination of ParentRef and non-LocalSpanContext can be an indication to reuse span ID, which is quite confusing given that the struct refers to "parent". I wish we could drop "one span per RPC" as an option, but I don't see Zipkin changing that any time soon.

FWIW, adding F(format, carrier) -> SpanContext API to tracer might be a simpler option. The above is clever, but not as easy to grasp as an explicit "deserialize" method, imho. And it actually increases the overall API surface by adding so many new types.

@bhs
Contributor Author

bhs commented Jun 13, 2016

@yurishkuro

Unless we say that a combination of ParentRef and non-LocalSpanContext can be an indication to reuse span ID, which is quite confusing given that the struct refers to "parent". I wish we could drop "one span per RPC" as an option, but I don't see Zipkin changing that any time soon.

Well, maybe we shouldn't use the term "Parent" here. RPCClientRef would be specific and keep the Zipkin code from reading poorly. I don't mind that the method is called StartSpan, btw (even if the span exists in two processes with different logical start times).

Anyway, a slightly revised mini-proposal along these lines:

import "time"

type SpanContext interface {
    SetBaggageItem(key, val string)
    BaggageItem(key string) string
}

type Tracer interface {
    StartSpan(operationName string, opts ...StartSpanOption) Span

    Inject(sc SpanContext, format interface{}, carrier interface{}) error
    Extract(format, carrier interface{}) (SpanContext, error)
}

type BuiltinFormat byte

const (
    // NOTE: an in-memory SpanContext interface descending from the appropriate
    // Tracer type becomes a format, much like the InMemoryCarrier we've
    // discussed previously.
    LocalSpanContext BuiltinFormat = iota
    Binary
    TextMap
)

type SpanReferenceType int

const (
    BlockedParentRef SpanReferenceType = iota
    RPCClientRef
    StartedBeforeRef
    EndedBeforeRef
    // NOTE: other sorts of span references go here per
    // https://github.com/opentracing/opentracing.github.io/issues/28
)

type SpanReference struct {
    RefType SpanReferenceType
    SpanContext
}

type StartSpanOptions struct {
    OperationName string
    StartTime     time.Time
    InitialTags   map[string]interface{}
    References    []SpanReference
}

type StartSpanOption func(*StartSpanOptions)

func ReferenceSpanContext(refType SpanReferenceType, sc SpanContext) StartSpanOption {
    return func(o *StartSpanOptions) {
        o.References = append(o.References, SpanReference{refType, sc})
    }
}

// TODO: etc, etc, for the other StartSpanOption functions.
func OperationName(n string) StartSpanOption {
    return func(o *StartSpanOptions) {
        o.OperationName = n
    }
}

If I have time in the next 36 hours I will try to flesh out the above.
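
To make the symmetric Inject/Extract idea concrete, here is a rough, self-contained sketch. It assumes a single text-map carrier and drops the `format` argument from the proposal for brevity; `mapContext`, `toyTracer`, `baggagePrefix`, the `trace-id` key, and `roundTrip` are all hypothetical names invented for this illustration, not part of the proposed API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SpanContext mirrors the interface in the mini-proposal above.
type SpanContext interface {
	SetBaggageItem(key, val string)
	BaggageItem(key string) string
}

// mapContext is a hypothetical in-memory SpanContext.
type mapContext struct {
	traceID string
	baggage map[string]string
}

func (c *mapContext) SetBaggageItem(key, val string) { c.baggage[key] = val }
func (c *mapContext) BaggageItem(key string) string  { return c.baggage[key] }

// toyTracer is a hypothetical tracer that only knows a text-map carrier.
type toyTracer struct{}

const baggagePrefix = "baggage-"

// Inject writes the SpanContext state into the carrier.
func (toyTracer) Inject(sc SpanContext, carrier map[string]string) error {
	c, ok := sc.(*mapContext)
	if !ok {
		return errors.New("unsupported SpanContext implementation")
	}
	carrier["trace-id"] = c.traceID
	for k, v := range c.baggage {
		carrier[baggagePrefix+k] = v
	}
	return nil
}

// Extract rebuilds a SpanContext from the carrier without starting a span.
func (toyTracer) Extract(carrier map[string]string) (SpanContext, error) {
	id, ok := carrier["trace-id"]
	if !ok {
		return nil, errors.New("carrier holds no trace-id")
	}
	c := &mapContext{traceID: id, baggage: map[string]string{}}
	for k, v := range carrier {
		if strings.HasPrefix(k, baggagePrefix) {
			c.baggage[strings.TrimPrefix(k, baggagePrefix)] = v
		}
	}
	return c, nil
}

// roundTrip injects one baggage item on the "client" side and reads it
// back from the extracted SpanContext on the "server" side.
func roundTrip(key, val string) (string, error) {
	t := toyTracer{}
	client := &mapContext{traceID: "abc123", baggage: map[string]string{}}
	client.SetBaggageItem(key, val)

	wire := map[string]string{}
	if err := t.Inject(client, wire); err != nil {
		return "", err
	}
	server, err := t.Extract(wire)
	if err != nil {
		return "", err
	}
	return server.BaggageItem(key), nil
}

func main() {
	val, err := roundTrip("user", "alice")
	fmt.Println(val, err)
}
```

The point of the sketch is that `Extract` returns only a `SpanContext`; whether and how a span is then started from it is a separate decision made through the start options.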

@yurishkuro
Member

yurishkuro commented Jun 13, 2016

I like this approach better. Looks like it doesn't need LocalSpanContext BuiltinFormat anymore. Inject/extract are symmetric again.

Minor nit: when using the functional option pattern, if we define StartSpanOption as interface, e.g.

type StartSpanOption interface {
    Apply(*StartSpanOptions)
}

and implement various options explicitly with structs instead of lambdas (Ex: https://play.golang.org/p/3S0tm_7qEe) , then the call to StartSpan will require 0 memory allocations. Otherwise, when a lambda captures surrounding variable(s), it requires an alloc (afaik).
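
A minimal sketch of the interface-based variant follows, for comparison with the closure-based options earlier in the thread. The option types `OperationName` and `StartTime` and the helper `buildOptions` are hypothetical. Whether this actually removes an allocation depends on the compiler's escape analysis, so it is worth confirming with a benchmark rather than assuming.

```go
package main

import (
	"fmt"
	"time"
)

// StartSpanOptions collects everything known at span start time.
type StartSpanOptions struct {
	OperationName string
	StartTime     time.Time
}

// StartSpanOption is an interface rather than a func type, so each option
// can be a small concrete value instead of a closure.
type StartSpanOption interface {
	Apply(*StartSpanOptions)
}

// OperationName is a hypothetical option backed by a plain string.
type OperationName string

func (n OperationName) Apply(o *StartSpanOptions) { o.OperationName = string(n) }

// StartTime is a hypothetical option backed by a time.Time value.
type StartTime time.Time

func (t StartTime) Apply(o *StartSpanOptions) { o.StartTime = time.Time(t) }

// buildOptions applies each option in order, as StartSpan would.
func buildOptions(opts ...StartSpanOption) StartSpanOptions {
	var o StartSpanOptions
	for _, opt := range opts {
		opt.Apply(&o)
	}
	return o
}

func main() {
	o := buildOptions(OperationName("get_user"), StartTime(time.Unix(100, 0)))
	fmt.Println(o.OperationName, o.StartTime.Unix())
}
```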

@bhs
Contributor Author

bhs commented Jun 13, 2016

Well, I have jury duty next week (:us: :us: :us: :us:) ... hopefully that means I will just be sitting in a courthouse waiting to be called and hacking a little further on this. In which case I will work out the implications for basictracer and some example calling code. My sense, though, is that the approach here is a net improvement. And even though it introduces one new concept (SpanContext), it also removes a bunch of asymmetry and opportunity for confusion in the current set of concepts, esp around Join. And there will only be one way to create a Span: another sign of progress IMO.

I'm going to cc some people here since I realized they may not get Gitter notifications: @dkuebric @bg451 @adriancole @michaelsembwever @dawallin @bcronin @jmacd this is an important change and I'd appreciate if you took a look. If it's a problem that this RFC is in Go, I can try and translate to something analogous in Java or Python or whatever.

@dawallin

I like the SpanContext. For me it makes it more explicit in Inject that only the context part is passed on to the carrier. It is also good if Inject and Join/Extract could stay symmetric.

I like the introduction of multiple parents, but in the solution above, it looks to me as though all parents are needed at the start of the span. Is this always true? Shouldn't it be possible to add parents later, or does that introduce other problems?

What is the parent-after-finish problem I saw mentioned somewhere? Is it not OK to let a new span be created after the parent is finished? This would be the standard scenario in a "fire and forget" event-driven architecture.

What are the SpanReferenceTypes (BlockedParentRef, RPCClientRef, StartedBeforeRef, EndedBeforeRef)? They seem specialized to solve a specific problem?

@bg451
Contributor

bg451 commented Jun 14, 2016

I spent time thinking about this yesterday and I'm a really big fan of this change. It solves a lot of the problems around the semantics surrounding Finish, and will make async type programs so much easier to instrument. I really like the idea of Join being removed per one of the above proposals as well, and Extract is a lot easier to understand imo. I'm going to spend some more time thinking about this today and see if there's anything I disagree with, but overall 👍

What is the parent-after-finish problem I saw mentioned somewhere?

@dawallin Since the finish semantics mean a) this is the last call on the span and b) I won't be using the span anymore, a few opentracing implementations recycle the underlying span objects after Finish() is called. As a result, there were some use-after-finish cases where a child span used a recycled parent span, leading to a variety of issues. Check out opentracing/basictracer-go#23 for an example.
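
The failure mode described above can be reproduced in a few lines. This is a sketch under assumptions, not how basictracer works: `pooledSpan`, `spanPool`, and `spanContext` are hypothetical, and a hand-rolled free list stands in for a real object pool so that the reuse is deterministic.

```go
package main

import "fmt"

// pooledSpan is a hypothetical span whose memory is recycled after finish.
type pooledSpan struct {
	traceID string
}

// spanContext is an immutable copy of the span's identifying state.
type spanContext struct {
	traceID string
}

// context snapshots the state that children need to refer to this span.
func (s *pooledSpan) context() spanContext { return spanContext{traceID: s.traceID} }

// spanPool is a deliberately simple free list.
type spanPool struct {
	free []*pooledSpan
}

// start hands out a recycled span if one is available.
func (p *spanPool) start(traceID string) *pooledSpan {
	if n := len(p.free); n > 0 {
		s := p.free[n-1]
		p.free = p.free[:n-1]
		s.traceID = traceID
		return s
	}
	return &pooledSpan{traceID: traceID}
}

// finish clears the span and returns it to the free list.
func (p *spanPool) finish(s *pooledSpan) {
	s.traceID = ""
	p.free = append(p.free, s)
}

// demo reads the parent's trace ID two ways after the parent has finished
// and its memory has been handed to an unrelated span.
func demo() (viaSpan, viaContext string) {
	pool := &spanPool{}
	parent := pool.start("trace-A")
	ctx := parent.context() // value copy, independent of the pooled object

	pool.finish(parent)
	_ = pool.start("trace-B") // reuses the parent's memory

	return parent.traceID, ctx.traceID
}

func main() {
	viaSpan, viaContext := demo()
	fmt.Println("stale span pointer sees:", viaSpan)
	fmt.Println("context copy sees:", viaContext)
}
```

The stale pointer reports the unrelated span's trace ID, while the copied context still identifies the original parent, which is the property a separate SpanContext is meant to provide.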

@bhs
Contributor Author

bhs commented Jun 15, 2016

@yurishkuro @bg451 @dawallin @lookfwd @jmacd

Since you've all commented on this, I pushed another change to make symmetric Inject+Extract.

I think this is a big improvement. The actual CausalReference enum values are lousy and inconsistent here, but the basic concept seems strong to me.

@dawallin asked:

I like the introduction of multiple parents, but in the solution above, it looks to me as though all parents are needed at the start of the span. Is this always true? Shouldn't it be possible to add parents later, or does that introduce other problems?

This should be possible, yes. I think we need to decide whether those refs should happen via a new method on Span or as an extension of the LogKV sort of mechanism described in opentracing/opentracing.io#96 . I would propose we leave things as-is until we have clearer use cases for references that arrive after Span start time.

@bhs
Contributor Author

bhs commented Jun 15, 2016

(and I updated opentracing/basictracer-go#29 to illustrate the ramifications for calling code and implementations)

}
span := tracer.StartSpan(
operationName,
opts...)
Member

I'd prefer

var span Span
if parentSpan := SpanFromContext(ctx); parentSpan != nil {
    span = tracer.StartSpan(operationName, Reference(RefBlockedParent, parentSpan.SpanContext()))
} else {
    span = tracer.StartSpan(operationName)
}

avoids allocation/append of a slice

@bhs
Contributor Author

bhs commented Jun 15, 2016

Okay, per today's weekly hangout, we are going to move forward with this. I will clean things up and ping the PR when it's ready for a detailed look.

})
return parent.Tracer().StartSpan(
operationName,
Reference(RefBlockedParent, parent.SpanContext()),
Contributor

The comment for RefBlockedParent implies this is a change of semantics. StartChildSpan says nothing about blocking its parent.

@yurishkuro
Member

Do an %s/SpanContext/SpanMetadata/g

@bensigelman iirc you've always resisted the term "metadata" :-)

@bhs
Contributor Author

bhs commented Jun 17, 2016

@yurishkuro

@bensigelman iirc you've always resisted the term "metadata" :-)

I did some soul-searching... my options, as I perceived them:

  • SpanContext (too general, too verbose, too redundant)
  • Context (too general, terribly confusing in Go)
  • Baggage (too weird, doesn't feel right for References which is important)
  • Baton (also too weird, also doesn't feel right for References)
  • Coordinates (also too weird, doesn't feel right for SetBaggageItem, though great for References)
  • Metadata (too vague)
  • SpanMetadata (I don't like it but doesn't have the problems above)

Other suggestions??

@bhs
Contributor Author

bhs commented Jun 17, 2016

Anyway, this is now cleaned up and ready for a nitpicky review... @yurishkuro @jmacd @bg451 @dawallin @lookfwd

@bhs bhs force-pushed the bhs/span_context_revival branch from 8ced677 to a57601e Compare June 17, 2016 21:32
// value is copied into every local *and remote* child of this Span, and
// that can add up to a lot of network and cpu overhead.
//
// IMPORTANT NOTE #3: Baggage item keys have a restricted format:
Contributor Author

FYI, I got rid of the baggage key restrictions in this PR.

@bhs
Contributor Author

bhs commented Jun 22, 2016

I've thought a bunch more about the semantics for the references (regardless of what we call them). A few conclusions:

  • there's no point in specifying StartedBeforeRef or similar... all referenced Spans must have already started or we wouldn't have a context object for them.
  • it's feasible to do all of this with bitmasking, but I think that's too complicated to explain... when I prototyped it I felt like the comments were getting jargony
  • "parent" was a poor choice of terminology in dapper... but it's also something a lot of us think about and say, so I like that term more than FinishesAfterRef or similar (which is really the same idea as a blocking parent)
  • per previous comments, I don't want to bite off the not-in-the-same-trace relationships at this stage

All of this has me thinking we should only support three sorts of references for now... (Per the above, all of them imply StartedBefore which we can thus leave out)

  1. Parent (equivalent to FinishesAfter)
  2. RPCClient (a special case, but an important one)
  3. FinishedBefore (for distributed queues and so on)

... or, if we express things from the perspective of the Span that's about to start:

  1. Child (equivalent to FinishesBefore)
  2. RPCServer
  3. StartsAfter

I'm going to sleep on this, but those are my thoughts at the moment.

@bhs
Contributor Author

bhs commented Jun 22, 2016

... well, I did sleep on this and I had a few more thoughts about it.

  • The reason we're making RPCClient its own reference type is so that tracers like Zipkin can determine whether to create a new span_id or reuse an existing one... we could (and, I'd argue, should) just use the Tags mechanism for this.
  • The only cases I feel comfortable distinguishing between at this point are the following:
    1. Fire-and-forget: all we know is that the new Span starts after the Span we're referencing
    2. Blocking children: the Span we're starting is supposed to finish before the (parent) Span we're referencing
    3. Pipelines: the Span we're referencing must have already finished

I'll try to codify this in Go later today.
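
One possible way to read the three cases as timing constraints is sketched below. This is not the codification that followed in the PR: `RefStartsAfter`, `RefFinishedBefore`, `interval`, and `consistent` are hypothetical names, and only `RefBlockedParent` appears elsewhere in this thread.

```go
package main

import "fmt"

// RefType enumerates the three causal relationships listed above.
type RefType int

const (
	RefStartsAfter    RefType = iota // fire-and-forget
	RefBlockedParent                 // blocking child
	RefFinishedBefore                // pipeline
)

// interval is a hypothetical span lifetime in arbitrary clock ticks.
type interval struct {
	start, finish int64
}

// consistent reports whether the timings of the referenced span and the
// newly started span agree with what the reference type claims.
func consistent(ref RefType, referenced, started interval) bool {
	switch ref {
	case RefStartsAfter:
		// All we know: the new span starts after the referenced one started.
		return started.start >= referenced.start
	case RefBlockedParent:
		// The new span runs entirely inside the referenced (parent) span.
		return started.start >= referenced.start && started.finish <= referenced.finish
	case RefFinishedBefore:
		// The referenced span had already finished when the new one started.
		return started.start >= referenced.finish
	default:
		return false
	}
}

func main() {
	parent := interval{start: 0, finish: 10}
	child := interval{start: 2, finish: 8}
	fmt.Println(consistent(RefBlockedParent, parent, child))
	fmt.Println(consistent(RefFinishedBefore, parent, child))
}
```

Every case implies that the referenced span started first, which matches the earlier observation that a separate StartedBefore reference is unnecessary.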

@yurishkuro
Member

The reason we're making RPCClient its own reference type is so that tracers like Zipkin can determine whether to create a new span_id or reuse an existing one... we could (and, I'd argue, should) just use the Tags mechanism for this.

There is no guarantee that the tags will be provided at span construction time and not in a separate statement. In the latter case it's too late to assign span ID.

@bhs
Contributor Author

bhs commented Jun 22, 2016

@yurishkuro

The reason we're making RPCClient its own reference type is so that tracers like Zipkin can determine whether to create a new span_id or reuse an existing one... we could (and, I'd argue, should) just use the Tags mechanism for this.
There is no guarantee that the tags will be provided at span construction time and not in a separate statement. In the latter case it's too late to assign span ID.

If the caller is able to provide the reference at span start time, then they could equivalently provide the tag at span start time... and it seems likely to me that there will eventually be a way to add references after span start time, too. Furthermore, we already use span tags to differentiate the RPC vs non-RPC case.

I like the idea of the references being strictly about causality and not about how many process boundaries were crossed... seems like a cleaner separation of concerns.
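
As a rough illustration of the argument above, a Zipkin-style tracer could make the span ID decision from the options passed at start time, provided the tag arrives together with the reference. Everything here is an assumption made for the sketch: the `SpanContext` and `StartSpanOptions` shapes, the `chooseSpanID` helper, and the tag key `span.kind` with value `server`.

```go
package main

import "fmt"

// SpanContext is a hypothetical ID-carrying context for this sketch.
type SpanContext struct {
	TraceID, SpanID uint64
}

// StartSpanOptions holds what the caller supplied at span start time.
type StartSpanOptions struct {
	Parent *SpanContext
	Tags   map[string]interface{}
}

// chooseSpanID reuses the referenced span's ID when the start options mark
// the new span as the server side of an RPC ("one span per RPC"), and
// otherwise asks newID for a fresh one.
func chooseSpanID(o StartSpanOptions, newID func() uint64) uint64 {
	// Assumed tag key and value; reading from a nil map is safe in Go.
	if o.Parent != nil && o.Tags["span.kind"] == "server" {
		return o.Parent.SpanID
	}
	return newID()
}

func main() {
	fresh := func() uint64 { return 99 }
	client := &SpanContext{TraceID: 1, SpanID: 7}

	server := StartSpanOptions{
		Parent: client,
		Tags:   map[string]interface{}{"span.kind": "server"},
	}
	local := StartSpanOptions{Parent: client}

	fmt.Println(chooseSpanID(server, fresh), chooseSpanID(local, fresh))
}
```

If the tag is set in a later statement instead, `chooseSpanID` has already run, which is the concern raised in the preceding comment.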

@dawallin

I like this. The RPCClient was the ref type I had most problem with.

@bhs
Contributor Author

bhs commented Jun 23, 2016

I pushed some "minimum viable reference types" as well as a helper for the RPC server span case (in ext/). PTAL.


func (r rpcServerOption) Apply(o *opentracing.StartSpanOptions) {
opentracing.Parent(r.clientMetadata).Apply(o)
(opentracing.Tags{SpanKind: SpanKindRPCServer}).Apply(o)
Member

probably need string(SpanKind)

+1

@yurishkuro
Member

I'm ok with the latest version, modulo minor renaming (meta->context, SpanReferenceType) and a few clarifications in the comments.

@bhs bhs changed the title from "introduce and take advantage of SpanMetadata" to "introduce and take advantage of SpanContext" Jun 23, 2016
@yurishkuro
Member

LGTM

@bhs
Contributor Author

bhs commented Jun 24, 2016

I'm going to leave this unmerged until the analogous change is in for the docs site (which should be the "master copy").

@bhs bhs force-pushed the bhs/span_context_revival branch from 14972a4 to 4232cc0 Compare July 2, 2016 22:22
@bhs
Contributor Author

bhs commented Jul 3, 2016

This has been updated per opentracing/opentracing.io#100 and #85 + #86

@bhs
Contributor Author

bhs commented Jul 4, 2016

I will merge this tomorrow if there are no objections.

(cc @basvanbeek since I noticed you were kind enough to fix recent breakage in openzipkin-contrib/zipkin-go-opentracing#5 ... Bas, I am happy to help with the downstream changes to support what's here if that's helpful.)

@basvanbeek
Member

@bensigelman... that would be great as I'm quite busy at the moment and not sure how quickly I'd be able to look at it myself

@bhs
Contributor Author

bhs commented Jul 5, 2016

@basvanbeek I'll take a look and try to prepare a PR for you to review before I merge this opentracing-go change

// Create a span referring to the RPC client.
serverSpan = opentracing.StartSpan(
appSpecificOperationName,
opentracing.RPCServerOption(wireContext))
Contributor

s/opentracing/ext/ since RPCServerOption is in the ext package.

Contributor Author

good catch, fixed.

@bhs
Contributor Author

bhs commented Jul 6, 2016

@basvanbeek see openzipkin-contrib/zipkin-go-opentracing#9

I will plan to merge this pr #82 tomorrow unless you have concerns about the zipkin change.
