Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: make span a union type #54737

Merged
merged 20 commits into from
Oct 23, 2020
Merged

tracing: make span a union type #54737

merged 20 commits into from
Oct 23, 2020

Conversation

tbg
Copy link
Member

@tbg tbg commented Sep 24, 2020

Clarify that a span is anywhere between zero to three subspans:

  • net/trace,
  • external opentracing Tracer, and
  • crdb-internal trace span,

where zero subspans corresponds to what is today noopSpan (not
implemented yet, but TODO added).

Lots of threads to pull, but this is a good checkpoint since
everything compiles and the diff is mechanical and small enough
to actually review.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg marked this pull request as ready for review September 30, 2020 08:10
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where zero subspans corresponds to what is today noopSpan (not
implemented yet, but TODO added).

Let's pull on this this and remove the noopSpan type. Otherwise I'm fairly confused about what the possible types of spans are.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/util/tracing/tracer_span.go, line 163 at r1 (raw file):

type otSpan struct {
	// TODO(tbg): see if we can lose the shadowTr here and rely on shadowSpan.Tracer().

I think so...


pkg/util/tracing/tracer_span.go, line 171 at r1 (raw file):

type span struct {
	crdbSpan // can be zero

Can it really be zero? Doesn't it need to have at least a tracer in it? I'm thinking it should have a tracer in it so that users just need a span in order to create other spans, not a span and a tracer. I guess in that world the noopSpan would become a per-tracer instance.
If it does always have a tracer in it, should we hang our *Tracer from span instead of from crdbSpan? The crdbSpan could really be zero.


pkg/util/tracing/tracer_span.go, line 182 at r1 (raw file):

func (s *span) isBlackHole() bool {
	return !s.crdbSpan.isRecording() && s.netTr == nil && s.otSpan == (otSpan{})

why aren't we testing crdbSpan == span{} ? Or in fact span{}


pkg/util/tracing/tracer_span.go, line 683 at r1 (raw file):

// tracing-related work that would be discarded anyway.
func IsBlackHoleSpan(s opentracing.Span) bool {
	// There are two types of black holes: instances of noopSpan and, when tracing

what's the plan here? noopSpan goes away?

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a commit that removes noopSpan. I'll do more clean-ups but in another PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/util/tracing/tracer_span.go, line 163 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think so...

Will pull on this separately.


pkg/util/tracing/tracer_span.go, line 171 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Can it really be zero? Doesn't it need to have at least a tracer in it? I'm thinking it should have a tracer in it so that users just need a span in order to create other spans, not a span and a tracer. I guess in that world the noopSpan would become a per-tracer instance.
If it does always have a tracer in it, should we hang our *Tracer from span instead of from crdbSpan? The crdbSpan could really be zero.

Done.


pkg/util/tracing/tracer_span.go, line 182 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

why aren't we testing crdbSpan == span{} ? Or in fact span{}

What do you mean?


pkg/util/tracing/tracer_span.go, line 683 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what's the plan here? noopSpan goes away?

Done.

@tbg
Copy link
Member Author

tbg commented Oct 8, 2020

I need to fix up the "remove noop span" commit so that we're turning all operations on it into noops. Luckily the race detector caught this in CI.

@tbg
Copy link
Member Author

tbg commented Oct 8, 2020

I will probably add another commit that makes the noop span have a nil crdbSpan instead of a zero one. This makes sure any attempts to accidentally reach into a noopSpan will be NPEs.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's clear. Thanks

@tbg
Copy link
Member Author

tbg commented Oct 9, 2020

Added more commits - made more aggressive changes. As of right now, this PR results in there being just a single workhorse method that creates all spans. This is a major simplification from before and there's a lot left to be cleaned up still. I am getting more comfortable with this package and am starting to understand what the semantics are (did my best to document).

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM up to and including r13, but I'd like Andrei to review the last commit r14.

Reviewed 5 of 5 files at r1, 3 of 3 files at r3, 2 of 2 files at r4, 1 of 1 files at r5, 4 of 4 files at r6, 4 of 4 files at r7, 26 of 26 files at r8, 2 of 2 files at r9, 1 of 1 files at r10, 1 of 1 files at r11, 1 of 1 files at r12, 1 of 1 files at r13, 5 of 5 files at r14.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/util/tracing/tracer_span.go, line 703 at r10 (raw file):

func (s *crdbSpan) ImportRemoteSpans(remoteSpans []tracingpb.RecordedSpan) error {
	if !s.isRecording() {
		return errors.New("adding Raw Spans to a span that isn't recording")

looks like this should be errors.AssertionFailedf

@tbg
Copy link
Member Author

tbg commented Oct 15, 2020

Friendly ping @andreimatei. I have more work to do here still but the pile only keeps growing.

@tbg tbg mentioned this pull request Oct 15, 2020
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/sql/distsql/server.go, line 208 at r14 (raw file):

		//  If that field is unset, we might still want to create a child span if
		//  this flow is run synchronously.
		sp = ds.Tracer.(*tracing.Tracer).StartChildSpan(opName, parentSpan.(*tracing.Span).SpanContext(), logtags.FromContext(ctx), false /* separateRecording */)

long line


pkg/util/tracing/tracer.go, line 92 at r3 (raw file):

// state.
type Tracer struct {
	// Preallocated noopSpan, used to avoid creating spans when we are not using

comment is stale


pkg/util/tracing/tracer.go, line 248 at r14 (raw file):

		}
		parentType = r.Type
		// Note that the logic around here doesn't support spans with multiple

let's assert that it stays the case


pkg/util/tracing/tracer.go, line 340 at r14 (raw file):

		parentContext.TraceID == 0 &&
		parentContext.recordingType == NoRecording &&
		!bool(recordable) {

I think recordable == NonRecordableSpan is much better - so that the constants can be flipped


pkg/util/tracing/tracer.go, line 369 at r14 (raw file):

	s.crdb.SpanID = uint64(rand.Int63())

	// We use the shadowTr from the parent context over that of our

I'd rather not think about this case... How about checking parentContext.shadowTr == t.getShadowTracer() and, if they're not the same, not using either?
Besides requiring less thinking, it also seems the right thing to do from the perspective of the previous shadowTracer - if I've unlinked it, I want to stop using it as soon as possible.
And it's also better when transitioning from no shadowTracer to some shadowTracer - I probably don't want to have children with no parent in the newly-linked tracer.


pkg/util/tracing/tracer_span.go, line 182 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

What do you mean?

nvm


pkg/util/tracing/tracer_span.go, line 168 at r3 (raw file):

type otSpan struct {
	// TODO(tbg): see if we can lose the shadowTr here and rely on shadowSpan.Tracer().
	// Probably not - but worth checking.

why not?


pkg/util/tracing/tracer_span.go, line 192 at r3 (raw file):

	// Special case: trace id zero implies that everything else,
	// with the exception of the tracer, is also zero.
	return s.isBlackHole() && s.crdbSpan.TraceID == 0

pls comment isNoop and isBlackhole - in particular the difference between them


pkg/util/tracing/tracer_span.go, line 192 at r3 (raw file):

	// Special case: trace id zero implies that everything else,
	// with the exception of the tracer, is also zero.
	return s.isBlackHole() && s.crdbSpan.TraceID == 0

If we could do something else other than looking at TraceID, I think it'd be better. I hope we can get rid of TraceID; I don't think we do anything with it.


pkg/util/tracing/tracer_span.go, line 111 at r13 (raw file):

	//
	// TODO(tbg): there's ChildSpanSeparateRecording which avoids sharing
	// the recording with the parent. I hope we can remove it.

I don't know if we can remove it... I'd strike this part of the comment or expand it.
I think the idea is that in DistSQL we want the recording of different parts of the trace (each processor) to be collected separately, and we don't want the parent's recording to also include them (right?). And the child recordings might be collected even after the parent's (?).


pkg/util/tracing/tracer_span.go, line 795 at r14 (raw file):

}

// TODO(tbg): these are temporary to get things to compile without larger

ummm I don't really know what's going on here, but I'd rather do the mechanical refactors, or just make the types exported for now. There's so many types of spans going around that one more spelling difference can't help.

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! I'll try to get this PR passing (a bunch of tests are highlighting what are likely bugs I introduced outside the tracing package's coverage) and follow up with more cleanups.

It looks like the main next step is to rid us of our internal use of the opentracing interfaces. I tried making some changes I considered worthwhile without getting into that, but there's always a blocker.

Dismissed @andreimatei from a discussion.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @irfansharif, @knz, and @TheSamHuang)


pkg/sql/distsql/server.go, line 208 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

long line

Done.


pkg/util/tracing/tracer.go, line 92 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

comment is stale

But it's... not? What would you change?


pkg/util/tracing/tracer.go, line 248 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's assert that it stays the case

Done.


pkg/util/tracing/tracer.go, line 340 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think recordable == NonRecordableSpan is much better - so that the constants can be flipped

Done.


pkg/util/tracing/tracer.go, line 369 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'd rather not think about this case... How about checking parentContext.shadowTr == t.getShadowTracer() and, if they're not the same, not using either?
Besides requiring less thinking, it also seems the right thing to do from the perspective of the previous shadowTracer - if I've unlinked it, I want to stop using it as soon as possible.
And it's also better when transitioning from no shadowTracer to some shadowTracer - I probably don't want to have children with no parent in the newly-linked tracer.

Done.


pkg/util/tracing/tracer_span.go, line 168 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

why not?

Because shadowSpan.Tracer() gives you the opentracing.Tracer, but you need the surrounding shadowTracer that can give you the type.

There's probably still a way to get rid of this once more of this "shadow" business unravels, but right now, we have to keep this. This all ties in with spanContext as well so I will pull on that first..


pkg/util/tracing/tracer_span.go, line 192 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls comment isNoop and isBlackhole - in particular the difference between them

Done.


pkg/util/tracing/tracer_span.go, line 192 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

If we could do something else other than looking at TraceID, I think it'd be better. I hope we can get rid of TraceID; I don't think we do anything with it.

One step at a time. I agree that TraceID isn't used in our case, but let me hold off on removing it until I am a bit better oriented.


pkg/util/tracing/tracer_span.go, line 703 at r10 (raw file):

Previously, knz (kena) wrote…

looks like this should be errors.AssertionFailedf

Done.


pkg/util/tracing/tracer_span.go, line 111 at r13 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't know if we can remove it... I'd strike this part of the comment or expand it.
I think the idea is that in DistSQL we want the recording of different parts of the trace (each processor) to be collected separately, and we don't want the parent's recording to also include them (right?). And the child recordings might be collected even after the parent's (?).

I think recordings should be tied to the span in which they occur and should not be shared with the parent, period (or if so, to grab the "transitive" recording from a parent is the non-normal case that requires a special option). This sharing business requires the user to keep track of who's a child of who to make sure the recordings all pan out. It's a pretty ugly contract.


pkg/util/tracing/tracer_span.go, line 795 at r14 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

ummm I don't really know what's going on here, but I'd rather do the mechanical refactors, or just make the types exported for now. There's so many types of spans going around that one more spelling difference can't help.

Done. V satisfying.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @irfansharif, @knz, @tbg, and @TheSamHuang)


pkg/util/tracing/tracer_span.go, line 111 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I think recordings should be tied to the span in which they occur and should not be shared with the parent, period (or if so, to grab the "transitive" recording from a parent is the non-normal case that requires a special option). This sharing business requires the user to keep track of who's a child of who to make sure the recordings all pan out. It's a pretty ugly contract.

I don't think I agree with this. Tracing is all about recursion and connections between spans. If today I get some information from a span, and tomorrow I chunk of some of that span's work into a sub-span, I don't want to lose any information I was previously getting (au contraire, I wand to get more info).
This is not to say that I like this ChildSeparateRecording option - in fact I very much dislike it. It's an artifact of the way our trace collection works that doesn't work well with the opentracing concepts. I too hope we can get rid of it somehow - perhaps by never collecting the recording of the parent span that creates all the child DistSQL spans that end up being collected, or by clearing the children's recording when they get collected like I think you've suggested elsewhere.

As a matter of style, I would still remove the "I hope we can remove it." comment, or expand it. It's just troll bait as it stands.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished re-reviewing the PR after the rebase and then the additional commit.
No further comment other than what andrei already raised.

Tobias, a kind request to not rebase / amend this further until the end of the review and instead use fixup commits if necessary; otherwise the reviewable experience will become unbereable. At the end of the review whene verything is approved you can then autosquash the fixup commits if you made any.

Reviewed 29 of 29 files at r15, 3 of 3 files at r16, 2 of 2 files at r17, 1 of 1 files at r18, 4 of 4 files at r19, 4 of 4 files at r20, 26 of 26 files at r21, 2 of 2 files at r22, 1 of 1 files at r23, 1 of 1 files at r24, 1 of 1 files at r25, 1 of 1 files at r26, 5 of 5 files at r27, 1 of 1 files at r28, 1 of 1 files at r29, 1 of 1 files at r30, 1 of 1 files at r31, 8 of 8 files at r32, 2 of 2 files at r33, 2 of 2 files at r34.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @irfansharif, @knz, @tbg, and @TheSamHuang)

@tbg tbg requested a review from a team as a code owner October 22, 2020 05:52
@tbg tbg requested a review from andreimatei October 22, 2020 05:53
@tbg tbg requested a review from knz October 22, 2020 05:53
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Raphael, thanks for the suggestion. No major changes in these last fixups, just greening CI.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @irfansharif, @knz, and @TheSamHuang)


pkg/util/tracing/tracer_span.go, line 111 at r13 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't think I agree with this. Tracing is all about recursion and connections between spans. If today I get some information from a span, and tomorrow I chunk of some of that span's work into a sub-span, I don't want to lose any information I was previously getting (au contraire, I wand to get more info).
This is not to say that I like this ChildSeparateRecording option - in fact I very much dislike it. It's an artifact of the way our trace collection works that doesn't work well with the opentracing concepts. I too hope we can get rid of it somehow - perhaps by never collecting the recording of the parent span that creates all the child DistSQL spans that end up being collected, or by clearing the children's recording when they get collected like I think you've suggested elsewhere.

As a matter of style, I would still remove the "I hope we can remove it." comment, or expand it. It's just troll bait as it stands.

Done. We'll continue to discuss recordings outside of this PR.

tbg and others added 20 commits October 23, 2020 15:34
Clarify that a `span` is anywhere between zero to three subspans:

- net/trace,
- external opentracing Tracer, and
- crdb-internal trace span,

where zero subspans corresponds to what is today noopSpan (not
implemented yet, but TODO added).

Lots of threads to pull, but this is a good checkpoint since
everything compiles and the diff is mechanical and small enough
to actually review.

Release note: None
Remove the noopSpan and noopSpanContext types. Instead, we use a `*span`
(and `*spanContext`) with a zero trace ID (and in fact, all zero except
for the tracer).

This allows a sleuth of further cleanups which are deferred to follow-up
commits.

Release note: None
That's where it belongs.

Release note: None
Lots of top-level logic reaches into the `crdbSpan`, which is not ideal.
We want top-level tracing methods to delegate to a method on each of the
subspans of our main `span` type.

This commit doesn't do anything about that, but it gets us one step
closer by mechanically unembedding the crdbSpan field. The diff
highlights a lot of code that needs to be moved into methods that sit on
`*crdbSpan`.

Release note: None
Release note: None
This starts towards a picture in which operations on a `*span`
essentially turn into an opaque invocation of something on a `crdbSpan`
plus possibly additional logic for the net/trace and external span.

The hope is that by pulling further on this thread `crdbSpan` can move
into its own package and thus have a clean interaction with the tracer
and top-level methods.

Release note: None
I had to understand the semantics by reading the code, and now that
I think I understand them, want to save the next person from having
to do the same.

Release note: None
We were previously maintaining very similar code in three places:

1. `(*Tracer).StartSpan`
2. `StartRootSpan`
3. `StartChildSpan`

This commit creates a method `(*Tracer).startSpanGeneric` and makes
the above a thin wrapper around it.

This allows us to simplify just one version of the code.  Merging the
three also showed minor differences: tags didn't seem to propagate the
same in all cases, and a bug (that I just introduced) about what
constitutes a `noopSpanContext` was highlighted and fixed.

I still have plenty of questions about this code and far from good
confidence that I didn't introduce any bugs. I did pay some attention
to performance regressions, but no benchmarks are introduced at this
point. My interest is to simplify the package first, then make it
cheap (again). We can live with a temporary performance regression,
as long as it is limited to the case in which tracing is actually
enabled.

Release note: None
They will be used instead of `opentracing.Span{,Context}` in future
commits.

Release note: None
Improve commentary about recordability while I'm there.

Release note: None
This got lost during the refactors and caused some test failures.

Release note: None
@tbg
Copy link
Member Author

tbg commented Oct 23, 2020

Choo choo goes the merge train. Thanks folks!
bors r=knz,andreimatei

@craig
Copy link
Contributor

craig bot commented Oct 23, 2020

Build succeeded:

@craig craig bot merged commit 33e3abc into cockroachdb:master Oct 23, 2020
@tbg tbg deleted the tracing-cleanup branch October 27, 2020 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants