Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: incorporate remote tracing spans from snapshots #86436

Merged
merged 1 commit into from
Aug 25, 2022

Conversation

AlexTalks
Copy link
Contributor

This adds collected tracing spans into a SnapshotResponse object in
order to incorporate remote traces from the receiver side of a snapshot
into the client's (i.e. the sender's) context.

Release justification: Low-risk observability change.
Release note: None

@AlexTalks AlexTalks requested a review from andreimatei August 19, 2022 06:51
@AlexTalks AlexTalks requested a review from a team as a code owner August 19, 2022 06:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @andreimatei)


pkg/kv/kvserver/store_snapshot.go line 400 at r1 (raw file):

		if req.Header != nil {
			err := errors.New("client error: provided a header mid-stream")
			return noSnap, sendSnapshotError(stream, err, nil)

pls comment all the nils, as per the style guide.
Or better, make a sendSnapshotErrWithTrace.


pkg/kv/kvserver/store_snapshot.go line 1034 at r1 (raw file):

		}
	}
	rCtx, rSp := tracing.EnsureChildSpan(ctx, s.cfg.Tracer(), "receive snapshot data")

let's reassign ctx like before unless there's a very good reason not too. It's too easy to use the wrong context when there's a ctx in scope.


pkg/kv/kvserver/store_snapshot.go line 1049 at r1 (raw file):

	if err := s.processRaftSnapshotRequest(applyCtx, header, inSnap); err != nil {
		return sendSnapshotError(stream, errors.Wrap(err.GoError(), "failed to apply snapshot"),
			sp.GetConfiguredRecording())

nit: lift a rec := sp.GetConfiguredRecording() and use it in both cases


pkg/kv/kvserver/store_snapshot.go line 1057 at r1 (raw file):

}

func sendSnapshotError(stream incomingSnapshotStream, err error, recordedSpans tracingpb.Recording) error {

consider renaming recordedSpans -> trace


pkg/kv/kvserver/store_snapshot.go line 1057 at r1 (raw file):

}

func sendSnapshotError(stream incomingSnapshotStream, err error, recordedSpans tracingpb.Recording) error {

the trace deserves a comment. And say it can be missing.


pkg/kv/kvserver/store_snapshot.go line 1423 at r1 (raw file):

// importSnapshotTracingSpans incorporates traces collected in a
// (Delegated)SnapshotResponse into the current context's tracing span.
func importSnapshotTracingSpans(ctx context.Context, resp kvserverpb.ResponseWithTracingSpans) {

Do we really need this ResponseWithTrancingSpans interface? It seems simpler and better to just pass a recording to this function directly.
I also think it'd be better if this took the span directly - one of the callers has a reference to it, and it'd be better to see that variable be used. At which point, consider not having this function at all.


pkg/kv/kvserver/store_snapshot.go line 1428 at r1 (raw file):

		span := tracing.SpanFromContext(ctx)
		if span == nil {
			log.Warningf(

I think this warning is a bad idea. It relies on the fact that that, on the server side, sp.GetRecording() returns nil for RecordingOff. That might not be true in the future, and it's not even true today when the server span's recording is enabled dynamically from /tracez.


pkg/kv/kvserver/store_snapshot.go line 1475 at r1 (raw file):

		return err
	}
	importSnapshotTracingSpans(ctx, resp)

if only the APPLIED message carries a trace, let's only import in response to that one.
It would probably be hard for multiple messages to carry traces; if they did and the traces overlapped, importing the overlapping part multiple times would be an error.


pkg/kv/kvserver/store_snapshot.go line 1563 at r1 (raw file):

			return errors.Wrapf(err, "%s: expected EOF, got resp=%v with error", to, unexpectedResp)
		}
		importSnapshotTracingSpans(ctx, resp)

resp or unexpectedResp?
In any case, I'd remove this import altogether.


pkg/kv/kvserver/kvserverpb/raft.proto line 206 at r1 (raw file):

}

message SnapshotResponse {

please comment either here or on the service RPC declaration how many messages the client should expect


pkg/kv/kvserver/kvserverpb/raft.proto line 217 at r1 (raw file):

  string message = 2;
  reserved 3;
  repeated util.tracing.tracingpb.RecordedSpan collected_spans = 4 [(gogoproto.nullable) = false];

this needs a comment. Do all responses carry it or only the APPLIED one?

@AlexTalks AlexTalks added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 labels Aug 23, 2022
@AlexTalks AlexTalks force-pushed the snapshot_remote_traces branch from 00c7609 to 9ddf1b0 Compare August 24, 2022 04:51
Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @andreimatei)


pkg/kv/kvserver/store_snapshot.go line 400 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls comment all the nils, as per the style guide.
Or better, make a sendSnapshotErrWithTrace.

Done.


pkg/kv/kvserver/store_snapshot.go line 1034 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's reassign ctx like before unless there's a very good reason not too. It's too easy to use the wrong context when there's a ctx in scope.

Done.


pkg/kv/kvserver/store_snapshot.go line 1057 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

the trace deserves a comment. And say it can be missing.

OK, assuming this is resolved by adding sendSnapshotErrorWithTrace


pkg/kv/kvserver/store_snapshot.go line 1423 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Do we really need this ResponseWithTrancingSpans interface? It seems simpler and better to just pass a recording to this function directly.
I also think it'd be better if this took the span directly - one of the callers has a reference to it, and it'd be better to see that variable be used. At which point, consider not having this function at all.

OK - I got rid of this function.

In general I might have preferred to keep the ResponseWithTracingSpans interface, since I like the pattern of using an interface to describe a common capability, but as it is no longer necessary I got rid of that too.


pkg/kv/kvserver/store_snapshot.go line 1428 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this warning is a bad idea. It relies on the fact that that, on the server side, sp.GetRecording() returns nil for RecordingOff. That might not be true in the future, and it's not even true today when the server span's recording is enabled dynamically from /tracez.

Done, removed this.


pkg/kv/kvserver/store_snapshot.go line 1475 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

if only the APPLIED message carries a trace, let's only import in response to that one.
It would probably be hard for multiple messages to carry traces; if they did and the traces overlapped, importing the overlapping part multiple times would be an error.

Done.


pkg/kv/kvserver/store_snapshot.go line 1563 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

resp or unexpectedResp?
In any case, I'd remove this import altogether.

Done - good catch, should have been unexpectedResp, but per the above comment I've removed this import anyway.


pkg/kv/kvserver/kvserverpb/raft.proto line 206 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please comment either here or on the service RPC declaration how many messages the client should expect

Done.


pkg/kv/kvserver/kvserverpb/raft.proto line 217 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this needs a comment. Do all responses carry it or only the APPLIED one?

Done.

@AlexTalks AlexTalks force-pushed the snapshot_remote_traces branch from 9ddf1b0 to 4d24c9d Compare August 24, 2022 06:31
This adds collected tracing spans into a `SnapshotResponse` object in
order to incorporate remote traces from the receiver side of a snapshot
into the client's (i.e. the sender's) context.

Release justification: Low-risk observability change.
Release note: None
@AlexTalks AlexTalks force-pushed the snapshot_remote_traces branch from 4d24c9d to 09a1d46 Compare August 24, 2022 15:44
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 24, 2022

👎 Rejected by too few approved reviews

@AlexTalks AlexTalks removed the request for review from a team August 24, 2022 22:22
@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 24, 2022

👎 Rejected by too few approved reviews

@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 25, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 25, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 25, 2022

Build succeeded:

@craig craig bot merged commit 9a7ec4b into cockroachdb:master Aug 25, 2022
@AlexTalks AlexTalks deleted the snapshot_remote_traces branch August 25, 2022 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants