Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for tracing asynchronous operations #124

Merged
merged 20 commits into from
Sep 4, 2019

Conversation

carterkozak
Copy link
Contributor

Added a Tracer API to create a span that is not attached to a
thread.

@carterkozak carterkozak requested a review from a team as a code owner April 1, 2019 15:10
/**
* Like {@link #startSpan(String, SpanType)}, but does not set or modify tracing thread state.
*/
public static DetachedSpan startDetachedSpan(String operation, SpanType type) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be clearer if we call it startAsyncSpan instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However it's not necessarily asynchronous, it's no different from any other span except that it is not connected to any thread-local state.

@carterkozak carterkozak requested review from esword and dsd987 April 1, 2019 15:32
@markelliot
Copy link
Contributor

perhaps an alternative here would be exposing Trace + having Trace#createSpan exposed, and we could add a method copyOrCreateTrace which returns a new Trace captured from the thread state, but is otherwise 'detached' and still mutable in the way that's required by AsyncTrace

@carterkozak
Copy link
Contributor Author

I'd prefer not to expose the Trace or OpenSpan objects directly because API consumers shouldn't need access to that internal data to instrument their code. I realize these are already exposed to some extent, but we can avoid further api/implementation coupling while providing both easier and safer utilities.

@markelliot
Copy link
Contributor

DetachedSpan seems like as odd a thing to expose as Trace or OpenSpan. I guess the thing I'm poking at is that in cases where we want traces to be detached from threads, we'd logically be implementing exactly the things that we attach to threads. I'd rather we expose the basics than invent other concepts More generally, what seems appropriate for API surface here seems to follow a pretty arbitrary set of rules -- the initial intent was something extremely limited, and the drift that's grown with the library (from having basically three methods) now makes it hard to hold a reasonable bar.

@carterkozak
Copy link
Contributor Author

Agreed that we're in an odd place, and there's not a lot we can do without either breaking API or creating sprawl. The Trace object doesn't work well across threads, it's meant to represent a single threads state -- at it's core it's a tuple of traceId, sampled, stack<open-span>. The stack maps directly to the call stack, copying this stack across threads destroys this mapping and makes it easy to shoot ourselves in the foot.

If we wanted to focus on safety, the Trace object would take the form {traceId, sampled, currentSpan}, and tracing operations would be encapsulated in a runnable e.g. Tracer.startSpan("operation", () -> myTracedOperation); where the Trace stack is replaced by span invocations on the application stack. In this model, the proposed DefaultDetachedSpan would be unnecessary, and the Trace object could make sense. Unfortunately we have taken advantage of too many sharp edges of the current Trace object (e.g. AsyncTracer) to make this change without a major rev, which is likely more trouble than it's worth at the moment.

blab: I don't have a great solution.

@ellisjoe
Copy link
Contributor

Hey, just jumping in on this now since I'm going to need something like this for dialogue. Has any more thought gone into this issue, or are the comments here up to date and we just need to come to an agreement on some proposal?

@carterkozak
Copy link
Contributor Author

Hey @ellisjoe, a month or so ago I caught up with @markelliot and we discussed a couple models. Mark was planning to put together a sample proposal/implementation that included a concept of span ownership, but he may have been pulled into other things. I'd be happy to chat through possibilities.

* as the parent instead of thread state.
*/
@MustBeClosed
SpanToken attach(String operationName, SpanType type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between calling detach(operation) and then complete() vs attach(operation) and then close()? is it just that calling attach also overrides my current thread local state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attach applies a new span to the current thread, from attach() to close() This can be done multiple times to create sibling spans with the DetachedSpan as a parent.

DetachedSpan.complete, is used to complete the detached span, which is not associated with a thread. There is danger that complete is never called, and we fail to associate the child spans (from attach) with the original parent of this detached span, but I don't think there are good options to avoid that.

This model means that you cannot ever detach or attach an existing span that is already bound to a thread, each time that we attach or detach we mark a new operation. I've found that otherwise, it can become difficult to trace where our tracing spans have actually come from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so you'd start your DetachedSpan, then you could attach() it once your async task starts running, and then go back to using the normal Tracer.startSpan() methods throughout.

Think the thing that seems a bit funky is that you need to close the SpanToken rather than just using Tracer.completeSpan().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, the reason for the different type is so we can reset the original tracing data on the attached thread. We could ignore that, and apply state directly, but it would be confusing to api consumers whether or not the attach creates a new tracing span, and how many they are responsible for completing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may have been more obvious to me if the attach() method was on Tracers rather than here. In my head, at least at the moment, Tracers deals with all the threading business, and so I could imagine wanting to call Tracers.attach(detachedSpan) which would then give me a SpanToken to close and reset the thread's trace. And then on DetachedSpan you just would just have newSpan() which gives you a new DetachedSpan parented by the current one. That way DetachedSpan has nothing to do with threading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 That sounds good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracers.attach(detachedSpan) would also need an operation name param. It would also require us to pass some additional data between DetachedSpan and Tracer, which isn't currently possible. Right now, the DetachedSpan interface doesn't leak whether or not it is sampled, but that is required in order to attach to a thread.

@carterkozak
Copy link
Contributor Author

Holding off fixing this up until #180 has gone in to unblock other work

@carterkozak
Copy link
Contributor Author

Rebased, have to catch a flight shortly, will knock out the rest when I can.

@carterkozak carterkozak force-pushed the ckozak/detached_tracing_api branch from e93254f to b45f25d Compare August 21, 2019 19:18
@carterkozak
Copy link
Contributor Author

I've updated the API, there's still some implementation code that I need to move around. I don't like having the DetachedSpan implementations living in Tracer, but I can clean that up once we're happy with the ergonomics.

@carterkozak carterkozak force-pushed the ckozak/detached_tracing_api branch from 4a1aae2 to b9b7d4a Compare August 23, 2019 17:05
@carterkozak carterkozak changed the title [proposal] Support for tracing asynchronous operations Support for tracing asynchronous operations Aug 23, 2019
return Optional.empty();
}

private static final class SampledDetachedSpan implements DetachedSpan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we convert DetachedSpan into an abstract class, and inline Smapled/UnsampledDetachedSpan there? It would make the layout consistent with how Trace is currently implemented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go either way on this. the DetachedSpan interface is meant to be relatively straightforward to read, and detached from implementation details because it's an API that will be consumed by developers using this library. Trace is an implementation detail, and isn't really consumed outside of this library.

If we moved these to the DetachedSpan class, we would also need to implement a package private accessor for the raw value of the Tracer trace thread-local, which could be dangerous if folks try to get clever.

}

@Override
public DetachedSpan detach(String operation, SpanType type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What use case do you envision for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a webserver: When a request is received on a non-blocking thread, I'd like to start a detached span representing the entire request. Later, before processing, the request is enqueued to a larger thread pool where we allow work -- I'd like the ability to track time spent enqueued, despite not having attached the span to thread.

@iamdanfox
Copy link
Contributor

iamdanfox commented Sep 3, 2019

Unblocking 👍 I think the next steps are to publish an RC from here and then make sure it looks sensible in conjure-java-runtime.

@iamdanfox
Copy link
Contributor

iamdanfox commented Sep 3, 2019

Dan and I are trying this out on conjure-java-runtime's https://github.com/palantir/conjure-java-runtime/compare/ds/use-new-tracing?expand=1, findings so far:

  • we want to use CloseableSpan instead of the Tracer#startSpan to get the X-B3-SpanId etc headers onto the wire, but don't have an easy way to get the current header. released as 3.2.0-rc3
  • maybe we should use the java assert keyword to make sure people wire up their instrumentation correctly in tests?

iamdanfox and others added 3 commits September 4, 2019 12:29
* Revert "CloseableSpan exposes ids necessary to set headers"

This reverts commit 1f188ac.

* Tracer#getTraceMetadata

* parentSpanId is empty

* nit

* Update tracing/src/main/java/com/palantir/tracing/Tracer.java

Co-Authored-By: Carter Kozak <ckozak@ckozak.net>
@ferozco
Copy link
Contributor

ferozco commented Sep 4, 2019

👍

@iamdanfox iamdanfox merged commit c8ade88 into develop Sep 4, 2019
@svc-autorelease
Copy link
Collaborator

Released 3.2.0

@iamdanfox iamdanfox deleted the ckozak/detached_tracing_api branch September 4, 2019 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants