API for minimizing no-tracer / non-sampled overhead #31

Closed
dkuebric opened this issue Jan 18, 2016 · 7 comments
Comments

@dkuebric
Contributor

As discussed in #27, a no-op tracer is a good default for the API, allowing instrumentation to sit in a project even when most of that project's users are not actively tracing.

However, if instrumentation paths will be hit in cases where users have no tracer, or are using a tracer that samples, there should be some way for instrumentation to avoid work that might currently be done "above the API"--i.e. pulling backtraces, formatting data structures, etc.

An example API might be along the lines of a bool isTracing() method. Thoughts?
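For illustration only, here is a minimal Go sketch of how such a bool guard might be used by instrumentation; the IsTracing() method and the Tracer/noopTracer types below are hypothetical, not part of the current API:

    package main

    import (
        "fmt"
        "runtime"
    )

    // Tracer stands in for whatever handle instrumentation holds; IsTracing is
    // the proposed guard, Log is a placeholder logging call.
    type Tracer interface {
        IsTracing() bool
        Log(key string, value interface{})
    }

    // noopTracer models the default no-op tracer: the guard reports false, so
    // instrumentation skips the expensive work entirely.
    type noopTracer struct{}

    func (noopTracer) IsTracing() bool                   { return false }
    func (noopTracer) Log(key string, value interface{}) {}

    func handleRequest(t Tracer) {
        // ... real request work ...

        // Only pay for the backtrace when a real, sampled tracer is attached.
        if t.IsTracing() {
            buf := make([]byte, 4096)
            n := runtime.Stack(buf, false)
            t.Log("stack", string(buf[:n]))
        }
        fmt.Println("request handled")
    }

    func main() {
        handleRequest(noopTracer{})
    }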

@bhs
Contributor

bhs commented Jan 18, 2016

@dankosaur this is an important and rich topic, thanks for bringing it up.

As food for thought, I'll link to this thing (which I have mixed feelings about for other reasons, but I'll try to stay on topic!): https://godoc.org/golang.org/x/net/trace#Trace

In particular, there are methods like these:

    // LazyLog adds x to the event log. It will be evaluated each time the
    // /debug/requests page is rendered. Any memory referenced by x will be
    // pinned until the trace is finished and later discarded.
    LazyLog(x fmt.Stringer, sensitive bool)

The function signature is not important here, but more the idea that the expensive work (in this case, whatever the fmt.Stringer has to do) is deferred until it's needed. Sometimes that is not feasible (e.g., the stack trace example), but I try to remember that sampling need not be a decision made before-the-fact; i.e., isTracing() may be difficult to define with certainty in a system that makes sampling decisions after a span has finished or nearly-finished. (Not to say we couldn't have an isNoop() method that would just always return false for Tracers that do sampling after the fact)
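As a rough sketch of the deferred-evaluation idea in Go (not the x/net/trace implementation; the lazyLog helper and sampled flag are invented for illustration), the expensive formatting can be wrapped in a fmt.Stringer and evaluated only once the trace is known to be kept:

    package main

    import "fmt"

    // expensiveDump defers formatting a large structure; String() only runs if
    // the trace is actually rendered/kept.
    type expensiveDump struct {
        payload map[string]int
    }

    func (d expensiveDump) String() string {
        // Costly formatting happens lazily, at render time.
        return fmt.Sprintf("%#v", d.payload)
    }

    // lazyLog stands in for an API like LazyLog: it holds on to the Stringer
    // and evaluates it only when the trace turns out to be sampled.
    func lazyLog(sampled bool, x fmt.Stringer) {
        if sampled {
            fmt.Println(x.String()) // the deferred work happens here
        }
        // otherwise x is simply dropped (a real tracer might pin it until a
        // post-hoc sampling decision is made)
    }

    func main() {
        d := expensiveDump{payload: map[string]int{"bytes": 1024}}
        lazyLog(false, d) // no formatting cost for an unsampled trace
        lazyLog(true, d)  // formatting happens only now
    }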

@bogdandrutu

My vote would be for an always-on system. Imagine you have a stuck operation and you are trying to debug it. Agreed, we want to have levels:

  • Is distributed tracing sampled: yes/no
  • Verbosity level

But by default it should be always on, and the cost of adding an annotation (log) + tag should be minimal so as not to affect overall system performance.
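A minimal sketch of that idea, assuming hypothetical verbosity levels and a sampled flag (none of these names exist in the API): cheap annotations are always recorded, expensive ones are gated.

    package main

    import "fmt"

    // Illustrative annotation levels; the names and values are hypothetical.
    const (
        levelBasic   = 1 // cheap tags/logs, always recorded
        levelVerbose = 2 // expensive annotations, recorded only when enabled
    )

    type span struct {
        sampled   bool
        verbosity int
    }

    // annotate always records cheap annotations, but skips verbose ones unless
    // both the verbosity level and the sampling decision allow it.
    func (s *span) annotate(level int, msg string) {
        if level > s.verbosity || (level >= levelVerbose && !s.sampled) {
            return
        }
        fmt.Println("annotation:", msg)
    }

    func main() {
        s := &span{sampled: false, verbosity: levelBasic}
        s.annotate(levelBasic, "request started")     // recorded
        s.annotate(levelVerbose, "full request dump") // skipped: too verbose / not sampled
    }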

@yurishkuro
Member

I can see merits in both points of view. Lazy tagging/logging is definitely the way to go, but it does not completely remove the overhead. Do we know of any performance metrics from tracing systems that do post-trace sampling? I.e., are they feasible in really high-throughput services? If services above a certain qps do require pre-trace sampling, that would be an argument in favor of having a shortcut check like is_traced().

@bhs
Contributor

bhs commented Jan 18, 2016

@yurishkuro there are some truly high-throughput, low-latency systems at Google (e.g., Bigtable) that do deferred evaluation of logging data: i.e., it definitely can scale. Of course there's a trade-off, though, and in that case it was an API that involved (subtle) refcounting/pinning.

@bogdandrutu

In terms of scalability, I think it can scale well enough. If a tracing system costs less than 1% of the total, in my opinion that is acceptable, and even for a very low-latency system (say 100us per op) a 1% budget is usually more than 1us. So we can definitely do that, and we have some good examples which I cannot present right now but which will be presented soon :).

Lazy evaluation is definitely a good thing that needs to be implemented. Also, the cost is directly proportional to the number of logs (annotations), so I think for a low-latency system this can be managed by the owner of the service.

As for post-trace sampling, it is important only if you want to support something like Dapper and push or pull the data out of the task.

Also, something like an is_traced check is useful for very expensive annotations that are helpful but not critical for the system, as I mentioned, in case you want to debug something live.

@dkuebric
Contributor Author

A LazyLog-style API is definitely one way to address this concern, and to the extent that we want to obey the "when in Rome" directive, it may be more or less idiomatic in various languages. It also may provide some even more interesting post-hoc sampling possibilities in a truly sophisticated tracer-- @bensigelman is that what you are saying?

I would be fine with LazyX, though a general bool guard seems useful as a basic cross-platform primitive and is easy to reason about. The disadvantage of the general bool guard is that it may hamstring lazy collection of guarded data. A compromise might be a callback-based system, which would allow for arbitrary code execution and also deferred processing?

Wanted to clarify that I think this is valuable regardless of philosophy on tracing backends--we will need to prove OT instrumentation introduces minimal overhead in the no-op case, and this type of stuff helps ensure that is the case by providing easy safety tools for instrumentation authors.
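To make the callback-based compromise concrete, here is a hedged Go sketch (the spanRecorder type and its methods are invented for illustration): instrumentation hands the tracer a closure, and the closure runs only if the span is actually recorded.

    package main

    import (
        "fmt"
        "runtime"
    )

    // spanRecorder illustrates the callback-based idea: callers pass closures,
    // and a no-op or unsampled recorder never invokes them.
    type spanRecorder struct {
        recording bool
        deferred  []func() string
    }

    // LogEvent defers the expensive work to the callback.
    func (r *spanRecorder) LogEvent(f func() string) {
        if !r.recording {
            return
        }
        r.deferred = append(r.deferred, f)
    }

    // Flush evaluates the deferred callbacks once the trace is known to be kept.
    func (r *spanRecorder) Flush() {
        for _, f := range r.deferred {
            fmt.Println("event:", f())
        }
    }

    func main() {
        r := &spanRecorder{recording: true}
        r.LogEvent(func() string {
            // Expensive work (e.g. capturing a backtrace) runs only if needed.
            buf := make([]byte, 2048)
            n := runtime.Stack(buf, false)
            return string(buf[:n])
        })
        r.Flush()
    }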

@bhs
Contributor

bhs commented Nov 16, 2016

... moved to opentracing/specification#8

@bhs bhs closed this as completed Nov 16, 2016