API for minimizing no-tracer / non-sampled overhead #31
Comments
@dankosaur this is an important and rich topic, thanks for bringing it up. As food for thought, I'll link to this thing (which I have mixed feelings about for other reasons, but I'll try to stay on topic!): https://godoc.org/golang.org/x/net/trace#Trace

In particular, there are methods like these:

```go
// LazyLog adds x to the event log. It will be evaluated each time the
// /debug/requests page is rendered. Any memory referenced by x will be
// pinned until the trace is finished and later discarded.
LazyLog(x fmt.Stringer, sensitive bool)
```

The function signature is not important here; it's more the idea that the expensive work (in this case, whatever the `fmt.Stringer` does to produce its string) is deferred until the trace data is actually needed.
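To make the deferred-evaluation idea concrete, here is a minimal sketch; the `expensiveDetails` type and its usage are hypothetical, not part of any tracing API. Handing the value to the tracer is cheap, and the costly formatting only happens if `String()` is ever called, e.g. when a sampled trace is rendered.

```go
package main

import (
	"fmt"
	"time"
)

// expensiveDetails is a hypothetical payload whose formatting is costly.
// Implementing fmt.Stringer defers that cost until String() is invoked,
// e.g. when a sampled trace is actually rendered.
type expensiveDetails struct {
	start time.Time
	ids   []int
}

func (d expensiveDetails) String() string {
	// All formatting cost is paid here, not at the instrumentation site.
	return fmt.Sprintf("started=%s ids=%v", d.start.Format(time.RFC3339Nano), d.ids)
}

func main() {
	payload := expensiveDetails{start: time.Now(), ids: []int{1, 2, 3}}

	// Hot path: handing the value over is cheap; nothing is formatted yet.
	var logged fmt.Stringer = payload

	// An unsampled or no-op trace could drop `logged` without ever calling
	// String(); only a rendered/sampled trace pays the formatting cost.
	fmt.Println(logged.String())
}
```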
My vote would be for an always-on system. Imagine you have a stuck operation and you are trying to debug it. Agreed that we want to have levels, but by default it should be always on, and the cost of adding an annotation (log) plus a tag should be minimal so that it does not affect overall system performance.
I can see merits in both points of view. Lazy tagging/logging is definitely the way to go, but it does not completely remove the overhead. Do we know of any performance metrics from tracing systems that do post-trace sampling, i.e. are they feasible in really high-throughput services? If services above a certain qps do require pre-trace sampling, then that would be an argument in favor of having a shortcut check.
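As a rough sketch of what post-trace sampling implies here (the types and the keep/drop rule below are hypothetical, not any particular system's API): every operation records its events cheaply, and only at finish time is the keep-or-drop decision made, e.g. based on latency or errors.

```go
package tailsample

import "time"

// event and recordedSpan are hypothetical types used only to sketch
// post-trace ("tail-based") sampling: record first, decide afterwards.
type event struct {
	at  time.Time
	msg string
}

type recordedSpan struct {
	start  time.Time
	events []event
	err    bool
}

func (s *recordedSpan) Log(msg string) {
	// Recording happens unconditionally, so it must stay cheap.
	s.events = append(s.events, event{at: time.Now(), msg: msg})
}

// Finish makes the sampling decision after the fact; unsampled spans are
// simply dropped, so the steady-state cost is the recording above.
func (s *recordedSpan) Finish(report func(*recordedSpan)) {
	tooSlow := time.Since(s.start) > 100*time.Millisecond
	if s.err || tooSlow {
		report(s)
	}
	// otherwise: drop the span and let the recorded events be collected
}
```

The per-request cost is then dominated by the unconditional recording, which is exactly what makes the high-throughput feasibility question interesting.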
@yurishkuro there are some truly high-throughput, low-latency systems at Google (e.g., Bigtable) that do deferred evaluation of logging data: i.e., it definitely can scale. There's a trade-off, of course; in that case the API involved (subtle) refcounting/pinning.
In terms of scalability I think it can scale well enough. If a tracing system is < 1% of the total cost, in my opinion that is acceptable, and even for a very low-latency system 1% is usually more than 1µs (on the order of 100µs per operation). So we can definitely do that, and we have some good examples which I cannot present right now but which will be presented soon :). Lazy evaluation is definitely a good thing that needs to be implemented; also, the cost is directly proportional to the number of logs (annotations), so for a low-latency system this can be managed by the owner of the service.

The post-trace sampling part is important only if you guys want to support something like Dapper and push or pull the data out of the task. Also, a kind of "is_traced" check is useful for very expensive annotations that are useful but not critical for the system, as I mentioned, in case you want to debug something live.
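For concreteness, the budget implied by those numbers: 1% of a 100µs operation is about 1µs of tracing work per operation, so with, say, ten annotations per operation each one can spend on the order of 100ns.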
I would be fine with that. Wanted to clarify that I think this is valuable regardless of philosophy on tracing backends: we will need to prove that OT instrumentation introduces minimal overhead in the no-op case, and this kind of thing helps ensure that is the case by providing easy safety tools for instrumentation authors.
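A sketch of what "proving minimal overhead in the no-op case" could look like with Go's standard benchmarking; `noopSpan`, `SetTag`, and `IsTracing` here are hypothetical stand-ins, not the actual opentracing-go types.

```go
package tracing // placed in a *_test.go file; run with: go test -bench=.

import "testing"

// noopSpan is a hypothetical stand-in for a no-op tracer's span: every
// method should compile down to (nearly) nothing.
type noopSpan struct{}

func (noopSpan) SetTag(key string, value interface{}) {}
func (noopSpan) IsTracing() bool                      { return false }

// expensiveValue stands in for work (formatting, backtraces, ...) that
// instrumentation might otherwise perform unconditionally.
func expensiveValue() string {
	s := ""
	for i := 0; i < 64; i++ {
		s += "x"
	}
	return s
}

func BenchmarkNoopUnguarded(b *testing.B) {
	sp := noopSpan{}
	for i := 0; i < b.N; i++ {
		// The tag value is computed even though the span is a no-op.
		sp.SetTag("payload", expensiveValue())
	}
}

func BenchmarkNoopGuarded(b *testing.B) {
	sp := noopSpan{}
	for i := 0; i < b.N; i++ {
		if sp.IsTracing() { // shortcut check skips the expensive work entirely
			sp.SetTag("payload", expensiveValue())
		}
	}
}
```

Comparing the two benchmarks makes the cost of unguarded instrumentation visible and helps keep regressions from creeping in.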
... moved to opentracing/specification#8
As discussed in #27, a no-op tracer is a good default for the API, allowing instrumentation to sit in a project even though most of that project's users may not be actively tracing.

However, if the instrumentation paths will be hit in cases where users have no tracer, or are using a tracer which samples, there should be some way for instrumentation to avoid work that might currently be done "above the API", i.e. pulling backtraces, formatting data structures, etc.
An example API might be along the lines of a `bool isTracing()` method. Thoughts?
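A minimal sketch of how instrumentation might use such a check; `Span`, `IsTracing`, and `LogEvent` are hypothetical names for illustration, not a settled OpenTracing signature. A no-op implementation would simply return false, so the expensive branch is never taken.

```go
package instrumentation

import "runtime"

// Span is a hypothetical interface sketch; the point is only the
// IsTracing shortcut, not the exact method set.
type Span interface {
	IsTracing() bool
	LogEvent(event string)
}

// handleRequest shows instrumentation that only pays for expensive
// capture work (here, a stack trace) when the span is actually recording.
func handleRequest(sp Span) {
	if sp.IsTracing() {
		buf := make([]byte, 4096)
		n := runtime.Stack(buf, false)
		sp.LogEvent(string(buf[:n]))
	}
	// ... actual request handling ...
}
```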