Increase tracing within hcsshim #1280
I haven't thought about this in ages, so I'm asking a few questions. For spans, don't we inherit the baggage for the trace statements? Is it better to have a param as baggage or as a specific trace? I.e., `PodId`, `TaskId`, `ContainerId`, and `Pid` seem like they should be on the span, given they help to correlate all data. Or do we want them on the trace, and use the span ID and trace ID to look back and correlate?
I was toying with that, especially since if we put parameters only in the spans, debug statements from that same function will have no information when the spans are not printed. I was thinking about modifying … I have no idea what the convention is, but the only reason I had for putting params solely in the spans and then correlating them back was performance, and I have no idea what performance hit logging/tracing adds.
My 2c: spans should have identifying attributes, added by the method itself so as not to make assumptions about its caller, and should be created on all entry and exit scopes (i.e., the proto API, or maybe a call out to a kernel or to opengcs, which we don't control). I.e., a method … However, the config of the container should not be on the span; it is not an identifying attribute. It should be a log statement upon entry to the method, describing the entry args and the actions being taken. However, I get the problem with sampling, so this can create its own host of issues. But sometimes this doesn't exactly work. For example, containerd has CRI ID -> Task ID -> Container ID -> Container PID. From the calling span (CRI) we know exactly one of the IDs. From the GCS we know them all, if they were forwarded. So correlating a PID to a CRI ID is only possible via the span/trace IDs. But in the lowest-level traces, it's very easy to correlate them. Make sense?
I think so, yeah. I'll try that out in the next week or so, hopefully.
Updated span export to:
* include span kind
* log if span attributes were dropped
* format time as string instead

Added `log.S()` to set the log entry stored in the context with provided fields.

`log.G()` now checks the context for a stored entry.

Added `log.Copy()` to copy the log entry and trace span from a source context to a destination, allowing contexts to be duplicated without their cancellation.

Added `log.U()` to update the context an entry (in the context) points to, allowing it to reference the latest span and other information.

Added `oc.StartSpan` to set the context's log entry to reference the newly created value.

Added helper `log.Format*` functions to format Time and other structs to JSON, but only if the logging level is high enough that the information would be logged.

Set span kind to client/server, as required, for bridge RPC calls.

Updated `internal/cmd` to use spans within `.Start()` and `.Wait()`.

Reduced the number of `INFO` level logs printed, downgrading them to `DEBUG`.

Moved traces and spans from inside `Once.Do()`, so that they are always logged, even if the `Once` does not execute.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Removing unnecessary spans in the codebase and replacing them with trace logs

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Portions of hcsshim lack instrumentation, in terms of both debug logs and spans.
This PR increases logging and tracing, and ties span generation to the logging level.
Several opinions are taken within this PR:
1. … `Warning` or higher. However, they are exported at log level `Info`. These events include:
   a. `ttrpc` spans of the form `containerd.task.v2.Task.*`
   b. `containerd-shim-runhcs-v1` service spans of the form `Create`, `State`, etc.
2. … `waitBackground`) for code in `internal/`, but they are only sampled if the logging level is higher than `Trace`.
   a. Other functions may use a `.Trace(` logging call to denote that a particular function is called, if they do not run long enough, or are not deemed important enough, to warrant the overhead of creating a span.
3. … `log.S()` and `log.WithContext()`. They can be extracted with `log.FromContext()`. `log.G()` has also been updated to look for an entry stored in the context.
   a. This allows setting fields that will be present for all logs in the same or derived contexts.
4. … `Debug` level, and are used to log what parameters or options are being used, that a particular object was created, or that an action was finished.
5. … `SpanKindClient` or `SpanKindServer`, respectively. This is done automatically for TTRPC and gRPC servers.

Functions that are called from functions with sampled spans from `containerd-shim-runhcs-v1` explicitly use `StartTraceSpan` to force their sampling to be tied to the logging level. Functions that are only called from internal code use the stock `trace.StartSpan` and inherit their sampler.

A nuance of this is that if a span is started with the `LoggingLevelSampler`, or via `oc.StartTraceSpan`, and the logging level is less than `Trace`, it will not be logged, even if it registers an error and has a non-zero return status, since the span was disabled and never reaches the exporter stage.
The logging level can be set with the `LogLevel` setting in the `containerd.toml` file, so tracing can be enabled there.