Sampling mode
If you run magic-trace with the -sampling flag, or on a machine that doesn't support Intel PT, magic-trace will instead produce a trace collected by sampling callstacks rather than one reconstructed from Intel PT events. As a result, many short function calls will be missed, and the overhead is much higher.
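The rule above for which backend collects the trace can be sketched as a small function. This is a hypothetical Python model, not magic-trace's code (magic-trace is written in OCaml). The names `Backend` and `choose_backend` are illustrative.

```python
from enum import Enum


class Backend(Enum):
    """Hypothetical labels for the two ways a trace can be collected."""
    INTEL_PT = "intel_pt"
    SAMPLING = "sampling"


def choose_backend(sampling_flag: bool, supports_intel_pt: bool) -> Backend:
    """Sampling is used if -sampling is passed OR the machine lacks Intel PT."""
    if sampling_flag or not supports_intel_pt:
        return Backend.SAMPLING
    return Backend.INTEL_PT


# An AMD machine has no Intel PT, so sampling is used even without -sampling.
print(choose_backend(sampling_flag=False, supports_intel_pt=False))  # Backend.SAMPLING
```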
This allows magic-trace to be used on machines that don't support Intel PT, e.g. AMD machines. It can also be useful for long-running traces where less granularity is acceptable. Most configuration works the same way in both modes, except -snapshot-size and -timer-resolution. In sampling mode, -snapshot-size is ignored, and magic-trace always outputs a trace consisting of 512K of data unless -full-execution is passed. See Timer resolution configuration for information on -timer-resolution. There is also a -callgraph-mode flag that configures how callstacks are reconstructed.
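The -snapshot-size behaviour described above for sampling mode can be modelled in a few lines. This is a hypothetical sketch: it assumes "512K" means 512 KiB and uses `None` to stand for "keep the whole execution" when -full-execution is passed. The function name is illustrative.

```python
from typing import Optional

# Assumption: "512K" is read as 512 KiB.
SAMPLING_OUTPUT_BYTES = 512 * 1024


def sampling_output_size(snapshot_size: Optional[int], full_execution: bool) -> Optional[int]:
    """Hypothetical model of how much trace data sampling mode outputs.

    snapshot_size stands in for -snapshot-size and is deliberately ignored,
    because sampling mode does not honour it. None means "no limit".
    """
    del snapshot_size  # ignored in sampling mode
    if full_execution:
        return None  # -full-execution: output the whole run
    return SAMPLING_OUTPUT_BYTES


print(sampling_output_size(4 * 1024 * 1024, full_execution=False))  # 524288
```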
When running magic-trace with the sampling backend (i.e. with -sampling), you can pass -callgraph-mode. If you don't, a mode is selected by default.
The options for this argument are:
- (Last_branch_record (stitched true)), (Last_branch_record), or lbr
- (Last_branch_record (stitched false)) or lbr-no-stitch
- Dwarf or dwarf
- Frame_pointers or fp
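Each spelling in the list above selects one of three modes, and the LBR mode carries a stitching switch. The hypothetical Python sketch below shows that mapping. It only matches the exact strings listed and does not parse S-expressions in general, which magic-trace itself (written in OCaml) presumably does. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class LastBranchRecord:
    stitched: bool


@dataclass(frozen=True)
class Dwarf:
    pass


@dataclass(frozen=True)
class FramePointers:
    pass


CallgraphMode = Union[LastBranchRecord, Dwarf, FramePointers]

# Every accepted spelling from the list, mapped to the mode it selects.
_ALIASES = {
    "(Last_branch_record (stitched true))": LastBranchRecord(stitched=True),
    "(Last_branch_record)": LastBranchRecord(stitched=True),
    "lbr": LastBranchRecord(stitched=True),
    "(Last_branch_record (stitched false))": LastBranchRecord(stitched=False),
    "lbr-no-stitch": LastBranchRecord(stitched=False),
    "Dwarf": Dwarf(),
    "dwarf": Dwarf(),
    "Frame_pointers": FramePointers(),
    "fp": FramePointers(),
}


def parse_callgraph_mode(text: str) -> CallgraphMode:
    """Exact-match lookup of a -callgraph-mode value (hypothetical helper)."""
    try:
        return _ALIASES[text.strip()]
    except KeyError:
        raise ValueError(f"unknown -callgraph-mode: {text!r}") from None


print(parse_callgraph_mode("lbr-no-stitch"))  # LastBranchRecord(stitched=False)
```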
If no mode is selected, lbr is used on Intel machines that support LBR (many recent chips do), and dwarf is used otherwise. The three underlying methods (LBR, DWARF, and frame pointers) correspond to the values of perf's --call-graph argument (see here for more info).
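The default rule and the correspondence with perf's --call-graph values can be sketched as follows. This is a hypothetical illustration, not magic-trace's actual command line. It assumes both LBR variants record with `--call-graph lbr` and differ only in stitching. `fp`, `dwarf` and `lbr` are the real values accepted by `perf record --call-graph`. The helper names are illustrative.

```python
from typing import List


def default_callgraph_mode(is_intel: bool, supports_lbr: bool) -> str:
    """Hypothetical default rule: lbr on Intel CPUs with LBR, dwarf otherwise."""
    return "lbr" if is_intel and supports_lbr else "dwarf"


# Short mode name -> value for perf's --call-graph option.
# Assumption: both LBR variants record the same way and differ only in stitching.
_PERF_CALL_GRAPH = {"lbr": "lbr", "lbr-no-stitch": "lbr", "dwarf": "dwarf", "fp": "fp"}


def perf_record_args(mode: str) -> List[str]:
    """Illustrative perf record arguments for a mode (not magic-trace's exact invocation)."""
    return ["perf", "record", "--call-graph", _PERF_CALL_GRAPH[mode]]


mode = default_callgraph_mode(is_intel=False, supports_lbr=False)  # e.g. an AMD machine
print(mode, perf_record_args(mode))  # dwarf ['perf', 'record', '--call-graph', 'dwarf']
```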
When running with lbr or lbr-no-stitch, magic-trace uses Intel's last branch record (LBR) hardware feature, which logs branches to specific MSRs. Generally this supports callstacks of up to 32 entries, but the limit differs by architecture. lbr (but not lbr-no-stitch) enables perf's --stitch-lbr, which can increase callstack sizes by around 34%. See this for more info on LBR and this for more info on stitching LBR.
When running with dwarf, magic-trace uses the DWARF debugging information. This should work as long as perf is recent enough to be linked with libunwind or libdw. The downside is high overhead, both from writing the debugging information to perf.data files and during decoding: files are ~20x larger for the same number of samples, and decoding can take multiple orders of magnitude longer. However, a recent version of perf speeds this up significantly, so we recommend running with the most recent perf available (we had success using 5.17).
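One reason dwarf files grow so quickly is that, in dwarf mode, perf record copies a chunk of the user stack into every sample. The perf record documentation gives the default dump size as 8192 bytes, and it can be changed with `--call-graph dwarf,<size>`. The sketch below computes only the bytes spent on those stack copies, as a lower bound that ignores headers, registers and other record fields. It is a rough aid for sizing, not a model of magic-trace's output. The helper name is hypothetical.

```python
# perf's documented default user-stack dump size for --call-graph dwarf.
DEFAULT_DWARF_STACK_DUMP = 8192  # bytes


def dwarf_stack_dump_bytes(samples: int, dump_size: int = DEFAULT_DWARF_STACK_DUMP) -> int:
    """Lower bound on perf.data bytes used by per-sample stack copies alone."""
    return samples * dump_size


size = dwarf_stack_dump_bytes(10_000)
print(size, f"{size / 2**20:.1f} MiB")  # 81920000 78.1 MiB
```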
When running with fp, magic-trace uses the frame pointers in the binary to reconstruct the callstacks. This requires your binary (including all linked libraries) to be compiled with -fno-omit-frame-pointer. If that is the case, fp will work well, with overhead similar to lbr. If you get bogus-looking callstacks with fp, we recommend trying another option.
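Frame-pointer unwinding follows a linked list. On x86-64, each frame stores the caller's frame pointer at [rbp] and the return address just above it, so a walker can hop from frame to frame. The simplified Python model below replaces real stack memory with a dictionary. It shows why a single function built without frame pointers breaks the chain: one bad link makes the walker silently skip a caller, which produces a bogus callstack. All names and addresses are hypothetical.

```python
from typing import Dict, List, Tuple

# Simulated stack memory: frame address -> (saved caller frame pointer, return address).
Frame = Tuple[int, int]


def walk_frame_pointers(fp: int, memory: Dict[int, Frame], max_depth: int = 128) -> List[int]:
    """Collect return addresses by following the saved-frame-pointer chain.

    Stops at a null frame pointer, at an address with no frame record,
    or after max_depth frames (a guard against cycles in corrupt chains).
    """
    return_addresses: List[int] = []
    while fp != 0 and fp in memory and len(return_addresses) < max_depth:
        saved_fp, ret_addr = memory[fp]
        return_addresses.append(ret_addr)
        fp = saved_fp
    return return_addresses


# Every function keeps its frame pointer: the chain is intact.
good = {0x1000: (0x1100, 0xA), 0x1100: (0x1200, 0xB), 0x1200: (0, 0xC)}
# One link points past the middle frame, as if that function omitted frame pointers.
broken = {0x1000: (0x1200, 0xA), 0x1100: (0x1200, 0xB), 0x1200: (0, 0xC)}

print([hex(a) for a in walk_frame_pointers(0x1000, good)])    # ['0xa', '0xb', '0xc']
print([hex(a) for a in walk_frame_pointers(0x1000, broken)])  # ['0xa', '0xc']
```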