Allow running with sampling if Intel PT is unavailable #224

lamoreauxaj · 2022-06-15T20:50:25Z

The goal of this PR is to allow magic-trace to run within an environment which is not on a physical Intel machine. This adds the ability to (if Intel PT is unavailable) to still run magic-trace and have it use perf with sampling instead.

There is now a flag -sampling which can enable this and if Intel PT is not present, then this happens automatically. The interface is mostly the same, however now -timer-resolution can specify as Sample { freq } and -snapshot-size is ignored (with warning) if passed. The traces will default to 512K it seems (approximately ~30ms it seems with max sampling frequency).

Some things to note:

I added another Event.t type which is Sample and refactored this type to share information with other existing events types. There are processed in trace_writer.ml in a similar way to before, but just ret / call based on the current state of the callstacks.
Perf gives multiple options for reconstructing the callgraphs which are fp, dwarf, and lbr. LBR is only supported on Intel and we check for support and use it if possible. Otherwise it defaults to fp which will work if the binary and associated libraries were compiled with -fno-omit-frame-pointer. Dwarf tends to lead to massive perf.data files and slow decoding, so this was the choice I made.
I took out Perf_line from perf_tool_backend.ml and there is now a perf_decode.ml to refactor things a little bit. There is also decoding support which works on the sampling output from perf.
In order to take a snapshot, I use --switch-output=signal however this will have perf outputting multiple timestamped perf.data files. Thus decoding now can handle multiple files and return multiple pipes. Additionally this makes it easier to reset trace writing state between each snapshot which ends up fixing -multi-snapshot leads to staircase traces #213 as a result.

lpw25 · 2022-06-16T09:00:00Z

I'm not sure it should fallback automatically. I expect people will often fail to notice they are using the wrong machine if it does that.

lamoreauxaj · 2022-06-16T13:09:08Z

Yeah that's fair. It does technically warn the user at least though. However if you think it would be better to exit instead, I can change this behavior.

lpw25 · 2022-06-16T13:12:40Z

Maybe a warning is enough, I'm not sure. I'll defer to @Xyene and @cgaebel's judgement.

lamoreauxaj · 2022-06-16T16:04:54Z

src/trace.ml

+  let process_event index ev =
+    (* When processing a new snapshot, clear all [Trace_writer] data in order to
+       avoid sharing callstacks, start times, etc. *)
+    if index > 0 && not (index = !last_index)


@pbaumbacher, If you do take a look, here is the logic where I reset the Trace_writer state after the end of a single snapshot (given one snapshot per file).

Xyene

Looking good.

I think it'd also be worth it to add a larger test under tests/. Maybe a C hello-world, or similar.

core/backend_intf.ml

Xyene · 2022-06-17T10:10:25Z

core/event.mli

+          } (** Represents an event collected from Intel PT. *)
+      | Sample of { callstack : Location.t list }
+          (** Represents event collected through sampling. *)
+      | Power of { freq : int } (** Power event collected by Intel PT. *)


Nit: sort these grouped by sample type, i.e. have Power next to Trace.

Could also nest by perf mode, along the lines of Pt (Trace _) | Pt (Power _) | Sample _ since any given pipe will consist only of (Trace and Power) or Sample

That does seem like a reasonable option. One thought though, is what if we supported power events which aren't part of Intel PT as well and used the same Power event? (it seems like you can get the same information through other perf events). And another thought, although we could group them under Pt, they aren't really being processed or handled together which makes me lean towards leaving them as is since it doesn't seem useful to pattern match only on Pt events.

src/perf_capabilities.ml

src/perf_decode.ml

src/perf_tool_backend.ml

src/trace.ml

src/trace_writer.ml

test/perf_script.ml

core/backend_intf.ml

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>

core/collection_mode.ml

src/perf_tool_backend.ml

src/callgraph_mode.ml

src/perf_tool_backend.ml

src/callgraph_mode.ml

src/trace_writer.ml

…ble. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>

Xyene

LGTM, thanks!

This feature has two changes related to `callgraph_mode`. * Based on user feedback from janestreet#224, I added warnings when the `-callgraph-mode` defaults to either `Dwarf` or `Last_branch_record` to notify the user of potential drawbacks from such selection. These warnings only show if they do not explicitly set the flag. * Added a fourth option to run with `Last_branch_record { stitched = true }` which calls `perf report --call-graph lbr` and `perf script --stitch-lbr`. There are a few design decisions here though. First I decided to add it as a variant of `Callgraph_mode` for simplicity of interface for the user. However this has the awkward effect of this now being a parameter needed for both decoding and recording (which `Backend_intf` doesn't support). I would think this could perhaps be useful in other scenarios and so instead of pulling this out to `trace.ml` I decided to add a way for `decode_events` to have access to a `Recording.Data.t` record. Open to alternative ideas though. I also added a wiki page to document the options for `-callgraph-mode`. This is not in the PR, but here is the link to the wiki repo: https://github.com/Lamoreauxaj/magic-trace.wiki.git. See [this page](https://github.com/Lamoreauxaj/magic-trace/wiki/Sampling-callgraph-modes). If this PR is merged, this wiki page would need to be merged too (I referenced it in the cli help documentation). And since I referenced it, the domain would need to redirect to the wiki page. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>

This feature has two changes related to `callgraph_mode`. * Based on user feedback from #224, I added warnings when the `-callgraph-mode` defaults to either `Dwarf` or `Last_branch_record` to notify the user of potential drawbacks from such selection. These warnings only show if they do not explicitly set the flag. * Added a fourth option to run with `Last_branch_record { stitched = true }` which calls `perf report --call-graph lbr` and `perf script --stitch-lbr`. There are a few design decisions here though. First I decided to add it as a variant of `Callgraph_mode` for simplicity of interface for the user. However this has the awkward effect of this now being a parameter needed for both decoding and recording (which `Backend_intf` doesn't support). I would think this could perhaps be useful in other scenarios and so instead of pulling this out to `trace.ml` I decided to add a way for `decode_events` to have access to a `Recording.Data.t` record. Open to alternative ideas though. I also added a wiki page to document the options for `-callgraph-mode`. This is not in the PR, but here is the link to the wiki repo: https://github.com/Lamoreauxaj/magic-trace.wiki.git. See [this page](https://github.com/Lamoreauxaj/magic-trace/wiki/Sampling-callgraph-modes). If this PR is merged, this wiki page would need to be merged too (I referenced it in the cli help documentation). And since I referenced it, the domain would need to redirect to the wiki page. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>

lamoreauxaj commented Jun 16, 2022

View reviewed changes

lamoreauxaj force-pushed the sampling_backend branch from d380dc7 to dad6974 Compare June 16, 2022 17:18

Xyene reviewed Jun 17, 2022

View reviewed changes

lamoreauxaj force-pushed the sampling_backend branch from 2971a60 to 3bbc5e0 Compare June 17, 2022 20:03

Xyene reviewed Jun 18, 2022

View reviewed changes

core/backend_intf.ml Outdated Show resolved Hide resolved

lamoreauxaj added 2 commits June 21, 2022 10:21

Renamed trace_mode to trace_scope

ba483dc

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>

Refactored Perf_tool_backend.Perf_line into Perf_decode

d8fc904

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>

lamoreauxaj force-pushed the sampling_backend branch 5 times, most recently from 7b6c9ea to af6b39a Compare June 22, 2022 14:36

Xyene reviewed Jun 22, 2022

View reviewed changes

lamoreauxaj force-pushed the sampling_backend branch 2 times, most recently from 2af1d16 to 2515dcb Compare June 22, 2022 17:16

Xyene reviewed Jun 23, 2022

View reviewed changes

src/callgraph_mode.ml Outdated Show resolved Hide resolved

Xyene reviewed Jun 23, 2022

View reviewed changes

src/trace_writer.ml Outdated Show resolved Hide resolved

lamoreauxaj force-pushed the sampling_backend branch from 31abb9a to 11effd7 Compare June 23, 2022 17:35

Allow magic-trace with run perf with sampling if Intel PT is unavaila…

5d32952

…ble. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>

lamoreauxaj force-pushed the sampling_backend branch from 11effd7 to 5d32952 Compare June 23, 2022 20:42

lamoreauxaj mentioned this pull request Jun 23, 2022

Improved speed of hex parsing in more places #227

Merged

Xyene approved these changes Jun 24, 2022

View reviewed changes

Xyene merged commit e614913 into janestreet:master Jun 24, 2022

lamoreauxaj mentioned this pull request Jun 24, 2022

Added warnings to -callgraph-mode and added option for stitched LBR. #228

Merged

Xyene mentioned this pull request Jun 29, 2022

-multi-snapshot leads to staircase traces #213

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow running with sampling if Intel PT is unavailable #224

Allow running with sampling if Intel PT is unavailable #224

lamoreauxaj commented Jun 15, 2022 •

edited

Loading

lpw25 commented Jun 16, 2022

lamoreauxaj commented Jun 16, 2022

lpw25 commented Jun 16, 2022

lamoreauxaj Jun 16, 2022

Xyene left a comment

Xyene Jun 17, 2022

pbaumbacher Jun 17, 2022

lamoreauxaj Jun 17, 2022

Xyene left a comment

Allow running with sampling if Intel PT is unavailable #224

Allow running with sampling if Intel PT is unavailable #224

Conversation

lamoreauxaj commented Jun 15, 2022 • edited Loading

lpw25 commented Jun 16, 2022

lamoreauxaj commented Jun 16, 2022

lpw25 commented Jun 16, 2022

lamoreauxaj Jun 16, 2022

Choose a reason for hiding this comment

Xyene left a comment

Choose a reason for hiding this comment

Xyene Jun 17, 2022

Choose a reason for hiding this comment

pbaumbacher Jun 17, 2022

Choose a reason for hiding this comment

lamoreauxaj Jun 17, 2022

Choose a reason for hiding this comment

Xyene left a comment

Choose a reason for hiding this comment

lamoreauxaj commented Jun 15, 2022 •

edited

Loading