-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow running with sampling if Intel PT is unavailable #224
Conversation
I'm not sure it should fallback automatically. I expect people will often fail to notice they are using the wrong machine if it does that. |
Yeah that's fair. It does technically warn the user at least though. However if you think it would be better to exit instead, I can change this behavior. |
let process_event index ev = | ||
(* When processing a new snapshot, clear all [Trace_writer] data in order to | ||
avoid sharing callstacks, start times, etc. *) | ||
if index > 0 && not (index = !last_index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbaumbacher, If you do take a look, here is the logic where I reset the Trace_writer state after the end of a single snapshot (given one snapshot per file).
d380dc7
to
dad6974
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good.
I think it'd also be worth it to add a larger test under tests/
. Maybe a C hello-world, or similar.
core/event.mli
Outdated
} (** Represents an event collected from Intel PT. *) | ||
| Sample of { callstack : Location.t list } | ||
(** Represents event collected through sampling. *) | ||
| Power of { freq : int } (** Power event collected by Intel PT. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: sort these grouped by sample type, i.e. have Power
next to Trace
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also nest by perf mode, along the lines of Pt (Trace _) | Pt (Power _) | Sample _
since any given pipe will consist only of (Trace
and Power
) or Sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does seem like a reasonable option. One thought though, is what if we supported power events which aren't part of Intel PT as well and used the same Power
event? (it seems like you can get the same information through other perf events). And another thought, although we could group them under Pt
, they aren't really being processed or handled together which makes me lean towards leaving them as is since it doesn't seem useful to pattern match only on Pt
events.
2971a60
to
3bbc5e0
Compare
Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
7b6c9ea
to
af6b39a
Compare
2af1d16
to
2515dcb
Compare
31abb9a
to
11effd7
Compare
…ble. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
11effd7
to
5d32952
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
This feature has two changes related to `callgraph_mode`. * Based on user feedback from janestreet#224, I added warnings when the `-callgraph-mode` defaults to either `Dwarf` or `Last_branch_record` to notify the user of potential drawbacks from such selection. These warnings only show if they do not explicitly set the flag. * Added a fourth option to run with `Last_branch_record { stitched = true }` which calls `perf report --call-graph lbr` and `perf script --stitch-lbr`. There are a few design decisions here though. First I decided to add it as a variant of `Callgraph_mode` for simplicity of interface for the user. However this has the awkward effect of this now being a parameter needed for both decoding and recording (which `Backend_intf` doesn't support). I would think this could perhaps be useful in other scenarios and so instead of pulling this out to `trace.ml` I decided to add a way for `decode_events` to have access to a `Recording.Data.t` record. Open to alternative ideas though. I also added a wiki page to document the options for `-callgraph-mode`. This is not in the PR, but here is the link to the wiki repo: https://github.com/Lamoreauxaj/magic-trace.wiki.git. See [this page](https://github.com/Lamoreauxaj/magic-trace/wiki/Sampling-callgraph-modes). If this PR is merged, this wiki page would need to be merged too (I referenced it in the cli help documentation). And since I referenced it, the domain would need to redirect to the wiki page. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
This feature has two changes related to `callgraph_mode`. * Based on user feedback from #224, I added warnings when the `-callgraph-mode` defaults to either `Dwarf` or `Last_branch_record` to notify the user of potential drawbacks from such selection. These warnings only show if they do not explicitly set the flag. * Added a fourth option to run with `Last_branch_record { stitched = true }` which calls `perf report --call-graph lbr` and `perf script --stitch-lbr`. There are a few design decisions here though. First I decided to add it as a variant of `Callgraph_mode` for simplicity of interface for the user. However this has the awkward effect of this now being a parameter needed for both decoding and recording (which `Backend_intf` doesn't support). I would think this could perhaps be useful in other scenarios and so instead of pulling this out to `trace.ml` I decided to add a way for `decode_events` to have access to a `Recording.Data.t` record. Open to alternative ideas though. I also added a wiki page to document the options for `-callgraph-mode`. This is not in the PR, but here is the link to the wiki repo: https://github.com/Lamoreauxaj/magic-trace.wiki.git. See [this page](https://github.com/Lamoreauxaj/magic-trace/wiki/Sampling-callgraph-modes). If this PR is merged, this wiki page would need to be merged too (I referenced it in the cli help documentation). And since I referenced it, the domain would need to redirect to the wiki page. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
The goal of this PR is to allow magic-trace to run within an environment which is not on a physical Intel machine. This adds the ability to (if Intel PT is unavailable) to still run magic-trace and have it use perf with sampling instead.
There is now a flag
-sampling
which can enable this and if Intel PT is not present, then this happens automatically. The interface is mostly the same, however now-timer-resolution
can specify asSample { freq }
and-snapshot-size
is ignored (with warning) if passed. The traces will default to 512K it seems (approximately ~30ms it seems with max sampling frequency).Some things to note:
Event.t
type which isSample
and refactored this type to share information with other existing events types. There are processed intrace_writer.ml
in a similar way to before, but just ret / call based on the current state of the callstacks.-fno-omit-frame-pointer
. Dwarf tends to lead to massive perf.data files and slow decoding, so this was the choice I made.Perf_line
fromperf_tool_backend.ml
and there is now aperf_decode.ml
to refactor things a little bit. There is also decoding support which works on the sampling output from perf.--switch-output=signal
however this will have perf outputting multiple timestampedperf.data
files. Thus decoding now can handle multiple files and return multiple pipes. Additionally this makes it easier to reset trace writing state between each snapshot which ends up fixing-multi-snapshot
leads to staircase traces #213 as a result.