-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for magic-trace to sample and show other events such as cache misses and branch misses #234
Conversation
561ed06
to
b6d026f
Compare
b6d026f
to
05b42d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a fair bit of code ends up needing to be moved here, so I'll hold off on reviewing the rest for now :)
core/collection_mode.ml
Outdated
|
||
let arg_type = Command.Arg_type.create of_string | ||
|
||
let to_perf_event_config { config; name } = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stuff in core should be agnostic to backend, so we shouldn't have perf
-specific bits here. These functions should live in src/
(and they shouldn't rely on to_string
from core
files matching up to what perf
expects).
core/collection_mode.ml
Outdated
{ primary; extra_events } | ||
;; | ||
|
||
let to_perf_record_arg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let to_perf_record_arg | |
let to_perf_record_args |
even though we return a 1-item list, it's still a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, this should also live somewhere in src/
.
05b42d8
to
3b0561b
Compare
@Xyene I refactored the perf specific things out of core. My bad for those changes, was trying to organize it a bit, but didn't consider the implications of core being backend agnostic. |
86ce93b
to
2ba73d6
Compare
{ when_to_sample = Period 50; name = Branch_misses; precision = Maximum_possible } | ||
| "cache-misses" -> | ||
{ when_to_sample = Period 50; name = Cache_misses; precision = Maximum_possible } | ||
| str -> t_of_sexp (Sexp.of_string str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Xyene a couple things to think about here. One, what do you think are good defaults for precision and when_to_sample? Two, do you think there should be an alternative to the sexp in order to control when_to_sample / precision?
2ba73d6
to
1695834
Compare
core/collection_mode.ml
Outdated
match use_sampling with | ||
| true -> Stacktrace_sampling | ||
| true -> Primary.Stacktrace_sampling | ||
| false -> | ||
(match Core_unix.access "/sys/bus/event_source/devices/intel_pt" [ `Exists ] with | ||
| Ok () -> Intel_processor_trace | ||
| Error _ -> | ||
Core.printf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want eprintf
here.
core/collection_mode.ml
Outdated
type t = | ||
| Intel_processor_trace | ||
| Stacktrace_sampling | ||
{ primary : Primary.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, "Primary" is a weird name for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have thoughts on a good name? I was thinking perhaps mode, but seems odd with collection_mode.mode.
core/event.mli
Outdated
(** Represents cycles event collected through sampling. *) | ||
| Event_sample of | ||
{ location : Location.t | ||
; period : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think period
is quite the right name in this context. count
is a bit more accurate, though it is very much a function of the selected period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think perf does actually call this field period though. I can change it to count though as it feels more accurate.
src/perf_decode.ml
Outdated
| "cycles" -> `Cycles | ||
| "branch-misses" -> `Branch_misses | ||
| "cache-misses" -> `Cache_misses | ||
| _ -> raise_s [%message "Unexpected event type when parsing perf output."] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth printing some extra debug info here as part of this message.
src/perf_decode.ml
Outdated
| results -> | ||
raise_s | ||
[%message "Perf output did not match expected regex." (results : string array)] | ||
[%message | ||
"Perf output did not match expected regex when parsing callstack entry." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Perf output did not match expected regex when parsing callstack entry." | |
"perf output did not match expected regex when parsing callstack entry." |
match lines with | ||
| [] -> | ||
(match Re.Group.all (Re.exec perf_extra_sampled_event_re line) with | ||
| [| _str; _; instruction_pointer; symbol_and_offset |] -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| [| _str; _; instruction_pointer; symbol_and_offset |] -> | |
| [| _; _; instruction_pointer; symbol_and_offset |] -> |
src/perf_decode.ml
Outdated
Ok | ||
{ thread | ||
; time | ||
; data = Event_sample { location; period = Option.value_exn period; name } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is unconditionally going to raise on a None
period, this should be reflected in the type signature (don't take an option).
src/trace_writer.ml
Outdated
@@ -9,7 +9,7 @@ let is_kernel_address addr = Int64.(addr < 0L) | |||
compensate, the trace writer subtracts the time of the first event from all time spans, producing | |||
what we call a "mapped time". Only mapped times may be written to the trace file. *) | |||
module Mapped_time : sig | |||
type t = private Time_ns.Span.t [@@deriving sexp_of, compare] | |||
type t = Time_ns.Span.t [@@deriving sexp_of, compare] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to keep this type private so we don't accidentally pass a regular Time_ns.Span.t
to a function that requires Mapped_time.t
. If you have a place where this is definitely necessary, you can use the :>
operator.
src/trace_writer.ml
Outdated
@@ -111,6 +111,8 @@ module Thread_info = struct | |||
; start_events : (Mapped_time.t * Pending_event.t) Deque.t | |||
(* When the last event arrived. Used to give timestamps to events lacking them. *) | |||
; mutable last_event_time : Mapped_time.t | |||
; trace_pid : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; trace_pid : int | |
; trace_pid : Pid.t |
?
0e8a755
to
5e2a9a4
Compare
src/perf_decode.ml
Outdated
(match event with | ||
| `Branches -> | ||
Some (parse_perf_branches_event ?perf_maps thread time remaining_line) | ||
(* cbr (core-to-bus ratio) are events which show frequency changes. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put these comments inside the blocks rather than above, so they're closer to the logic they describe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
~timer_resolution | ||
~trace_scope | ||
collection_mode | ||
|> Deferred.return | ||
in | ||
let kcore_opts = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside, the kcore mode is not necessary if we're doing sampling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
src/trace_writer.ml
Outdated
; [ "symbol", String (Symbol.display_name location.symbol) ] | ||
; [ "addr", Pointer location.instruction_pointer ] | ||
; [ "count", Int count ] | ||
; Option.map (Event.thread outer_event).pid ~f:(fun pid -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val value_map : 'a t -> default:'b -> f:('a -> 'b) -> 'b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
…t names for classification `perf script` has support to pass the field `event` which actually outputs the type of the event. This actually helps improve classification by not just matching the perf line against regex and instead parsing the name of the event. And this also helps us recognize the new sampled events which we would not be able to differentiate otherwise. This commit just reworks the decoding to support parsing these event names and also the periods that can be outputted. Additionally all the tests are updated to support this. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
…through to perf This commit changes `Collection_mode` to also allow passing the `-events` flag. This includes constructing the perf event configs and passing them to perf here. There was also a bit of refactoring where some of the config parsing was moved into `collection_mode.ml` and out of `perf_tool_backend.ml`. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
5e2a9a4
to
98fc2f3
Compare
This commit adds new event type to `event.ml` and changes `perf_decode.ml` to have regex for parsing these new events. Additionally tests were added of these new perf outputs. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
`trace_writer.ml` now is able to handle writing these new events. It keeps track of the types of events and creates new 'thread' tracks which represent them. Each sample is represented as a zero-duration slice (arrow). Additionally since now events can appear before the actual snapshot took place, this also flushes the trace up to the start time of the snapshot. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
The goal of this PR is to allow support to sample other events from perf, e.g. `branch-misses`, `cache-misses`. Although this is a nice feature, the goal is also to already have support for processing these events through sampling, so that when we have PEBS-via-PT available, we can also have these events in the processor trace stream. Now the user can pass `-events` along with a comma separated list of either names like `branch-misses` or sexps which allow the user to control the event period. These will then show up on separate tracks along with the actual trace as arrows representing every sample. I also drafted a [new wiki page](https://github.com/Lamoreauxaj/magic-trace/wiki/Sampling-extra-events) and referenced in the help docs for `-events` which would need to be updated along with this PR. This commit solely adds new tests which include both branch-misses and cache-misses as well as the Intel PT stream or a stacktrace sampled stream. Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
98fc2f3
to
db27094
Compare
LGTM, thanks! |
The goal of this PR is to allow support to sample other events from perf, e.g.
branch-misses
,cache-misses
. Although this is a nice feature, the goal is also to already have support for processing these events through sampling, so that when we have PEBS-via-PT available, we can also have these events in the processor trace stream.Now the user can pass
![image](https://user-images.githubusercontent.com/15175891/176537263-6c886fbd-8e91-41f8-a8dd-8c7b3741c273.png)
-events
along with a comma separated list of either names likebranch-misses
or sexps which allow the user to control the event period. These will then show up on separate tracks along with the actual trace as arrows representing every sample. For example:I also drafted a new wiki page and referenced in the help docs for
-events
which would need to be updated along with this PR.This feature is split up into a few commits as follows:
Updated perf decoding to process event types and periods and use event names for classification
perf script
has support to pass the fieldevent
which actually outputs the type of the event. This actually helps improve classification by not just matching the perf line against regex and instead parsing the name of the event. And this also helps us recognize the new sampled events which we would not be able to differentiate otherwise. This commit just reworks the decoding to support parsing these event names and also the periods that can be outputted. Additionally all the tests are updated to support this.Updated
Collection_mode
to allow passing additional sampled events through to perfThis commit changes
Collection_mode
to also allow passing the-events
flag. This includes constructing the perf event configs and passing them to perf here. There was also a bit of refactoring where some of the config parsing was moved intocollection_mode.ml
and out ofperf_tool_backend.ml
.Updated
perf_decode.ml
to handle decoding additional sampled eventsThis commit adds new event type to
event.ml
and changesperf_decode.ml
to have regex for parsing these new events. Additionally tests were added of these new perf outputs.Updated
trace_writer.ml
to write new sampled eventstrace_writer.ml
now is able to handle writing these new events. It keeps track of the types of events and creates new 'thread' tracks which represent them. Each sample is represented as a zero-duration slice (arrow). Additionally since now events can appear before the actual snapshot took place, this also flushes the trace up to the start time of the snapshot.Added decoding/trace writing tests for sampling extra events
This commit solely adds new tests which include both branch-misses and cache-misses as well as the Intel PT stream or a stacktrace sampled stream.