Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for magic-trace to sample and show other events such as cache misses and branch misses #234

Merged
merged 5 commits into from
Jul 22, 2022

Conversation

lamoreauxaj
Copy link
Contributor

@lamoreauxaj lamoreauxaj commented Jun 29, 2022

The goal of this PR is to allow support to sample other events from perf, e.g. branch-misses, cache-misses. Although this is a nice feature, the goal is also to already have support for processing these events through sampling, so that when we have PEBS-via-PT available, we can also have these events in the processor trace stream.

Now the user can pass -events along with a comma separated list of either names like branch-misses or sexps which allow the user to control the event period. These will then show up on separate tracks along with the actual trace as arrows representing every sample. For example:
image

I also drafted a new wiki page and referenced in the help docs for -events which would need to be updated along with this PR.

This feature is split up into a few commits as follows:

Updated perf decoding to process event types and periods and use event names for classification

perf script has support to pass the field event which actually outputs the type of the event. This actually helps improve classification by not just matching the perf line against regex and instead parsing the name of the event. And this also helps us recognize the new sampled events which we would not be able to differentiate otherwise. This commit just reworks the decoding to support parsing these event names and also the periods that can be outputted. Additionally all the tests are updated to support this.

Updated Collection_mode to allow passing additional sampled events through to perf

This commit changes Collection_mode to also allow passing the -events flag. This includes constructing the perf event configs and passing them to perf here. There was also a bit of refactoring where some of the config parsing was moved into collection_mode.ml and out of perf_tool_backend.ml.

Updated perf_decode.ml to handle decoding additional sampled events

This commit adds new event type to event.ml and changes perf_decode.ml to have regex for parsing these new events. Additionally tests were added of these new perf outputs.

Updated trace_writer.ml to write new sampled events

trace_writer.ml now is able to handle writing these new events. It keeps track of the types of events and creates new 'thread' tracks which represent them. Each sample is represented as a zero-duration slice (arrow). Additionally since now events can appear before the actual snapshot took place, this also flushes the trace up to the start time of the snapshot.

Added decoding/trace writing tests for sampling extra events

This commit solely adds new tests which include both branch-misses and cache-misses as well as the Intel PT stream or a stacktrace sampled stream.

Copy link
Member

@Xyene Xyene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a fair bit of code ends up needing to be moved here, so I'll hold off on reviewing the rest for now :)


let arg_type = Command.Arg_type.create of_string

let to_perf_event_config { config; name } =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stuff in core should be agnostic to backend, so we shouldn't have perf-specific bits here. These functions should live in src/ (and they shouldn't rely on to_string from core files matching up to what perf expects).

{ primary; extra_events }
;;

let to_perf_record_arg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let to_perf_record_arg
let to_perf_record_args

even though we return a 1-item list, it's still a list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, this should also live somewhere in src/.

@lamoreauxaj
Copy link
Contributor Author

lamoreauxaj commented Jul 5, 2022

@Xyene I refactored the perf specific things out of core. My bad for those changes, was trying to organize it a bit, but didn't consider the implications of core being backend agnostic.

@lamoreauxaj lamoreauxaj force-pushed the pebs_sampling branch 2 times, most recently from 86ce93b to 2ba73d6 Compare July 6, 2022 13:47
{ when_to_sample = Period 50; name = Branch_misses; precision = Maximum_possible }
| "cache-misses" ->
{ when_to_sample = Period 50; name = Cache_misses; precision = Maximum_possible }
| str -> t_of_sexp (Sexp.of_string str)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xyene a couple things to think about here. One, what do you think are good defaults for precision and when_to_sample? Two, do you think there should be an alternative to the sexp in order to control when_to_sample / precision?

match use_sampling with
| true -> Stacktrace_sampling
| true -> Primary.Stacktrace_sampling
| false ->
(match Core_unix.access "/sys/bus/event_source/devices/intel_pt" [ `Exists ] with
| Ok () -> Intel_processor_trace
| Error _ ->
Core.printf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want eprintf here.

type t =
| Intel_processor_trace
| Stacktrace_sampling
{ primary : Primary.t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, "Primary" is a weird name for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have thoughts on a good name? I was thinking perhaps mode, but seems odd with collection_mode.mode.

core/event.mli Outdated
(** Represents cycles event collected through sampling. *)
| Event_sample of
{ location : Location.t
; period : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think period is quite the right name in this context. count is a bit more accurate, though it is very much a function of the selected period.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think perf does actually call this field period though. I can change it to count though as it feels more accurate.

| "cycles" -> `Cycles
| "branch-misses" -> `Branch_misses
| "cache-misses" -> `Cache_misses
| _ -> raise_s [%message "Unexpected event type when parsing perf output."]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth printing some extra debug info here as part of this message.

| results ->
raise_s
[%message "Perf output did not match expected regex." (results : string array)]
[%message
"Perf output did not match expected regex when parsing callstack entry."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Perf output did not match expected regex when parsing callstack entry."
"perf output did not match expected regex when parsing callstack entry."

match lines with
| [] ->
(match Re.Group.all (Re.exec perf_extra_sampled_event_re line) with
| [| _str; _; instruction_pointer; symbol_and_offset |] ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| [| _str; _; instruction_pointer; symbol_and_offset |] ->
| [| _; _; instruction_pointer; symbol_and_offset |] ->

Ok
{ thread
; time
; data = Event_sample { location; period = Option.value_exn period; name }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is unconditionally going to raise on a None period, this should be reflected in the type signature (don't take an option).

@@ -9,7 +9,7 @@ let is_kernel_address addr = Int64.(addr < 0L)
compensate, the trace writer subtracts the time of the first event from all time spans, producing
what we call a "mapped time". Only mapped times may be written to the trace file. *)
module Mapped_time : sig
type t = private Time_ns.Span.t [@@deriving sexp_of, compare]
type t = Time_ns.Span.t [@@deriving sexp_of, compare]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to keep this type private so we don't accidentally pass a regular Time_ns.Span.t to a function that requires Mapped_time.t. If you have a place where this is definitely necessary, you can use the :> operator.

@@ -111,6 +111,8 @@ module Thread_info = struct
; start_events : (Mapped_time.t * Pending_event.t) Deque.t
(* When the last event arrived. Used to give timestamps to events lacking them. *)
; mutable last_event_time : Mapped_time.t
; trace_pid : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; trace_pid : int
; trace_pid : Pid.t

?

@lamoreauxaj lamoreauxaj force-pushed the pebs_sampling branch 2 times, most recently from 0e8a755 to 5e2a9a4 Compare July 6, 2022 21:11
(match event with
| `Branches ->
Some (parse_perf_branches_event ?perf_maps thread time remaining_line)
(* cbr (core-to-bus ratio) are events which show frequency changes. *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put these comments inside the blocks rather than above, so they're closer to the logic they describe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

~timer_resolution
~trace_scope
collection_mode
|> Deferred.return
in
let kcore_opts =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside, the kcore mode is not necessary if we're doing sampling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

; [ "symbol", String (Symbol.display_name location.symbol) ]
; [ "addr", Pointer location.instruction_pointer ]
; [ "count", Int count ]
; Option.map (Event.thread outer_event).pid ~f:(fun pid ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val value_map : 'a t -> default:'b -> f:('a -> 'b) -> 'b

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

…t names for classification

`perf script` has support to pass the field `event` which actually outputs the
type of the event. This actually helps improve classification by not just
matching the perf line against regex and instead parsing the name of the event.
And this also helps us recognize the new sampled events which we would not be
able to differentiate otherwise. This commit just reworks the decoding to
support parsing these event names and also the periods that can be outputted.
Additionally all the tests are updated to support this.

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
…through to perf

This commit changes `Collection_mode` to also allow passing the `-events` flag.
This includes constructing the perf event configs and passing them to perf here.
There was also a bit of refactoring where some of the config parsing was moved
into `collection_mode.ml` and out of `perf_tool_backend.ml`.

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
src/perf_decode.ml Outdated Show resolved Hide resolved
src/perf_decode.ml Outdated Show resolved Hide resolved
This commit adds new event type to `event.ml` and changes `perf_decode.ml` to
have regex for parsing these new events. Additionally tests were added of these
new perf outputs.

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
`trace_writer.ml` now is able to handle writing these new events. It keeps track
of the types of events and creates new 'thread' tracks which represent them.
Each sample is represented as a zero-duration slice (arrow). Additionally since
now events can appear before the actual snapshot took place, this also flushes
the trace up to the start time of the snapshot.

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
The goal of this PR is to allow support to sample other events from perf, e.g.
`branch-misses`, `cache-misses`. Although this is a nice feature, the goal is
also to already have support for processing these events through sampling, so
that when we have PEBS-via-PT available, we can also have these events in the
processor trace stream.

Now the user can pass `-events` along with a comma separated list of either
names like `branch-misses` or sexps which allow the user to control the event
period. These will then show up on separate tracks along with the actual trace
as arrows representing every sample.

I also drafted a [new wiki
page](https://github.com/Lamoreauxaj/magic-trace/wiki/Sampling-extra-events) and
referenced in the help docs for `-events` which would need to be updated along
with this PR.

This commit solely adds new tests which include both branch-misses and
cache-misses as well as the Intel PT stream or a stacktrace sampled stream.

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
@Xyene Xyene merged commit 73abe48 into janestreet:master Jul 22, 2022
@Xyene
Copy link
Member

Xyene commented Jul 22, 2022

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants