Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch self profile to use HW counters instead of walltime #1647

Merged
merged 3 commits into from
Aug 12, 2023

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Jul 8, 2023

This PR changes the gathered self profile to use instructions as a metric, instead of wall time. There are some problems with this:

  1. This PR simply switches the interpretation of the data to instructions. However, old artifacts will have values stored in the DB as walltime. We could record the metric (instructions/walltime) in the DB, so that we show e.g. the correct units on the detailed query page table, but this does not change the fact that there will be some pairs of artifacts where one side will have walltime, and the other instructions, and these will be basically uncomparable.

  2. Instructions will be added together for all threads, which might be a bit misleading where parallelism is used.

Fixes: #1345

@Kobzol Kobzol requested a review from Mark-Simulacrum July 8, 2023 15:24
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is broadly reasonable - I'm not too worried about the N commits that happen in the transit time, but maybe we can temporarily add a note: message to the HTML noting that? I don't think we have easy ways of tying that back to the deployed version of rustc-perf (though I seem to recall the information being persisted into the DB), so it would just be unconditional.

Adding together across threads also seems like best we can do and is already done for the general view.

@Kobzol Kobzol force-pushed the self-profile-hw-counters branch from 2fcbe78 to 5367565 Compare July 17, 2023 19:30
@Kobzol Kobzol force-pushed the self-profile-hw-counters branch from 5367565 to 0b72313 Compare July 18, 2023 07:20
@Kobzol Kobzol marked this pull request as ready for review July 18, 2023 07:39
@Kobzol Kobzol force-pushed the self-profile-hw-counters branch from 07fa96c to 47e20b8 Compare July 18, 2023 08:24
@Kobzol
Copy link
Contributor Author

Kobzol commented Jul 18, 2023

I added the note to the detailed query page and added more documentation about this to the code. I also changed the formatting of values in the self profile table.

@Kobzol
Copy link
Contributor Author

Kobzol commented Aug 3, 2023

I tried locally that combining perf stat -e instructions and measureme using instructions doesn't seem to affect the results (I'm not sure if it's OK in general to "nest" PMU counter recording).

@Mark-Simulacrum I don't have any more ideas on how to test this further locally. I guess that we'll just have to merge it and see what happens :) Feel free to do that.

@bjorn3
Copy link
Member

bjorn3 commented Aug 3, 2023

https://perf.wiki.kernel.org/index.php/Tutorial

Furthermore, the perf_events interface allows multiple tools to measure the same thread or CPU at the same time.

@Kobzol
Copy link
Contributor Author

Kobzol commented Aug 3, 2023

Yeah it definitely seems like it's supported, I just wanted to be sure that when I measure the same event (instructions) twice, it won't take two counter slots (to avoid possible multiplexing).

Events are currently managed in round-robin fashion. Therefore each event will eventually get a chance to run. If there are N counters, then up to the first N events on the round-robin list are programmed into the PMU. In certain situations it may be less than that because some events may not be measured together or they compete for the same counter. Furthermore, the perf_events interface allows multiple tools to measure the same thread or CPU at the same time. Each event is added to the same round-robin list. There is no guarantee that all events of a tool are stored sequentially in the list.

It's not clear to me whether this (counter sharing/multiplexing) can happen when two processes measure the exact same event.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use hardware performance counter data for the detailed/self-profile data view
3 participants