-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move ProfilerManager Start/Stop routines closer to benchmark #1807 #1818
Conversation
this is a very performance sensitive part of the code. i don't think we should be calling out to unknown services at the top of every loop. maybe we can annotate the branch prediction to expect the pointer to the profile manager to be null so we at least maximise the performance of this section? this might also be an argument for having multiple iterations so that they "drown out" the startup bit of the trace? |
Note that this does not play nice at all with pause/resume, |
I don't understand, can you elaborate? |
|
Actually, there is an important difference. |
An important concern indeed.
Adding branch prediction markers is certainly a good idea. Note that the added test+conditional-branch is done before Also, an important goal is that the trace match the collected performance counters: With an accurate enough simulator we want to get out of it as similar as possible the values reported by the performance counters.
How so? [I'm assuming you're referring to the |
59470fb
to
cd305ea
Compare
that's a very good point. i hadn't spotted that.
yes .. if we allow more iterations for the run under profile management then the traces from inside the performance loop will take up more of the reporting space than the small bit outside the loop. but i think the current PR is fine, given we accept there's a deliberate difference (to @LebedevRI's point) between memory tracing and profiling. |
…oogle#1807 Previously, the Start/Stop routines were called before the benchmark function was called and after it returned. However, what we really want is for them to be called within the core of the benchmark: for (auto _ : state) { // This is what we want traced, not the entire BM_foo function. }
cd305ea
to
db498ec
Compare
Previously, the Start/Stop routines were called before the benchmark function was called and after it returned. However, what we really want is for them to be called within the core of the benchmark:
for (auto _ : state) {
// This is what we want traced, not the entire BM_foo function.
}