Move ProfilerManager Start/Stop routines closer to benchmark #1807 #1818

xdje42 · 2024-07-22T19:43:57Z

Previously, the Start/Stop routines were called before the benchmark function was called and after it returned. However, what we really want is for them to be called within the core of the benchmark:

for (auto _ : state) {
  // This is what we want traced, not the entire BM_foo function.
}
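
To make the intended scope concrete, here is a minimal, self-contained sketch. It does not use the real library: ToyState, ToyProfiler, and g_events are hypothetical stand-ins, and the real State class is more involved. The idea it illustrates is that the start hook fires when the range-for loop calls begin() and the stop hook fires when the loop condition first fails, so the code before and after the loop in BM_foo stays out of the trace.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event log standing in for a real tracer.
static std::vector<std::string> g_events;

// Hypothetical stand-in for a profiler hook interface.
struct ToyProfiler {
  void Start() { g_events.push_back("profiler start"); }
  void Stop() { g_events.push_back("profiler stop"); }
};

// Toy state object: the hooks fire when the range-for loop begins and ends.
class ToyState {
 public:
  ToyState(std::int64_t iterations, ToyProfiler* profiler)
      : remaining_(iterations), profiler_(profiler) {
    assert(iterations >= 0);
  }

  struct Value {};  // placeholder object bound to `_` in the loop

  class Iterator {
   public:
    explicit Iterator(ToyState* parent) : parent_(parent) {}
    Value operator*() const { return Value{}; }
    Iterator& operator++() {
      --parent_->remaining_;
      return *this;
    }
    // The loop condition: when no iterations remain, run the stop hook.
    bool operator!=(const Iterator&) const {
      if (parent_->remaining_ > 0) return true;
      parent_->Finish();
      return false;
    }

   private:
    ToyState* parent_;
  };

  Iterator begin() {
    Start();  // runs once, just before the first iteration
    return Iterator(this);
  }
  Iterator end() { return Iterator(this); }

 private:
  void Start() {
    if (profiler_ != nullptr) profiler_->Start();
  }
  void Finish() {
    if (profiler_ != nullptr) profiler_->Stop();
  }

  std::int64_t remaining_;
  ToyProfiler* profiler_;
};

void BM_foo(ToyState& state) {
  g_events.push_back("prepare inputs");  // outside the loop: not traced
  for (auto _ : state) {
    (void)_;
    g_events.push_back("iteration");  // inside the loop: traced
  }
  g_events.push_back("check results");  // outside the loop: not traced
}
```

Running BM_foo on a ToyState with two iterations records "prepare inputs", "profiler start", "iteration", "iteration", "profiler stop", "check results", in that order.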

dmah42 · 2024-07-24T08:52:47Z

this is a very performance sensitive part of the code. i don't think we should be calling out to unknown services at the top of every loop. maybe we can annotate the branch prediction to expect the pointer to the profile manager to be null so we at least maximise the performance of this section?

this might also be an argument for having multiple iterations so that they "drown out" the startup bit of the trace?

LebedevRI · 2024-07-24T13:30:10Z

> this is a very performance sensitive part of the code. i don't think we should be calling out to unknown services at the top of every loop.

Note that this does not play nice at all with pause/resume,
and really only triggers before the whole loop and after the whole loop,
so it's outside of the timed section. It seems safe-ish,
but i guess i don't like that e.g. the memory manager will now measure a different thing from this one.

xdje42 · 2024-07-24T15:10:28Z

> [...] but i guess i don't like that e.g. the memory manager will now measure a different thing from this one.

I don't understand, can you elaborate?

LebedevRI · 2024-07-24T15:33:14Z

> > [...] but i guess i don't like that e.g. the memory manager will now measure a different thing from this one.

> I don't understand, can you elaborate?

BenchmarkRunner::RunProfilerManager() started as a carbon copy of BenchmarkRunner::RunMemoryManager().
This PR changes BenchmarkRunner::RunProfilerManager() but not BenchmarkRunner::RunMemoryManager(),
so they now measure different code scopes: the memory manager still measures the whole function,
while the profiler manager only measures the for (auto _ : state) {} loop.

xdje42 · 2024-07-24T23:13:39Z

> BenchmarkRunner::RunProfilerManager() started as a carbon copy of BenchmarkRunner::RunMemoryManager().

Actually, there is an important difference. RunMemoryManager calls its Start() before calling b.Setup(), whereas RunProfilerManager calls its Start() after b.Setup(). This is intentional: for the motivating use case (collecting a trace of the benchmark) we do not want to include b.Setup() and b.Teardown() in the trace.
Also, this is part of the motivation for naming ProfilerManager's Start routine AfterSetupStart: it leaves room for a day when some user of ProfilerManager wants a Start routine called before setup (and a Stop routine called after teardown). [E.g., maybe some day MemoryManager could be built on top of ProfilerManager. That is a tall order given breakage concerns, but the added API doesn't preclude it.]
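
The two orderings described above can be sketched as follows. ToyBenchmark and both Run... functions are hypothetical, not library code. The profiler-style ordering follows the stated goal of keeping b.Setup() and b.Teardown() out of the trace; placing the memory-style Stop after Teardown is an assumption that mirrors the placement of Start. The "run" phase stands for the call into the benchmark function, so this shows the scopes before the PR narrows the profiler scope further to the loop itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical benchmark with three phases that append to a log.
struct ToyBenchmark {
  void Setup(Log& log) const { log.push_back("setup"); }
  void Run(Log& log) const { log.push_back("run"); }
  void Teardown(Log& log) const { log.push_back("teardown"); }
};

// Memory-manager-style scope: Start() comes before Setup().
// Assumption: Stop() comes after Teardown().
Log RunWithMemoryStyleScope(const ToyBenchmark& b) {
  Log log;
  log.push_back("Start");
  b.Setup(log);
  b.Run(log);
  b.Teardown(log);
  log.push_back("Stop");
  return log;
}

// Profiler-manager-style scope: Setup() and Teardown() are excluded.
Log RunWithProfilerStyleScope(const ToyBenchmark& b) {
  Log log;
  b.Setup(log);
  log.push_back("Start");
  b.Run(log);
  log.push_back("Stop");
  b.Teardown(log);
  return log;
}

// Usage check: only the position of Start/Stop differs between the two.
void CheckScopes() {
  const ToyBenchmark b;
  const Log memory_expected = {"Start", "setup", "run", "teardown", "Stop"};
  const Log profiler_expected = {"setup", "Start", "run", "Stop", "teardown"};
  assert(RunWithMemoryStyleScope(b) == memory_expected);
  assert(RunWithProfilerStyleScope(b) == profiler_expected);
}
```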

xdje42 · 2024-07-24T23:35:09Z

> this is a very performance sensitive part of the code.

An important concern indeed.

> i don't think we should be calling out to unknown services at the top of every loop. maybe we can annotate the branch prediction to expect the pointer to the profile manager to be null so we at least maximise the performance of this section?

Adding branch prediction markers is certainly a good idea.

Note that the added test+conditional-branch is done before ResumeTiming and after PauseTiming.
Those functions are non-trivial, so my thinking is that the impact is minimal for the normal case (profiler_manager == nullptr).
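
As an illustration of the branch hint being discussed, here is a minimal sketch. TOY_EXPECT, ToyProfilerManager, and MaybeStartProfiler are hypothetical names invented for this example; the project may spell its own macro and call site differently. The sketch assumes GCC or Clang for __builtin_expect and falls back to the plain condition elsewhere.

```cpp
#include <cassert>

// Hypothetical portability macro wrapping the GCC/Clang builtin.
#if defined(__GNUC__) || defined(__clang__)
#define TOY_EXPECT(cond, expected) __builtin_expect(!!(cond), (expected))
#else
#define TOY_EXPECT(cond, expected) (cond)
#endif

// Hypothetical stand-in that counts how often the start hook was called.
struct ToyProfilerManager {
  int starts = 0;
  void AfterSetupStart() { ++starts; }
};

// Intended to be called once before the timed loop, not once per iteration.
// The hint tells the compiler that the pointer is normally null.
inline void MaybeStartProfiler(ToyProfilerManager* profiler_manager) {
  if (TOY_EXPECT(profiler_manager != nullptr, false)) {
    profiler_manager->AfterSetupStart();
  }
}

// Usage check: the null case does nothing; the non-null case calls the hook once.
inline void CheckMaybeStartProfiler() {
  MaybeStartProfiler(nullptr);
  ToyProfilerManager pm;
  MaybeStartProfiler(&pm);
  assert(pm.starts == 1);
}
```

The hint only affects code layout and static prediction; the behaviour is the same with or without it.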

Also, an important goal is that the trace match the collected performance counters: with an accurate enough simulator, we want the values we get out of it to be as close as possible to those reported by the performance counters.

> this might also be an argument for having multiple iterations so that they "drown out" the startup bit of the trace?

How so? [I'm assuming you're referring to the profiler_manager != nullptr case.]

dmah42 · 2024-07-25T09:54:10Z

> > this is a very performance sensitive part of the code.

> An important concern indeed.

> > i don't think we should be calling out to unknown services at the top of every loop. maybe we can annotate the branch prediction to expect the pointer to the profile manager to be null so we at least maximise the performance of this section?

> Adding branch prediction markers is certainly a good idea.

> Note that the added test+conditional-branch is done before ResumeTiming and after PauseTiming. Those functions are non-trivial, so my thinking is that the impact is minimal for the normal case (profiler_manager == nullptr).

that's a very good point. i hadn't spotted that.

> Also, an important goal is that the trace match the collected performance counters: with an accurate enough simulator, we want the values we get out of it to be as close as possible to those reported by the performance counters.

> > this might also be an argument for having multiple iterations so that they "drown out" the startup bit of the trace?

> How so? [I'm assuming you're referring to the profiler_manager != nullptr case.]

yes .. if we allow more iterations for the run under profile management then the traces from inside the performance loop will take up more of the reporting space than the small bit outside the loop.
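
A rough way to see the "drown out" effect is to compute the loop's share of the whole trace. The sketch below is only an illustration under a simple assumed cost model (a fixed cost outside the loop plus a constant cost per iteration); LoopShareOfTrace and its parameters are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of the trace spent inside the loop, under the assumed model:
//   total = iterations * per_iteration_cost + fixed_overhead
double LoopShareOfTrace(double per_iteration_cost, double fixed_overhead,
                        std::int64_t iterations) {
  assert(per_iteration_cost > 0.0 && fixed_overhead >= 0.0 && iterations > 0);
  const double loop_cost =
      per_iteration_cost * static_cast<double>(iterations);
  return loop_cost / (loop_cost + fixed_overhead);
}
```

For example, with a fixed overhead worth ten iterations of work, 10 iterations give a share of 0.5, while 990 iterations give 0.99.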

but i think the current PR is fine, given we accept there's a deliberate difference (to @LebedevRI's point) between memory tracing and profiling.

xdje42 marked this pull request as ready for review July 22, 2024 19:44

xdje42 force-pushed the i1807-move-start-stop-calls branch from 59470fb to cd305ea Compare July 24, 2024 23:40

xdje42 force-pushed the i1807-move-start-stop-calls branch from cd305ea to db498ec Compare July 25, 2024 17:56

Merge branch 'google:main' into i1807-move-start-stop-calls

c4e5a43

dmah42 approved these changes Aug 1, 2024

dmah42 merged commit ebb5e39 into google:main Aug 1, 2024
80 checks passed

BrewTestBot mentioned this pull request Aug 16, 2024

google-benchmark 1.9.0 Homebrew/homebrew-core#181374

Merged
