#2080: Fix rate and thread rate counter aggregates #2081

FRosner · 2025-11-27T07:32:06Z

Changes

Before this PR, counters that represent a global rate (benchmark::Counter::kIsRate) were effectively computed per thread, because the seconds passed to counter finalization (wall or CPU time) were summed across all threads. This breaks the definition of a global rate. With kAvgThreadsRate, the result was then divided by the number of threads again, yielding nonsensical values.

This is a regression introduced by #1836. This PR fixes it by dividing the total seconds count by the number of threads before passing it to the counter finalization, which then computes the rates etc.

We're also fixing the test expectations.

References

google-cla · 2025-11-27T07:32:10Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

dmah42 · 2025-11-27T11:45:19Z

src/counter.cc

+    v /= (cpu_time / num_threads);
  }
  if ((c.flags & Counter::kAvgThreads) != 0) {
    v /= num_threads;


i should know this but i've lost track: can flags be both IsRate and AvgThreads? if so, are we then dividing twice incorrectly?

> i should know this but i've lost track: can flags be both IsRate and AvgThreads

IIUC, yes, that's what kAvgThreadsRate will do:

benchmark/include/benchmark/benchmark.h

Line 655 in 2279f2a

kAvgThreadsRate = kIsRate | kAvgThreads,

> if so, are we then dividing twice incorrectly?

I think it is correct. For IsRate we are multiplying by the number of threads (note the brackets, a / (b / c) = a / b * c). Then for kAvgThreads we are dividing by the number of threads again a / b * c / c = a / b, so we get what we'd expect for the per-thread average?

But this should just be tested in some unit tests. I need to check where the existing tests are.

LebedevRI

The lossless way to do this would be to introduce kIsThreadInvariant.

FRosner · 2025-11-27T15:43:16Z

> The lossless way to do this would be to introduce kIsThreadInvariant.

Let's continue the high level discussion in #2080 (comment), since you left a longer response over there. I don't think introducing a new flag is the way to go here.

src/counter.cc

LebedevRI · 2025-12-08T18:11:56Z

(@dmah42 after merging #2089 the diff will make more sense..)

dmah42 · 2025-12-08T18:59:39Z

merged 2089

Fixes google#2080

LebedevRI · 2025-12-08T20:18:36Z

Well, this does what it claims to.
I think this will be correct for manual/wall-time/thread-time timers,
i'm not sure how ->MeasureProcessCPUTime() interacts with ->Threads().
Does the semantics change make sense? If so, i think this is it.

FRosner · 2025-12-09T08:03:28Z

> i'm not sure how ->MeasureProcessCPUTime() interacts with ->Threads().
> Does the semantics change make sense? If so, i think this is it.

Is that a question for me? I haven't used MeasureProcessCPUTime, so I'd need to take a look. Is it just using a different "clock" to measure the total time?

dmah42 · 2025-12-09T10:11:24Z

agreed, this does what the issue suggested. we still need some documentation in the docs somewhere, and yes please check the ProcessCPUTime also makes sense.

FRosner · 2025-12-09T10:15:37Z

Thank you so much for adding all the tests @LebedevRI and @dmah42 and sorry I didn't get to it earlier. I updated the PR description and will look into the docs and check ProcessCPUTime before marking it as ready for review.

FRosner · 2025-12-09T12:18:41Z

If I understand the docs correctly, MeasureProcessCPUTime affects only the way the number of required iterations is computed, right? I ran a few combinations on 1.9.4 (not this branch) and it seems that the setting has no effect on the counters.

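#include <chrono>
#include <thread>

#include <benchmark/benchmark.h>

static void BM_ExampleTiming(benchmark::State& state) {
    for (auto _ : state) {
        benchmark::DoNotOptimize(1 + 2);
        // Sleep so wall time is ~1 s per iteration while CPU time stays near zero.
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        // Only takes effect for the variants registered with UseManualTime().
        state.SetIterationTime(1);
    }
    // The same value as a plain counter, a global rate, and a per-thread average rate.
    state.counters["counter"] = benchmark::Counter(1);
    state.counters["counter_rate"] = benchmark::Counter(1, benchmark::Counter::kIsRate);
    state.counters["counter_thread_rate"] = benchmark::Counter(1, benchmark::Counter::kAvgThreadsRate);
}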

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10);

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseManualTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseRealTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->MeasureProcessCPUTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseManualTime()
    ->MeasureProcessCPUTime();

BENCHMARK(BM_ExampleTiming)
    ->Threads(1)
    ->Threads(10)
    ->UseRealTime()
    ->MeasureProcessCPUTime();

---------------------------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------
BM_ExampleTiming/threads:1                           1003846688 ns        40700 ns           10 counter=1 counter_rate=2.457k/s counter_thread_rate=2.457k/s
BM_ExampleTiming/threads:10                          1004385959 ns        40400 ns           10 counter=10 counter_rate=24.7525k/s counter_thread_rate=2.47525k/s
BM_ExampleTiming/manual_time/threads:1               1000000000 ns        54000 ns            1 counter=1 counter_rate=1/s counter_thread_rate=1/s
BM_ExampleTiming/manual_time/threads:10              1000000000 ns        44500 ns           10 counter=10 counter_rate=1/s counter_thread_rate=0.1/s
BM_ExampleTiming/real_time/threads:1                 1005048707 ns        43000 ns            1 counter=1 counter_rate=0.994977/s counter_thread_rate=0.994977/s
BM_ExampleTiming/real_time/threads:10                1002049363 ns        24600 ns           10 counter=10 counter_rate=0.997955/s counter_thread_rate=0.0997955/s
BM_ExampleTiming/process_time/threads:1              1002829254 ns        50700 ns           10 counter=1 counter_rate=1.97239k/s counter_thread_rate=1.97239k/s
BM_ExampleTiming/process_time/threads:10             1002060634 ns       310600 ns           10 counter=10 counter_rate=3.21958k/s counter_thread_rate=321.958/s
BM_ExampleTiming/process_time/manual_time/threads:1  1000000000 ns        64000 ns            1 counter=1 counter_rate=1/s counter_thread_rate=1/s
BM_ExampleTiming/process_time/manual_time/threads:10 1000000000 ns       406800 ns           10 counter=10 counter_rate=1/s counter_thread_rate=0.1/s
BM_ExampleTiming/process_time/real_time/threads:1    1003886083 ns        50000 ns            1 counter=1 counter_rate=0.996129/s counter_thread_rate=0.996129/s
BM_ExampleTiming/process_time/real_time/threads:10   1004308770 ns       307700 ns           10 counter=10 counter_rate=0.99571/s counter_thread_rate=0.099571/s

The rates are consistent as long as you don't use CPU time for the rate calculation (which makes sense given my sleep / manual timing). So I think we're good on this front?

FRosner · 2025-12-09T12:23:53Z

I updated the docs in 894fa39. Let me know if you'd like me to add an example or if that's enough :)

LebedevRI · 2025-12-09T12:28:46Z

> If I understand the docs correctly, MeasureProcessCPUTime affects only the way the number of required iterations is computed, right?

As the name suggests, it measures the Process CPU Time,
aka the time of all the threads that may have been created
in the function-under-benchmark.
I don't know how it's supposed to interact with ->Threads().

dmah42 · 2025-12-10T18:02:17Z

> If I understand the docs correctly, MeasureProcessCPUTime affects only the way the number of required iterations is computed, right?

> As the name suggests, it measures the Process CPU Time, aka the time of all the threads that may have been created in the function-under-benchmark. I don't know how it's supposed to interact with ->Threads().

at this point i'm not sure either.

the docs say "Measure the total CPU consumption, use it to decide for how long to run the benchmark loop. This will always measure to no less than the time spent by the main thread in single-threaded case."

the difference is (on Linux) between getrusage for Process time and using clock_gettime for Thread time (the default iirc).

i'm afraid i'll need to let you decide how this should correspond to Threads and timing outputs.
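To make the getrusage versus clock_gettime distinction concrete, here is a minimal POSIX sketch of the two clocks. It illustrates the system calls named above and is not the library's timer code; `ProcessCpuSeconds` and `ThreadCpuSeconds` are hypothetical names.

```cpp
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

// CPU time (user + system) consumed by the whole process, i.e. summed over
// every thread it has run. Returns -1.0 on failure.
double ProcessCpuSeconds() {
  struct rusage ru;
  if (getrusage(RUSAGE_SELF, &ru) != 0) return -1.0;
  return static_cast<double>(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) +
         static_cast<double>(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) * 1e-6;
}

// CPU time consumed by the calling thread only. Returns -1.0 on failure.
double ThreadCpuSeconds() {
  struct timespec ts;
  if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) return -1.0;
  return static_cast<double>(ts.tv_sec) +
         static_cast<double>(ts.tv_nsec) * 1e-9;
}
```

Because the process clock already aggregates all threads, sampling it from each of N benchmark threads and summing the results would count the same CPU time up to N times, which is one way to frame the open question about ->Threads().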

FRosner mentioned this pull request Nov 27, 2025

[BUG] Rate counters are per-thread in multi-threaded benchmarks, kAvgThreadsRate does not make sense #2080

Closed

dmah42 reviewed Nov 27, 2025

LebedevRI requested changes Nov 27, 2025

FRosner changed the title ~~Update counter.cc~~ #2080: Fix rate and thread rate counter aggregates Nov 27, 2025

LebedevRI reviewed Nov 29, 2025

LebedevRI force-pushed the patch-1 branch from 0fe9d48 to 3cfaa6a Compare December 8, 2025 18:10

LebedevRI requested review from LebedevRI and dmah42 December 8, 2025 18:10

FRosner and others added 2 commits December 8, 2025 22:02

Update counter.cc

021a05f

User counters: normalize time by thread count

2aa23c2

Fixes google#2080

LebedevRI force-pushed the patch-1 branch from 3cfaa6a to 2aa23c2 Compare December 8, 2025 19:03

LebedevRI marked this pull request as ready for review December 8, 2025 20:12

docs

894fa39

LebedevRI approved these changes Dec 10, 2025

Merge branch 'main' into patch-1

f5bd3db

LebedevRI merged commit 3e7dac6 into google:main Dec 10, 2025
135 of 136 checks passed
