Commit 63ff57d

Avoid integer division in the benchmark inner-most loop.
Previously the benchmark code used an integer division (via the % operator) with a non-constant divisor in the inner loop. This is quite slow on many processors, especially ones such as some ARM cores that lack a hardware divide instruction. Even on fairly recent x86_64 like Haswell an integer division can take something like 100 cycles, making it comparable to the runtime of siphash.

This change avoids the division by using bitmasking instead. This was especially easy since the count was only increased by doubling.

This change also restarts the timing when the execution time was very low. This avoids minimum times of zero in cases where one execution ends up below the timer resolution. It also reduces the impact of the overhead on the final result.

The formatting of the prints is changed to not use scientific notation, to make it more machine readable (in particular, gnuplot croaks on the non-fixed-point output, and it doesn't sort correctly).

This also hoists all the floating point divisions out of the semi-hot path because it was easy to do so.

It might be prudent to break out the critical test into a macro just to guarantee that it gets inlined. It might also make sense to just save out the intermediate counts and times and get the floating point completely out of the timing loop (because e.g. on hardware without a fast hardware FPU, like some ARM, it will still be slow enough to distort the results). I haven't done either of these in this commit.
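The equivalence that the bitmasking relies on can be sketched as below. This is a minimal illustration and not code from the commit: the helper names `due_by_modulo`, `due_by_mask`, and `tests_agree` are hypothetical, and the sketch assumes the check interval is always a power of two, which holds when the interval only ever grows by doubling. Note that the commit itself tests `count & countMask` rather than `count + 1`, so the time check lands at a different point in each cycle than it did in the old code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the old-style test. True when (count + 1) is a
// multiple of interval. With a non-constant interval this compiles to an
// integer division.
bool due_by_modulo(int64_t count, int64_t interval) {
    return (count + 1) % interval == 0;
}

// Hypothetical helper: the same test done with a bitmask. Only valid when
// interval is a power of two, so that (interval - 1) has all low bits set.
bool due_by_mask(int64_t count, int64_t interval) {
    assert(interval > 0 && (interval & (interval - 1)) == 0);
    return ((count + 1) & (interval - 1)) == 0;
}

// Returns true if both tests give the same answer for every count in
// [0, limit).
bool tests_agree(int64_t interval, int64_t limit) {
    for (int64_t count = 0; count < limit; ++count) {
        if (due_by_modulo(count, interval) != due_by_mask(count, interval)) {
            return false;
        }
    }
    return true;
}
```

For a non-power-of-two interval the mask test would give different answers, which is why the precondition is asserted in the sketch.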
1 parent a80de15 commit 63ff57d

2 files changed: +28 -14 lines changed

src/bench/bench.cpp

Lines changed: 24 additions & 11 deletions
@@ -5,6 +5,7 @@
 #include "bench.h"
 
 #include <iostream>
+#include <iomanip>
 #include <sys/time.h>
 
 using namespace benchmark;
@@ -25,7 +26,7 @@ BenchRunner::BenchRunner(std::string name, BenchFunction func)
 void
 BenchRunner::RunAll(double elapsedTimeForOne)
 {
-    std::cout << "Benchmark" << "," << "count" << "," << "min" << "," << "max" << "," << "average" << "\n";
+    std::cout << "#Benchmark" << "," << "count" << "," << "min" << "," << "max" << "," << "average" << "\n";
 
     for (std::map<std::string,BenchFunction>::iterator it = benchmarks.begin();
          it != benchmarks.end(); ++it) {
@@ -38,22 +39,34 @@ BenchRunner::RunAll(double elapsedTimeForOne)
 
 bool State::KeepRunning()
 {
+    if (count & countMask) {
+        ++count;
+        return true;
+    }
     double now;
     if (count == 0) {
-        beginTime = now = gettimedouble();
+        lastTime = beginTime = now = gettimedouble();
     }
     else {
-        // timeCheckCount is used to avoid calling gettime most of the time,
-        // so benchmarks that run very quickly get consistent results.
-        if ((count+1)%timeCheckCount != 0) {
-            ++count;
-            return true; // keep going
-        }
         now = gettimedouble();
-        double elapsedOne = (now - lastTime)/timeCheckCount;
+        double elapsed = now - lastTime;
+        double elapsedOne = elapsed * countMaskInv;
         if (elapsedOne < minTime) minTime = elapsedOne;
         if (elapsedOne > maxTime) maxTime = elapsedOne;
-        if (elapsedOne*timeCheckCount < maxElapsed/16) timeCheckCount *= 2;
+        if (elapsed*128 < maxElapsed) {
+            // If the execution was much too fast (1/128th of maxElapsed), increase the count mask by 8x and restart timing.
+            // The restart avoids including the overhead of this code in the measurement.
+            countMask = ((countMask<<3)|7) & ((1LL<<60)-1);
+            countMaskInv = 1./(countMask+1);
+            count = 0;
+            minTime = std::numeric_limits<double>::max();
+            maxTime = std::numeric_limits<double>::min();
+            return true;
+        }
+        if (elapsed*16 < maxElapsed) {
+            countMask = ((countMask<<1)|1) & ((1LL<<60)-1);
+            countMaskInv = 1./(countMask+1);
+        }
     }
     lastTime = now;
     ++count;
@@ -64,7 +77,7 @@ bool State::KeepRunning()
 
     // Output results
     double average = (now-beginTime)/count;
-    std::cout << name << "," << count << "," << minTime << "," << maxTime << "," << average << "\n";
+    std::cout << std::fixed << std::setprecision(15) << name << "," << count << "," << minTime << "," << maxTime << "," << average << "\n";
 
     return false;
 }
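
The effect of the `std::fixed << std::setprecision(15)` change on the printed timings can be illustrated with a small sketch. The helper names `format_default` and `format_fixed` are hypothetical and exist only for this illustration; the commit writes straight to `std::cout` instead of building strings.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a duration with the stream defaults, which
// switch to scientific notation for very small values.
std::string format_default(double seconds) {
    std::ostringstream out;
    out << seconds;
    return out.str();
}

// Hypothetical helper: format a duration the way the new output line does,
// in fixed-point notation with 15 digits after the decimal point.
std::string format_fixed(double seconds) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(15) << seconds;
    return out.str();
}
```

With fixed-point output every value has the same number of fractional digits, so values below ten seconds also sort correctly as plain text, which matches the motivation given in the commit message.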

src/bench/bench.h

Lines changed: 4 additions & 3 deletions
@@ -40,14 +40,15 @@ namespace benchmark {
         std::string name;
         double maxElapsed;
         double beginTime;
-        double lastTime, minTime, maxTime;
+        double lastTime, minTime, maxTime, countMaskInv;
         int64_t count;
-        int64_t timeCheckCount;
+        int64_t countMask;
     public:
         State(std::string _name, double _maxElapsed) : name(_name), maxElapsed(_maxElapsed), count(0) {
             minTime = std::numeric_limits<double>::max();
             maxTime = std::numeric_limits<double>::min();
-            timeCheckCount = 1;
+            countMask = 1;
+            countMaskInv = 1./(countMask + 1);
         }
         bool KeepRunning();
     };
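
The relationship between `countMask` and `countMaskInv` can be sketched in isolation. This is an illustration under stated assumptions rather than the commit's code: `grow_mask` and `mask_inverse` are hypothetical helpers, and the `shift` parameter generalizes the two fixed cases in the diff (`<<1` with `|1`, and `<<3` with `|7`). The sketch assumes the mask always has the form 2^k - 1 and never exceeds 60 bits, as the diff's clamp enforces.

```cpp
#include <cstdint>

// Hypothetical helper: widen a mask of the form 2^k - 1 by `shift` bits,
// clamped to 60 bits as in the diff. shift = 1 doubles the number of
// iterations between time checks, shift = 3 multiplies it by 8.
int64_t grow_mask(int64_t mask, int shift) {
    const int64_t low_bits = (int64_t(1) << shift) - 1;
    const int64_t clamp = (int64_t(1) << 60) - 1;
    return ((mask << shift) | low_bits) & clamp;
}

// Hypothetical helper: reciprocal of the number of iterations between time
// checks. Computing it once per mask change lets the timed path multiply
// instead of divide.
double mask_inverse(int64_t mask) {
    return 1.0 / static_cast<double>(mask + 1);
}
```

Because `mask + 1` is a power of two in this scheme, its reciprocal is exactly representable as a double, so multiplying by it should give the same result as dividing by `mask + 1`, barring underflow.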
