bench: replace wall-clock timer with per-process CPU timer #1732
base: master
Conversation
Just some quick comments:
I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.
I think
Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating point math, but as long as it doesn't get in your way here, it's a cool thing to keep it. :D |
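For readers unfamiliar with the integer-only approach mentioned above, here is a minimal sketch of how a per-iteration time can be printed with decimals without any floating point math. The helper name `format_us_per_iter` is hypothetical and is not taken from the library; it assumes a non-negative `total_us` and a positive `iters`.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not the library's code): write total_us / iters into
 * buf with exactly three decimals, using integer arithmetic only.
 * Assumes total_us >= 0 and iters > 0. Returns what snprintf returns. */
static int format_us_per_iter(char *buf, size_t len, int64_t total_us, int64_t iters) {
    /* Scale first so the division keeps three fractional digits. */
    int64_t scaled = total_us * 1000 / iters; /* thousandths of a microsecond */
    return snprintf(buf, len, "%" PRId64 ".%03" PRId64, scaled / 1000, scaled % 1000);
}
```

The extra digits are truncated rather than rounded, which is usually acceptable for benchmark output.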
It will be useful to split your changes into meaningful and separate commits, see https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches. |
I think |
If we're going to rework this, I'd suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:
6aff035
to
d456fad
Compare
right now all benchmarks are run with count=10 and fixed iters (apart from ecmult_multi which adjusts the number of iters, not count). therefore |
I disagree with #689. It overcomplicates things for the sake of not having floating point math. Those divisions aren't even in the hot path; they're outside the benchmarks.
Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards. |
what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is. |
97e5264
to
254a014
Compare
1d9d6d0
to
4c9a074
Compare
by the way, |
ddeaede
to
71dff3f
Compare
even though the manual says that I added a line in the README.md for best practices to run the benchmarks. I also tried adding a function to pin the process to a core directly in C, but there's no standard POSIX compliant way to do so. There is |
3e43c75
to
ef9e40e
Compare
The point is exactly to have a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we'd better have a way of spotting if something is off. I like the stabilized quartiles idea.
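To make the "spotting if something is off" argument concrete, here is a minimal sketch of a min/avg/max summary over repeated runs. The names `run_stats`, `summarize_runs` and `spread_percent` are hypothetical and not the library's actual code. This is not the stabilized quartiles method from the linked paper, only the simpler min/avg/max idea discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of per-run timings, all in microseconds. */
typedef struct {
    int64_t min;
    int64_t avg;
    int64_t max;
} run_stats;

/* Compute min, truncated integer average and max over `count` runs. */
static run_stats summarize_runs(const int64_t *runs_us, size_t count) {
    run_stats s;
    int64_t sum = 0;
    size_t i;
    assert(count > 0);
    s.min = runs_us[0];
    s.max = runs_us[0];
    for (i = 0; i < count; i++) {
        if (runs_us[i] < s.min) s.min = runs_us[i];
        if (runs_us[i] > s.max) s.max = runs_us[i];
        sum += runs_us[i];
    }
    s.avg = sum / (int64_t)count;
    return s;
}

/* Spread between slowest and fastest run, in percent of the fastest run. */
static int64_t spread_percent(run_stats s) {
    return s.min > 0 ? (s.max - s.min) * 100 / s.min : 0;
}
```

A large spread on an otherwise idle machine would suggest that some interference is still leaking into the measurement, whichever clock is used.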
tbh it scares me a bit, will see what I can do. Maybe in a future PR. |
ef9e40e
to
66745f7
Compare
According to https://www.man7.org/linux/man-pages/man3/clock_gettime.3.html:
Link with -lrt (only for glibc versions before 2.17).
I think this check, and adding the -lrt flag if necessary, should be included in both build systems.
Agreed, good catch! What do you think about something like this?
include(CheckFunctionExists)
check_function_exists(clock_gettime HAVE_CLOCK_GETTIME)
if(NOT HAVE_CLOCK_GETTIME)
  # On some platforms, clock_gettime requires librt
  target_link_libraries(your_target PRIVATE rt)
endif()
Friendly ping @martinus, benchmark connoisseur :) Could you please give a rough estimate of these changes at the concept level? |
And what's your take on using nanobench instead, even though this is a pure C library instead of C++? |
I've tested 8925b95 by running it alongside Could you please clarify how one would measure or observe the claimed "less influenced by the environment" behaviour? |
a better example would be starting various threads with random usage spikes at random times on the same core the benchmark is running on. basically, if you have no other processes running on that CPU, then this PR won't show any improvement. this PR helps in the more realistic scenario where a user has background processes running, and the scheduler assigns multiple of them on the same CPU that's running the benchmark, putting the thread to sleep, which the wall clock timer doesn't account for. also, this PR is not merely a replacement of wall clock time with CPU time, but it also modernizes the clock function as per the Unix standard. |
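The difference described above can be illustrated with a small sketch that reads both a wall-clock style timer and the per-process CPU timer around a sleep. The function names `read_clock_us` and `sleep_and_measure` are hypothetical, and the sleep merely stands in for the scheduler putting the benchmark thread to sleep; this is an illustration under those assumptions, not the code in this PR.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime and nanosleep */
#include <stdint.h>
#include <time.h>

/* Read the given clock and return its value in microseconds, or -1 on failure. */
static int64_t read_clock_us(clockid_t id) {
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0) {
        return -1;
    }
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Hypothetical demo: sleep for `ms` milliseconds and report how far each
 * clock advanced. CLOCK_MONOTONIC keeps counting during the sleep, while
 * CLOCK_PROCESS_CPUTIME_ID only counts CPU time consumed by this process. */
static void sleep_and_measure(long ms, int64_t *wall_us, int64_t *cpu_us) {
    struct timespec req;
    int64_t wall0 = read_clock_us(CLOCK_MONOTONIC);
    int64_t cpu0 = read_clock_us(CLOCK_PROCESS_CPUTIME_ID);
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&req, NULL);
    *wall_us = read_clock_us(CLOCK_MONOTONIC) - wall0;
    *cpu_us = read_clock_us(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
}
```

Calling `sleep_and_measure(50, &wall, &cpu)` should report a wall-clock delta of roughly 50 ms and a much smaller CPU-time delta, since the process consumes almost no CPU while it sleeps. On glibc versions before 2.17 this needs `-lrt` at link time, as noted earlier in the thread.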
Feel free to grab f59c45f from https://github.com/hebasto/secp256k1/commits/pr1732/0925.cmake. UPD. @real-or-random do we need a CI job to build on some old system with old glibc? |
static void print_clock_info(void) {
#if defined(CLOCK_PROCESS_CPUTIME_ID)
    printf("INFO: Using per-process CPU timer\n\n");
#else
    printf("WARN: Using wall-clock timer instead of per-process CPU timer.\n\n");
#endif
}
Do these messages make sense on Windows?
while on windows there's no option for per-process cpu clock (or at least not high precision), I still think issuing the warning is fine.
It seems a bit of a stretch to call the native Windows QPC framework a "wall-clock timer".
what about this?
"WARN: Using global timer instead of per-process CPU timer."
This commit improves the reliability of benchmarks by removing some of the influence of other background running processes. This is achieved by using CPU bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.
This seems overcomplicated and convoluted. but if you manage to simplify it I'll include your commit. |
8925b95
to
cce0147
Compare
Approach NACK cce0147.
The current implementation breaks compatibility with systems using glibc versions prior to 2.17:
$ ldd --version | head -1
ldd (Ubuntu EGLIBC 2.15-0ubuntu10.23) 2.15
$ cmake -B build
$ cmake --build build -t bench
<snip>
[100%] Linking C executable ../bin/bench
CMakeFiles/bench.dir/bench.c.o: In function `gettime_us':
/secp256k1/src/bench.h:45: undefined reference to `clock_gettime'
collect2: ld returned 1 exit status
make[3]: *** [bin/bench] Error 1
make[2]: *** [src/CMakeFiles/bench.dir/all] Error 2
make[1]: *** [src/CMakeFiles/bench.dir/rule] Error 2
make: *** [bench] Error 2
Fixed by unconditionally adding
find_library(RT_LIBRARY rt)
add_library(optional_rt INTERFACE)
if(RT_LIBRARY)
  target_link_libraries(optional_rt INTERFACE rt)
endif()
Could you please extract this logic into a find-module, like FindRT.cmake? That module should provide an IMPORTED target with a namespace in the name.
Have a look at the FindIconv.cmake module as an example. This module also provides an interface library for functionality that may be found in an actual library or built into the C standard library.
This looks complicated, almost overcomplicated, without much benefit, and I won't be able to replicate FindIconv.cmake on my own. But you're welcome to provide a commit and I'll see if it should be cherry-picked.
As per @purpleKarrot's suggestion, benchmarks can now be run via ctest as well (as opposed to ./bench_name). This would be the preferable way to run them, as it automatically handles CPU pinning and affinity. This means that, if both Otherwise we can evaluate the possibility of using labels for groups of tests.
ed8a799
to
7df6023
Compare
7df6023
to
6f0e5d4
Compare
Goal
This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environment.
This is achieved by replacing the wall-clock timer with a per-process CPU timer where possible.