Summary
I propose replacing the current gettimeofday()-based timing measurements in the benchmark suite with CPU cycle counting using hardware counters (e.g. the x86 time-stamp counter, read with RDTSC). This change would significantly improve benchmark reliability, precision, and consistency.
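The summary mentions RDTSC. The following is a minimal sketch of reading it, assuming GCC or Clang on x86, where `__rdtsc()` is available from `<x86intrin.h>`. The name `read_cycles` is hypothetical, and the non-x86 fallback to `CLOCK_MONOTONIC` nanoseconds is an assumption added so the sketch compiles elsewhere; neither is part of the proposal. On CPUs with an invariant TSC the counter ticks at a constant reference rate rather than the current core frequency, and it keeps counting while the process is descheduled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Hypothetical helper: read the x86 time-stamp counter. */
static uint64_t read_cycles(void) {
    return (uint64_t)__rdtsc();
}
#else
/* Assumed fallback for non-x86 targets: monotonic time in nanoseconds,
 * not a cycle count. */
static uint64_t read_cycles(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif
```

A benchmark would take `read_cycles()` before and after N iterations and report the difference divided by N.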
Current Issues with Time-Based Measurements
The current benchmarking implementation in bench.h uses wall-clock time measurements with microsecond precision:
#include <stdint.h>
#include <sys/time.h>
static int64_t gettime_i64(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
}
This approach has several significant limitations:
- Limited Precision: Microsecond resolution is insufficient for modern CPUs, where cryptographic operations can complete in hundreds of nanoseconds, so many iterations are required to get an acceptable result.
- System Interference: Wall-clock time is affected by:
  - OS scheduler interrupts and context switches
  - Other running processes competing for CPU time
  - Power management and frequency scaling
  - Thermal throttling
  - Contention for caches shared between cores
- High Variability: Benchmark results can vary by 50% or more between runs due to system noise.
- Non-deterministic: Results depend on system load, making comparisons unreliable.
- Overhead: System call overhead affects measurement accuracy.
- Lack of comparability: It does not provide a stable, reliable metric for team-wide performance discussions and comparisons.
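The interference items above are easiest to see when the process is not running at all. The following is a minimal sketch, assuming a POSIX system. It sleeps for a given time and compares how far the wall clock (computed exactly like gettime_i64() above) and the process CPU-time clock advance. The names `wall_us`, `cpu_us` and `measure_sleep` are hypothetical. Sleeping stands in for any interval where the scheduler runs something else.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Wall-clock microseconds, computed the same way as gettime_i64(). */
static int64_t wall_us(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
}

/* CPU time consumed by this process so far, in microseconds. */
static int64_t cpu_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return (int64_t)ts.tv_sec * 1000000LL + (int64_t)ts.tv_nsec / 1000;
}

/* Sleep for `ms` milliseconds and report how much each clock advanced. */
static void measure_sleep(long ms, int64_t *wall_delta, int64_t *cpu_delta) {
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    int64_t w0 = wall_us();
    int64_t c0 = cpu_us();
    nanosleep(&req, NULL);
    *wall_delta = wall_us() - w0;
    *cpu_delta = cpu_us() - c0;
}
```

On an otherwise idle machine, `measure_sleep(100, &wall, &cpu)` should yield a wall-clock delta of roughly 100000 µs and a CPU-time delta close to zero.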
Proposed Implementation
Use a clock that does not count time during which the process is not running (for example, while it is descheduled or paused). I propose something like clock_gettime(CLOCK_PROCESS_CPUTIME_ID) or perf stat -e cpu-clock.
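The following is a minimal sketch of the clock_gettime(CLOCK_PROCESS_CPUTIME_ID) option. `gettime_cpu_ns` and `cpu_clock_resolution_ns` are hypothetical names, not existing bench.h functions. The sketch returns nanoseconds rather than the microseconds gettime_i64() returns; that unit change is an assumption made here to address the precision concern.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical replacement for gettime_i64(): CPU time consumed by the
 * whole process, in nanoseconds. Time spent descheduled, blocked or
 * sleeping does not advance this clock. Returns -1 on failure. */
static int64_t gettime_cpu_ns(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
        perror("clock_gettime(CLOCK_PROCESS_CPUTIME_ID)");
        return -1;
    }
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Resolution the OS reports for the process CPU-time clock, in ns,
 * or -1 on failure. */
static int64_t cpu_clock_resolution_ns(void) {
    struct timespec res;
    if (clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res) != 0) {
        return -1;
    }
    return (int64_t)res.tv_sec * 1000000000LL + (int64_t)res.tv_nsec;
}
```

A benchmark loop would call gettime_cpu_ns() before and after N iterations and report (end - begin) / N. Existing callers that expect microseconds would need to divide by 1000. The perf alternative, e.g. `perf stat -e cpu-clock ./bench` (binary name assumed), reports the CPU time of a whole run without code changes, but not per individual benchmark.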