Benchmark run time baseline estimation #78
Marking this as a draft for now, as it doesn't yet attempt to subtract the baseline from the actual benchmark results.
Let me know what you think.
Subtracting the average no-op time from the average benchmark time sounds reasonable.
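The arithmetic of that idea, as a minimal sketch (function names are illustrative, not zBench's actual API):

```zig
/// Illustrative only: estimate a corrected mean by subtracting the average
/// no-op (timer overhead) reading from the average measured time.
fn correctedMeanNs(bench_ns: []const u64, noop_ns: []const u64) u64 {
    // Saturating subtraction: the overhead estimate can exceed a very
    // fast measurement, and the result should not wrap below zero.
    return meanNs(bench_ns) -| meanNs(noop_ns);
}

fn meanNs(values: []const u64) u64 {
    var sum: u64 = 0;
    for (values) |v| sum += v;
    const n: u64 = @intCast(values.len);
    return sum / n;
}
```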
I've just added a demo implementation. Obtaining the baseline timing is very simple:

```zig
const baseline: u64 = t.read();
t.reset();
```

It is not subtracted from the results; there is no active baseline correction yet, since I wanted to check first whether this implementation works in principle. In the demo, the baseline timings are appended to the results, which you can observe by running the baseline example.
Some observations:
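For reference, a self-contained sketch of sampling such a no-op baseline with `std.time.Timer` (assumed here; this is not the branch's code):

```zig
const std = @import("std");

pub fn main() !void {
    var t = try std.time.Timer.start();
    var samples: [1000]u64 = undefined;
    // Reset, then read immediately: the elapsed time approximates the cost
    // of the timing calls themselves, i.e. the no-op baseline.
    for (&samples) |*s| {
        t.reset();
        s.* = t.read();
    }
    var sum: u64 = 0;
    for (samples) |s| sum += s;
    std.debug.print("avg no-op time: {d} ns over {d} samples\n", .{ sum / samples.len, samples.len });
}
```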
Just added a simple implementation of the baseline correction; it still looks reasonable to me (every second run has the correction active).
To be fair, the
Should we compare this feature with all build options? I think you mentioned that you tested this in debug mode.
Added a checkbox :) Also, I'll see how this all looks on Windows this evening (probably). This has been an interesting deep-dive into system clocks already!
Comparing different optimization modes, it all looks fine to me on Linux. I've put the results here since they would be a bit clunky to post as a comment. For Windows, results are here. The baseline correction doesn't look very effective, to be honest. All in all, I think on Linux this can actually help improve accuracy when benchmarking functions that run very fast, in the nanosecond range. @hendriknielaender could you test this on Mac? As for Windows, the 100 ns granularity of the "performance counter" is giving trouble; potentially it could help to subtract the average baseline time from the results afterwards, instead of subtracting each no-op time individually in
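To make the two options concrete, this is roughly what "subtracting each no-op time individually" means, in contrast to the average-based correction sketched earlier (hypothetical code, not the branch's implementation):

```zig
const std = @import("std");

// Hypothetical per-sample correction: each measurement gets its own paired
// no-op reading subtracted before aggregation. On Windows, both values are
// quantized to the ~100 ns performance-counter tick, so the individual
// differences are noisy; averaging first would smooth the quantization out.
fn correctPerSample(samples: []u64, noop_readings: []const u64) void {
    std.debug.assert(samples.len == noop_readings.len);
    for (samples, noop_readings) |*s, n| {
        s.* = s.* -| n; // saturating: the overhead may exceed a fast sample
    }
}
```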
Sure, I will test this on Mac 👍 For the Windows case, we could implement a generic experimental flag. This would allow us to enable the feature on Linux and macOS while excluding it on Windows until the necessary refactoring is complete. In the long run, I think we should have a stable baseline calculation, which is subtracted by default.
Here's an idea for how we could do this in one go; it might not be that complicated. My idea would be to keep the baseline correction feature a simple on/off setting in the benchmark config. It should be up to the user to experiment with it, independent of the platform. The changes I imagine for this, based on the current state of this branch:
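A hypothetical shape for such a setting (the struct and field name are illustrative, not zBench's actual config):

```zig
pub const Config = struct {
    // ...existing settings...

    /// Hypothetical field: opt-in baseline correction, off by default,
    /// so users can experiment with it on any platform.
    baseline_correction: bool = false,
};
```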
Here are the Mac baseline results; they were executed on an old 2016 MacBook running macOS 12.7. I could also add another test with a newer MacBook Pro, but June 3rd at the earliest.
By the way, the
You are right, I think we need to move the timer instantiation out of the
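Assuming the stripped word above refers to the per-sample code path (the original identifier did not survive), the hoisting might look like this sketch:

```zig
const std = @import("std");

// Before (sketch): starting the timer inside the sampling loop adds the
// Timer.start() cost to every reading.
fn sampleNoopPerStart(samples: []u64) !void {
    for (samples) |*s| {
        var t = try std.time.Timer.start();
        s.* = t.read();
    }
}

// After (sketch): instantiate once outside the loop, reset per sample.
fn sampleNoopHoisted(samples: []u64) !void {
    var t = try std.time.Timer.start();
    for (samples) |*s| {
        t.reset();
        s.* = t.read();
    }
}
```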
@FObersteiner how do we want to continue with this PR? I really like the feature of having a more precise benchmark.
To be honest, I didn't reach a point where I thought this was ready to ship. Determining the no-op time turned out to be very hard; results are highly variable. And that is true even if you use the CPU's RDTSC register directly. Here's some code that does this: https://codeberg.org/FObersteiner/benchmarks/src/branch/main/src/bench_noop_rdtsc.zig - While this was a nice deep-dive into computer clocks, I find that micro-benchmarking is really hard. Don't trust a benchmarking library that claims to do "micro-benchmarks" and just uses the standard means ^^ So for "normal" benchmarking, I think it is important to be aware that there is an uncertainty of, say, 15-20 nanoseconds. Maybe we can clarify that in the readme. At the moment, I don't see a reliable "baseline correction", one that actually improves precision (or accuracy).
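For context, the essence of the linked RDTSC approach is just a few lines (a sketch of the general technique, not a copy of the linked file):

```zig
// Read the time stamp counter on x86-64: RDTSC returns the low 32 bits in
// EAX and the high 32 bits in EDX. Note that unserialized reads like this
// one add their own variance on out-of-order CPUs.
fn rdtsc() u64 {
    var lo: u32 = undefined;
    var hi: u32 = undefined;
    asm volatile ("rdtsc"
        : [lo] "={eax}" (lo),
          [hi] "={edx}" (hi),
    );
    return (@as(u64, hi) << 32) | @as(u64, lo);
}
```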
Good point! Then I will close this, and let's add this info to the readme.
addressing #77