Add benchmarks #2153
Merged
Conversation
Awesome, thank you a lot for working on this! One small detail I noticed:

I fixed the issue, and changed it to use […]. Here's a sample of the instruction counting benchmark output:
infogulch force-pushed the benchmark branch 2 times, most recently from b0a7c08 to 4bb6ebe on November 11, 2023 04:07

infogulch force-pushed the benchmark branch 3 times, most recently from dcbd273 to ba01a5d on November 11, 2023 18:51
This latest version is ready, I think. Any comments welcome. Edit: Rebased and fixed the formatting issues.
This PR is an idea for how we can add benchmarks to scryer-prolog. The ultimate goal would be to have an array of benchmarks and publish a report that displays the measurements over time. But first things first...
Benchmarks in CI?
The challenge of running benchmarks on public runners like GitHub Actions is that the variation in wall-clock time can be as high as 20% due to environmental factors outside your control, such as noisy VM neighbors. This obliterates the utility of using automated benchmark results to judge whether a proposed change actually helps performance.
So this PR pursues a strategy of measuring the number of instructions executed during the benchmark. Unlike wall time, this metric is very stable and sometimes even deterministic. Admittedly, instructions executed is only correlated with wall time, but it's generally a good tradeoff that allows running the benchmarks in an otherwise noisy automation environment.
This is accomplished by integrating with Valgrind, specifically the Callgrind API, which uses architecture-specific features to take precise measurements of the code execution. See iai's Comparison with Criterion-rs for a general overview of this strategy. This PR uses the iai-callgrind library, which is an actively maintained fork of the original iai.
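For a sense of what this looks like in practice, here is a minimal sketch of an iai-callgrind benchmark. The attribute and macro names follow a recent iai-callgrind release and may differ from the version pinned in this PR, and the `fibonacci` workload and group names are just stand-ins, not the scryer-prolog benchmark itself:

```rust
use std::hint::black_box;

use iai_callgrind::{library_benchmark, library_benchmark_group, main};

// Stand-in workload; the real benchmark would drive scryer-prolog
// through its library API instead.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

#[library_benchmark]
#[bench::short(10)]
#[bench::long(25)]
fn bench_fibonacci(n: u64) -> u64 {
    // Callgrind counts the instructions executed inside this call.
    black_box(fibonacci(n))
}

library_benchmark_group!(
    name = fib_group;
    benchmarks = bench_fibonacci
);

main!(library_benchmark_groups = fib_group);
```

Run with `cargo bench` (after installing Valgrind and the companion `iai-callgrind-runner`), this reports instruction counts and cache statistics per benchmark rather than wall time, which is what makes it usable on noisy CI runners.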
Benchmark design
This uses the new library API (#1880) to execute Prolog from Rust, and runs the edges.pl benchmark. There are two benchmark suites: benches/run.rs, which uses Rust's built-in benchmarking tool to allow wall-time measurements locally, and run_iai.rs, which does the same using Callgrind to track metrics and runs significantly slower (2s vs 6m). They share some benchmark setup for consistency. I'm pretty undecided whether I like the way this is laid out; any feedback here is welcome.
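As a rough illustration of the shared-setup layout (not the actual code in this PR), the wall-time suite could look something like the sketch below. `setup::run_edges_once` is a hypothetical helper standing in for loading edges.pl and running the query through the #1880 library API; both suites would call the same helper so they measure the same work:

```rust
// benches/run.rs -- wall-time suite using the nightly built-in bencher.
#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

mod setup {
    // Hypothetical shared setup, also used by run_iai.rs. In the real
    // benchmark this would build a scryer-prolog machine, consult
    // edges.pl, and run the benchmark query.
    pub fn run_edges_once() -> usize {
        // Placeholder workload so the sketch compiles on its own.
        (0..10_000usize).map(|i| i % 7).sum()
    }
}

#[bench]
fn edges_wall_time(b: &mut Bencher) {
    // Measures wall-clock time per iteration; useful locally,
    // but too noisy for shared CI runners.
    b.iter(|| black_box(setup::run_edges_once()));
}
```

The Callgrind suite would wrap the same `setup::run_edges_once` call in iai-callgrind's macros instead of `#[bench]`, which is where the "shared benchmark setup" in this PR comes in.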
lib_machine.rs is incomplete
Unfortunately, running this benchmark fails (see the report job), and I'm not sure how to fix it. I don't know if this is a bug in the benchmark or in the library API. Help here would be greatly appreciated. I've worked around the error for now by just not showing the problematic variable. In any case, it seems that lib_machine.rs is unable to parse all possible values output by the top level.
This PR includes the commit in #2152.
Future?