Add benchmarks #2153

Merged: 3 commits merged into mthom:master on Nov 12, 2023

Conversation

@infogulch (Contributor) commented Nov 8, 2023

This PR is an idea for how we can add benchmarks to scryer-prolog. The ultimate goal would be to have a suite of benchmarks and publish a report that displays the measurements over time. But first things first...

Benchmarks in CI?

The challenge of running benchmarks on public runners like GitHub Actions is that wall-clock time can vary by as much as 20% due to environmental factors outside your control, such as noisy VM neighbors. This obliterates the utility of automated benchmark results for judging whether a proposed change actually helps performance.

So this PR pursues a strategy of measuring the number of instructions executed during the benchmark. Unlike wall time, this metric is very stable and sometimes even deterministic. Admittedly, instruction counts are only correlated with wall time, but that is generally a good tradeoff for being able to run benchmarks in an otherwise noisy automation environment.

This is accomplished by integrating with Valgrind, specifically the Callgrind API, which uses architecture-specific features to take precise measurements of code execution. See iai's Comparison with Criterion-rs for a general overview of this strategy. This PR uses the iai-callgrind library, an actively maintained fork of the original iai.
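
For readers unfamiliar with iai-callgrind, here is a minimal sketch of what a harness built on it can look like, using its #[library_benchmark] attribute API. The workload below is a stand-in rather than the scryer-prolog benchmark from this PR, and the exact macros differ between iai-callgrind versions:

    // Minimal iai-callgrind harness sketch; requires Valgrind to be installed.
    use iai_callgrind::{library_benchmark, library_benchmark_group, main};
    use std::hint::black_box;

    // Placeholder computation standing in for the real benchmark body.
    fn workload(n: u64) -> u64 {
        (0..n).fold(0, |acc, x| acc.wrapping_add(x * x))
    }

    #[library_benchmark]
    #[bench::small(1_000)]
    #[bench::large(1_000_000)]
    fn bench_workload(n: u64) -> u64 {
        // black_box keeps the compiler from optimizing the measured call away.
        black_box(workload(n))
    }

    library_benchmark_group!(
        name = workload_group;
        benchmarks = bench_workload
    );

    main!(library_benchmark_groups = workload_group);

Each benchmark function is executed once under Callgrind, which is what makes the reported instruction counts stable across runs.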

Benchmark design

This uses the new library API (#1880) to execute Prolog from Rust, running the edges.pl benchmark. There are two benchmark suites: benches/run.rs, which uses Rust's built-in benchmarking tool so wall-time measurements can be taken locally, and run_iai.rs, which runs the same workload under Callgrind to track metrics and is significantly slower (roughly 2s vs 6m). They share some benchmark setup for consistency (a rough sketch of that idea follows below). I'm still undecided whether I like the way this is laid out; any feedback here is welcome.
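
As an illustration of the shared-setup idea only (the helper names here are hypothetical stand-ins, not the actual scryer-prolog library API or the code in this PR), the wall-time suite could be shaped roughly like this with Rust's built-in nightly bench harness:

    // Sketch in the spirit of benches/run.rs using the built-in (nightly-only)
    // bench harness. `Machine`, `setup_machine`, and `run_edges_benchmark` are
    // hypothetical stand-ins for the shared setup both suites would reuse.
    #![feature(test)]
    extern crate test;

    use test::{black_box, Bencher};

    struct Machine; // stand-in for the real machine type

    fn setup_machine() -> Machine {
        // Hypothetical: load edges.pl and any shared fixtures here.
        Machine
    }

    fn run_edges_benchmark(machine: &mut Machine) -> u64 {
        // Hypothetical: run the benchmark query and return something so the
        // optimizer cannot discard the work.
        let _ = machine;
        42
    }

    #[bench]
    fn bench_edges(b: &mut Bencher) {
        let mut machine = setup_machine();
        b.iter(|| black_box(run_edges_benchmark(&mut machine)));
    }

Assuming the two files are registered as Cargo bench targets, they could then be run locally with something like cargo +nightly bench --bench run and cargo bench --bench run_iai (the latter needs Valgrind installed).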

lib_machine.rs is incomplete

Unfortunately, running this benchmark fails (see the report job), and I'm not sure how to fix it. I don't know whether this is a bug in the benchmark or in the library API. Help here would be greatly appreciated.

I've worked around the error for now by just not showing the problematic variable. In any case, it seems that lib_machine.rs is unable to parse all possible values output by the top level.

This PR includes the commit from #2152.

Future?

  • Add more benchmarks, see Add benchmarking suite to CI #1782
  • Collect more metrics: average/max memory; inferences (which could be another extremely useful metric); user time (noisy, but could be useful for comparisons over a long time frame)
  • Aggregate benchmark results into a separate repo, and publish a report showing changes over time
  • Benchmarking with valgrind also produces profiles, so maybe enable PGO using the previous build's profile?

@infogulch infogulch marked this pull request as draft November 8, 2023 20:31
@triska (Contributor) commented Nov 8, 2023

Awesome, thank you a lot for working on this!

One small detail I noticed: pairs_keys_values/3 has since become available in library(pairs), which can be used to shorten the code a bit.

@infogulch (Contributor, Author) commented Nov 9, 2023

I fixed the issue, and changed it to use library(pairs).

Here's a sample of the instruction counting benchmark output:

 run_iai::bench_scryer::bench_edges
  Instructions:         23835883675
  L1 Hits:              33799880719
  L2 Hits:                391485553
  RAM Hits:                 3449801
  Total read+write:     34194816073
  Estimated Cycles:     35878051519

https://github.com/mthom/scryer-prolog/actions/runs/6806340622/job/18507436925?pr=2153#step:8:272
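
(For reference, the Estimated Cycles figure is derived from the cache metrics above using the cost model that iai-style benchmarks document: Estimated Cycles = L1 Hits + 5 × L2 Hits + 35 × RAM Hits. Plugging in the numbers: 23835883675 aside, 33799880719 + 5 × 391485553 + 35 × 3449801 = 35878051519, which matches the reported value.)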

@infogulch infogulch marked this pull request as ready for review November 9, 2023 03:36
@infogulch infogulch force-pushed the benchmark branch 2 times, most recently from b0a7c08 to 4bb6ebe on November 11, 2023 04:07
@infogulch infogulch changed the title Add one benchmark Add benchmarks Nov 11, 2023
@infogulch infogulch marked this pull request as draft November 11, 2023 15:41
@infogulch infogulch force-pushed the benchmark branch 3 times, most recently from dcbd273 to ba01a5d on November 11, 2023 18:51
@infogulch (Contributor, Author) commented Nov 11, 2023

I think this latest version is ready.

  • Added benches/README.md that explains the design and how to add new benchmarks
  • Removed unnecessary changes to parsed_results.rs
  • Reorganized the benchmark definitions to make it clearer where they are defined and what the harnesses are

Any comments welcome.

Edit: Rebased and fixed the formatting issues.

@infogulch infogulch marked this pull request as ready for review November 11, 2023 19:04
@mthom mthom merged commit 3b7d4a7 into mthom:master Nov 12, 2023
13 checks passed
@infogulch infogulch deleted the benchmark branch November 12, 2023 06:56