Add ACVM witness generation benchmarks for a set of test programs #6186

Closed · TomAFrench opened this issue Oct 1, 2024 · 0 comments · Fixed by #6253

TomAFrench (Member):
It's very important for us to have performant ACVM execution, as this is required to prevent witness generation from becoming a bottleneck for proof generation.

We currently have some minor benchmarking for ACVM execution in this file. This boils down to us essentially timing the CLI command `nargo execute --force` for 3 of our test programs, roughly along the lines of the sketch below.
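
For context, a subprocess-based criterion benchmark of this kind looks roughly like the following sketch; the benchmark name and program directory are illustrative, and this is not the actual content of `criterion.rs`:

```rust
use std::process::Command;

use criterion::{criterion_group, criterion_main, Criterion};

// Illustrative program directory; the real benchmark covers several test programs.
const PROGRAM_DIR: &str = "test_programs/execution_success/eddsa";

fn bench_nargo_execute(c: &mut Criterion) {
    c.bench_function("eddsa_execute", |b| {
        b.iter(|| {
            // Every iteration spawns `nargo` as a subprocess, so the measurement
            // includes process startup and (because of `--force`) a full recompile,
            // and a profiler attached to the benchmark never samples the child process.
            let status = Command::new("nargo")
                .args(["execute", "--program-dir", PROGRAM_DIR, "--force"])
                .status()
                .expect("failed to run nargo");
            assert!(status.success());
        })
    });
}

criterion_group!(benches, bench_nargo_execute);
criterion_main!(benches);
```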

This has a couple of problems:

  • `nargo execute --force` recompiles the circuit from scratch each time, so we're benchmarking much more than just execution speed.
  • We're running nargo in a subprocess, so we don't get any useful data out of the generated flamegraph.

Ideally we would have benchmarks which would:

  • separate compilation from execution, so that only witness generation is timed, and
  • run the execution in-process, so that the generated flamegraph contains useful data.

We can do this for a single Noir program to begin with and then roll it out to a decently sized suite in future.

@github-project-automation github-project-automation bot moved this to 📋 Backlog in Noir Oct 1, 2024
github-merge-queue bot pushed a commit that referenced this issue Oct 10, 2024
# Description

## Problem\*

Resolves #6186 

## Summary\*

* Breaks up `nargo_cli/benches/criterion.rs` into two steps: 1) a
compilation step which happens only once, and 2) the execution of the
already compiled program.
* Limits the code in the benchmark loop to the circuit execution alone,
excluding the parsing of the program and its inputs and the saving of
results that the execute command would otherwise do (see the sketch
after this list).
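
As a rough illustration of that structure, here is a minimal criterion sketch; the `helpers` module is a placeholder for the real nargo/noirc APIs, whose actual types and signatures differ:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// The `helpers` module is a stand-in for the real nargo/noirc APIs; the actual
// types and function signatures used in `nargo_cli/benches/criterion.rs` differ.
mod helpers {
    pub struct CompiledProgram; // placeholder for the compiled artifact
    pub struct InputMap; // placeholder for the parsed Prover.toml inputs

    pub fn compile_program(_dir: &str) -> CompiledProgram {
        CompiledProgram
    }
    pub fn read_inputs(_dir: &str) -> InputMap {
        InputMap
    }
    pub fn execute_program(_program: &CompiledProgram, _inputs: &InputMap) {
        // witness generation happens here
    }
}

fn bench_execute(c: &mut Criterion) {
    let dir = "test_programs/execution_success/eddsa";

    // Compilation and input parsing happen exactly once, outside the timed loop.
    let program = helpers::compile_program(dir);
    let inputs = helpers::read_inputs(dir);

    // Only circuit execution is measured: no parsing, no writing the witness to disk.
    c.bench_function("eddsa_execute", |b| {
        b.iter(|| helpers::execute_program(&program, &inputs))
    });
}

criterion_group!(benches, bench_execute);
criterion_main!(benches);
```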

These are the timings of just the circuit execution:
```text
eddsa_execute           time:   [111.13 ms 111.19 ms 111.25 ms]
regression_execute      time:   [151.27 µs 151.84 µs 152.66 µs]
struct_execute          time:   [602.39 ns 606.05 ns 609.77 ns]
```

The flame graphs look as follows (you can right-click and use "Save
Linked File As..." to download them and open them in a browser to make
them interactive):
* `eddsa`:
![eddsa](https://github.com/user-attachments/assets/c9f35961-65d9-4ac9-b2a6-1d50d14a9adc)
* `regression`:
![regression](https://github.com/user-attachments/assets/5664ce3a-eb6e-4fe8-a832-0ae539c99881)
* `struct`:
![struct](https://github.com/user-attachments/assets/15ebab47-1d52-4152-8d32-88f124fda525)


To generate them, run the following commands:

```shell
./scripts/benchmark_start.sh
cargo bench -p nargo_cli --bench criterion -- --profile-time=30
./scripts/benchmark_stop.sh
```

## Additional Context

The problem with the current `nargo_cli/benches/criterion.rs` was that
it executed the `nargo execute --program-dir ... --force` command as a
subprocess, and the profiler that creates the flamegraph only sampled
the criterion executor itself, not what the subprocess was doing.

My first idea for getting the flame graphs to include the actual execution
was to turn `nargo_cli` into a library _and_ a binary
(af19dfc),
so that we could import the commands into the benchmark and run them in
the same process. This resulted in the following flame graphs:

* `eddsa`:
![eddsa](https://github.com/user-attachments/assets/e214653e-c6e3-4614-b763-b35694eeaec8)
* `regression`:
![regression](https://github.com/user-attachments/assets/ade1ef1a-1a1b-4ca4-9c09-62693551a8b0)
* `struct`:
![struct](https://github.com/user-attachments/assets/72838e1c-7c6b-4a0d-8040-acd335007463)

These include the entire `ExecuteCommand::run` command, which
unfortunately results in a flame graph dominated by parsing logic,
reading inputs and writing outputs to files. In fact, in all but the
`eddsa` example, `execute_circuit` was so fast that I couldn't even find
it on the flamegraph.

These are the timings for the command execution per test program:
```text
eddsa_execute           time:   [984.12 ms 988.74 ms 993.95 ms]
regression_execute      time:   [71.240 ms 71.625 ms 71.957 ms]
struct_execute          time:   [68.447 ms 69.414 ms 70.438 ms]
```

For this reason I rolled back the library+binary change and broke up the
execution further, parsing the program into a `CompiledProgram` once
and calling `execute_program` in the benchmark loop without saving the
results.

I copied two utility functions to read the program artefact and the
input map, roughly along the lines of the sketch below. I'm not sure
these are worth moving into a shared location, since even the errors
they raise are CLI-specific.
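
For illustration, the two helpers have roughly the following shape; generic `serde_json`/`toml` values stand in for the real `CompiledProgram` and `InputMap` types, and the `target/<package>.json` and `Prover.toml` paths follow nargo's conventions:

```rust
use std::fs;
use std::path::Path;

// Rough sketch of the two copied helpers. Generic `serde_json`/`toml` values stand
// in for the real `CompiledProgram` and `InputMap` types; the real helpers also
// return CLI-specific errors instead of panicking.
fn read_program_artefact(program_dir: &Path, package_name: &str) -> serde_json::Value {
    let path = program_dir.join("target").join(format!("{package_name}.json"));
    let bytes = fs::read(&path).expect("failed to read program artefact");
    serde_json::from_slice(&bytes).expect("failed to parse program artefact")
}

fn read_input_map(program_dir: &Path) -> toml::Value {
    let contents =
        fs::read_to_string(program_dir.join("Prover.toml")).expect("failed to read Prover.toml");
    contents.parse::<toml::Value>().expect("failed to parse Prover.toml")
}
```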

## Documentation\*

Check one:
- [ ] No documentation needed.
- [x] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist\*

- [x] I have tested the changes locally.
- [ ] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Noir Oct 10, 2024