Add ACVM witness generation benchmarks for a set of test programs #6186

Closed · TomAFrench opened this issue Oct 1, 2024 · 0 comments · Fixed by #6253

TomAFrench (Member):
It's very important for us to have performant ACVM execution, as this is required to prevent witness generation from becoming a bottleneck for proof generation.

We currently have some minor benchmarking for ACVM execution in this file. This boils down to us essentially timing the CLI command `nargo execute --force` for 3 of our test programs, roughly along the lines of the sketch below.
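
For context, a subprocess-based criterion benchmark of this kind looks roughly like the following sketch; the benchmark name and program directory are illustrative, and this is not the actual content of `criterion.rs`:

```rust
use std::process::Command;

use criterion::{criterion_group, criterion_main, Criterion};

// Illustrative program directory; the real benchmark covers several test programs.
const PROGRAM_DIR: &str = "test_programs/execution_success/eddsa";

fn bench_nargo_execute(c: &mut Criterion) {
    c.bench_function("eddsa_execute", |b| {
        b.iter(|| {
            // Every iteration spawns `nargo` as a subprocess, so the measurement
            // includes process startup and (because of `--force`) a full recompile,
            // and a profiler attached to the benchmark never samples the child process.
            let status = Command::new("nargo")
                .args(["execute", "--program-dir", PROGRAM_DIR, "--force"])
                .status()
                .expect("failed to run nargo");
            assert!(status.success());
        })
    });
}

criterion_group!(benches, bench_nargo_execute);
criterion_main!(benches);
```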

This has a couple of problems:

  • `nargo execute --force` recompiles the circuit from scratch each time, so we're benchmarking much more than just execution speed.
  • We're running nargo in a subprocess, so we don't get any useful data out of the generated flamegraph.

Ideally we would have benchmarks which would:

  • separate compilation from execution, so that only witness generation is timed, and
  • run the execution in-process, so that the generated flamegraph contains useful data.

We can do this for a single Noir program to begin with and then roll it out to a decently sized suite in future.

@github-project-automation github-project-automation bot moved this to 📋 Backlog in Noir Oct 1, 2024
github-merge-queue bot pushed a commit that referenced this issue Oct 10, 2024
# Description

## Problem\*

Resolves #6186 

## Summary\*

* Breaks up `nargo_cli/benches/criterion.rs` into two steps: 1) a
compilation step which happens only once, and 2) the execution of the
already compiled program.
* Limits the code in the benchmark loop to the circuit execution alone,
excluding the parsing of the program and its inputs and the saving of
results that the execute command would otherwise do (see the sketch
after this list).
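
As a rough illustration of that structure, here is a minimal criterion sketch; the `helpers` module is a placeholder for the real nargo/noirc APIs, whose actual types and signatures differ:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// The `helpers` module is a stand-in for the real nargo/noirc APIs; the actual
// types and function signatures used in `nargo_cli/benches/criterion.rs` differ.
mod helpers {
    pub struct CompiledProgram; // placeholder for the compiled artifact
    pub struct InputMap; // placeholder for the parsed Prover.toml inputs

    pub fn compile_program(_dir: &str) -> CompiledProgram {
        CompiledProgram
    }
    pub fn read_inputs(_dir: &str) -> InputMap {
        InputMap
    }
    pub fn execute_program(_program: &CompiledProgram, _inputs: &InputMap) {
        // witness generation happens here
    }
}

fn bench_execute(c: &mut Criterion) {
    let dir = "test_programs/execution_success/eddsa";

    // Compilation and input parsing happen exactly once, outside the timed loop.
    let program = helpers::compile_program(dir);
    let inputs = helpers::read_inputs(dir);

    // Only circuit execution is measured: no parsing, no writing the witness to disk.
    c.bench_function("eddsa_execute", |b| {
        b.iter(|| helpers::execute_program(&program, &inputs))
    });
}

criterion_group!(benches, bench_execute);
criterion_main!(benches);
```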

These are the timings of just the circuit execution:
```text
eddsa_execute           time:   [111.13 ms 111.19 ms 111.25 ms]
regression_execute      time:   [151.27 µs 151.84 µs 152.66 µs]
struct_execute          time:   [602.39 ns 606.05 ns 609.77 ns]
```

The flame graphs look as follows (you can right-click and use "Save
Linked File As..." to download them and open them in a browser to make
them interactive):
* `eddsa`:
![eddsa](https://github.com/user-attachments/assets/c9f35961-65d9-4ac9-b2a6-1d50d14a9adc)
* `regression`:
![regression](https://github.com/user-attachments/assets/5664ce3a-eb6e-4fe8-a832-0ae539c99881)
* `struct`:
![struct](https://github.com/user-attachments/assets/15ebab47-1d52-4152-8d32-88f124fda525)


To generate them, run the following commands:

```shell
./scripts/benchmark_start.sh
cargo bench -p nargo_cli --bench criterion -- --profile-time=30
./scripts/benchmark_stop.sh
```

## Additional Context

The problem with the current `nargo_cli/benches/criterion.rs` was that
it executed the `nargo execute --program-dir ... --force` command as a
subprocess, and the profiler that creates the flamegraph only sampled
the criterion executor itself, not what the subprocess was doing.

My first idea for getting the flame graphs to include the actual execution
was to turn `nargo_cli` into a library _and_ a binary
(af19dfc),
so that we could import the commands into the benchmark and run them in
the same process. This resulted in the following flame graphs:

* `eddsa`:
![eddsa](https://github.com/user-attachments/assets/e214653e-c6e3-4614-b763-b35694eeaec8)
* `regression`:
![regression](https://github.com/user-attachments/assets/ade1ef1a-1a1b-4ca4-9c09-62693551a8b0)
* `struct`:
![struct](https://github.com/user-attachments/assets/72838e1c-7c6b-4a0d-8040-acd335007463)

These include the entire `ExecuteCommand::run` command, which
unfortunately results in a flame graph dominated by parsing logic,
reading inputs and writing outputs to files. In fact, in all but the
`eddsa` example, `execute_circuit` was so fast that I couldn't even find
it on the flamegraph.

These are the timings for the command execution per test program:
```text
eddsa_execute           time:   [984.12 ms 988.74 ms 993.95 ms]
regression_execute      time:   [71.240 ms 71.625 ms 71.957 ms]
struct_execute          time:   [68.447 ms 69.414 ms 70.438 ms]
```

For this reason I rolled back the library+binary change and broke up the
execution further, parsing the program into a `CompiledProgram` once
and calling `execute_program` in the benchmark loop without saving the
results.

I copied two utility functions to read the program artefact and the
input map, roughly along the lines of the sketch below. I'm not sure
these are worth moving into a shared location, since even the errors
they raise are CLI-specific.
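
For illustration, the two helpers have roughly the following shape; generic `serde_json`/`toml` values stand in for the real `CompiledProgram` and `InputMap` types, and the `target/<package>.json` and `Prover.toml` paths follow nargo's conventions:

```rust
use std::fs;
use std::path::Path;

// Rough sketch of the two copied helpers. Generic `serde_json`/`toml` values stand
// in for the real `CompiledProgram` and `InputMap` types; the real helpers also
// return CLI-specific errors instead of panicking.
fn read_program_artefact(program_dir: &Path, package_name: &str) -> serde_json::Value {
    let path = program_dir.join("target").join(format!("{package_name}.json"));
    let bytes = fs::read(&path).expect("failed to read program artefact");
    serde_json::from_slice(&bytes).expect("failed to parse program artefact")
}

fn read_input_map(program_dir: &Path) -> toml::Value {
    let contents =
        fs::read_to_string(program_dir.join("Prover.toml")).expect("failed to read Prover.toml");
    contents.parse::<toml::Value>().expect("failed to parse Prover.toml")
}
```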

## Documentation\*

Check one:
- [ ] No documentation needed.
- [x] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist\*

- [x] I have tested the changes locally.
- [ ] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Noir Oct 10, 2024