When using SNARK tools for practical applications, there exist heuristics to determine which arithmetization is best suited to a certain kind of computation.
For example, it is known that hash tables are very difficult to implement in R1CS, due to the cost of random accesses. In practice, a random access ends up requiring a loop over the entire array, since the overhead of implementing a RAM-like solution is too high for reasonably sized circuits. While an implementation of RAM, such as TinyRAM, helps asymptotically with R1CS, for a practical RAM-model application AIR with permutations would be a good fit. However, are there quantifiable properties of computations that can differentiate between arithmetizations?
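To make the cost concrete, the sketch below shows, in plain Rust with illustrative names, what a single "random" access `arr[i]` amounts to under the linear-scan approach: one selector bit and one multiplication per array element, i.e. O(n) constraints per access.

```rust
// Sketch of the linear-scan approach to a "random" access arr[i] in R1CS.
// In a circuit, each selector bit b_j would be a constrained boolean witness
// with sum(b_j) = 1, not a runtime comparison; the point is that every single
// access touches all n elements.
fn linear_scan_access(arr: &[u64], i: usize) -> u64 {
    let mut result = 0;
    for (j, &a_j) in arr.iter().enumerate() {
        let b_j = if j == i { 1 } else { 0 }; // selector bit: b_j = (j == i)
        result += b_j * a_j; // one multiplication constraint per element
    }
    result
}
```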
Our goal with this project was to answer the following: Can we, in a principled way, categorize which arithmetization is best suited to which kind of computation?
In order to tease out the impact of the arithmetization on various metrics of a proof system, we need to build proof systems which are close to identical in every respect but the arithmetization. Concretely, the cryptographic assumptions and underlying polynomial commitments should be identical for the construction based on each arithmetization. To this end, we have so far built an R1CS-based proof system using the Rust Winterfell library's existing poly-commit infrastructure; the Winterfell library already came with a highly optimized AIR-based implementation for us to compare with. To test experimentally which applications yield better performance on which system, we have implemented two applications so far: (1) FFTs and (2) Fibonacci sequences, as we discuss below. Ideally, we would also include a theoretical classification and proofs of this classification, for example answering "how can this application be quantified in terms of its memory access patterns?", together with examples that fit the bill for each classification.
We picked FFT programs to illustrate that the verification time for an AIR proof grows linearly in the number of distinct constraints, known as transition constraints, whereas R1CS verification is independent of the structure of the program; as expected, we found that optimized R1CS performs better than AIR here. We picked the multiplicative Fibonacci program to test whether repeated structure in a program provides significant benefits when proving it in an AIR-based prover. So far, we have not been able to run very large instances in R1CS because the prover runs out of memory.
Note that the R1CS code still has some bugs on certain programs, and for certain program sizes it fails to run because it runs out of memory. Fixing these issues in our implementation is a work in progress.
- We have implemented Fractal, a highly optimized FRI-based R1CS proof system, in Rust, based on the backend from the Winterfell Rust crates. See this branch.
- We have implemented an example FFT program in Winterfell, and Winterfell also already has example Fibonacci implementations. This is the AIR-examples component of our code.
- Next, we optimized examples of Fibonacci and FFT implementations in jsnark.
- We wrote code to export the `.arith` and `.in` files representing the R1CS inputs and instances generated by jsnark (see the Dockerfile in our jsnark fork) and used these as our examples for R1CS; the relation these files encode is sketched after this list.
- Finally, we were able to modify the fields in jsnark to fit the fields needed by the underlying libraries of our R1CS code, parse the result in, and run benchmarks for different sized FFT and Fibonacci instances.
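For reference, the relation these files encode is plain R1CS satisfiability: given constraint matrices A, B, C and an assignment vector z, check that (Az) ∘ (Bz) = Cz holds entrywise. The following is a minimal sketch under simplifying assumptions: dense matrices and a small FFT-friendly prime stand in for the sparse representation and the cryptographically sized field the actual code uses.

```rust
/// Minimal dense R1CS satisfiability check, for illustration only.
const P: u64 = 2013265921; // example FFT-friendly prime, not the field used in our code

fn mat_vec(m: &[Vec<u64>], z: &[u64]) -> Vec<u64> {
    m.iter()
        .map(|row| {
            row.iter().zip(z).fold(0u64, |acc, (&a, &b)| {
                ((acc as u128 + a as u128 * b as u128) % P as u128) as u64
            })
        })
        .collect()
}

/// Checks (Az) ∘ (Bz) = Cz, where ∘ is the entrywise (Hadamard) product.
fn is_satisfied(a: &[Vec<u64>], b: &[Vec<u64>], c: &[Vec<u64>], z: &[u64]) -> bool {
    let (az, bz, cz) = (mat_vec(a, z), mat_vec(b, z), mat_vec(c, z));
    az.iter()
        .zip(&bz)
        .zip(&cz)
        .all(|((&x, &y), &w)| (x as u128 * y as u128) % P as u128 == w as u128)
}
```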
Here we have provided a blueprint for comparing two kinds of proof systems on two kinds of programs. To add more example programs, we suggest you:
- Generate R1CS examples using jsnark (see our implementation for examples of how to effectively extract the requisite files).
- Generate AIR examples using Winterfell's examples crate, perhaps in your own fork; import that fork in the `Cargo.toml` and use the examples as we use them here.
Once you have generated the examples themselves, to get them working you will need to modify the `sample_air_benchmarks.rs` and `sample_r1cs_benchmarks.rs` files.
Note that we suggest you try to transcribe your algorithms as closely as possible in each framework, in order to get an accurate idea of how the structure of your program impacts the performance of the proof system.
So far, we have the implementation of an iterative FFT computation in both AIR and R1CS and an implementation of multiplicative Fibonacci in both as well.
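For concreteness, the iterative FFT transcribed in both frameworks follows the standard radix-2 Cooley-Tukey structure. Below is a plain-Rust sketch of that algorithm as a number-theoretic transform over an example FFT-friendly prime; the field and code organization are illustrative, not those of either implementation.

```rust
const P: u64 = 2013265921; // example FFT-friendly prime: 15 * 2^27 + 1

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = ((acc as u128 * base as u128) % P as u128) as u64;
        }
        base = ((base as u128 * base as u128) % P as u128) as u64;
        exp >>= 1;
    }
    acc
}

/// In-place iterative radix-2 Cooley-Tukey NTT of size n = 2^k.
/// `root` must be a primitive n-th root of unity mod P.
fn ntt(values: &mut [u64], root: u64) {
    let n = values.len();
    // Bit-reversal permutation brings the recursion's leaves into order.
    let mut j = 0;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 {
            j ^= bit;
            bit >>= 1;
        }
        j |= bit;
        if i < j {
            values.swap(i, j);
        }
    }
    // log2(n) butterfly passes over the array.
    let mut len = 2;
    while len <= n {
        let w_len = pow_mod(root, (n / len) as u64);
        for start in (0..n).step_by(len) {
            let mut w = 1u64;
            for k in start..start + len / 2 {
                let u = values[k];
                let v = ((values[k + len / 2] as u128 * w as u128) % P as u128) as u64;
                values[k] = (u + v) % P;
                values[k + len / 2] = (u + P - v) % P;
                w = ((w as u128 * w_len as u128) % P as u128) as u64;
            }
        }
        len <<= 1;
    }
}
```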
To benchmark an R1CS program, you have the following options:
- Program type `-p`, which can be set to `fft` or `fib`.
- Program size `-s`, which can be set to a value between 5 and 10. Note that currently some of these values cause crashes; we recommend trying 5 and 9 to get a sense of how things go. The size is interpreted as follows: if your size is `x` and your program is `fib`, you will see results for the proof of the 2^{x+1}-th multiplicative Fibonacci number.
Run the following command to see the benchmarks:
```
cargo run --release --package arithmetization_benchmarks --bin fractal-orchestrator -- -p=fft -s=9
```
To benchmark AIR programs, the commands are similar; the program types are again `fft` or `fib`, but they are passed as a positional argument rather than as a flag. Like so:
```
cargo run --release --package arithmetization_benchmarks --bin stark-orchestrator fft -n=32
```
Note that in this case, your program size needs to be specified exactly, i.e. if you want to see an FFT of size 32 in the STARK context, you enter 32, as opposed to 5, which you would have entered in the R1CS case.
Note that our current Fractal implementation isn't yet as optimized as Winterfell, which explains some of the discrepancies here. Below we provide preliminary example numbers based on a run of these benchmarks on an M1 MacBook Pro with 32GB of RAM.
For R1CS instances, if you would like to check larger instances than the ones provided here, please generate the appropriately renamed `.wires` and `.arith` files using our jsnark code for the desired instance sizes.
The current Winterfell codebase limits how wide the AIR can be, with an upper bound of 511. Further, there is a parameter called the "blowup factor", which determines the size of the Reed-Solomon encodings and needs to grow with the instance size. We found this parameter to be a limiting factor so far, in particular for the FFT implementation, and we plan to modify the implementation to deal with this in the near future.
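For intuition, the blowup factor is the inverse rate of the Reed-Solomon code: an execution trace of length n is evaluated over a low-degree-extension domain of size blowup × n, so larger blowups mean larger commitments. A trivial sketch (the checks are illustrative, not Winterfell's):

```rust
// Illustration of how the blowup factor sets the Reed-Solomon (LDE) domain
// size; the code rate is 1 / blowup_factor.
fn lde_domain_size(trace_length: usize, blowup_factor: usize) -> usize {
    assert!(trace_length.is_power_of_two() && blowup_factor.is_power_of_two());
    trace_length * blowup_factor
}
```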
In our R1CS implementation, larger instances run out of memory, and the program fail-stops when this happens. We are fairly confident this is the cause because of the following observations:
- When we optimized the program to remove many instances of cloning of structs, we found that we were able to run larger programs.
- We have observed this behaviour in other SNARK implementations where moving the proof generation to a much larger VM on the cloud has permitted much larger programs to run.
One immediate next step, therefore, is to memory profile the Fractal code and optimize memory usage.
The table below provides equivalent parameters for running FFT programs in R1CS or AIR. For example, running
```
cargo run --release --package arithmetization_benchmarks --bin fractal-orchestrator -- -p=fft -s=7
```
proves and verifies the same size of FFT as
```
cargo run --release --package arithmetization_benchmarks --bin stark-orchestrator fft -n=128
```
| R1CS `-s` | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| AIR `-n` | 32 | 64 | 128 | Not yet supported | Not yet supported | Not yet supported |
On an M1 MacBook Pro with 16GB of RAM, running the computations casually (not using a benchmarking library), we get the following numbers. For an FFT of 2^7 elements, proving time was 285ms for R1CS and 204ms for AIR, and verification time was 7ms for R1CS and 5.6ms for AIR.
For an FFT of size 2^10, our R1CS prover currently runs in 2724ms and the verifier in 36ms.
We were not able to get the AIR-based implementation to support such a large FFT instance, since it would require committing to a very large number of polynomials. Even if we tried to reduce the number of distinct registers, we would need to introduce a corresponding set of selector bits, as well as more numerous and more complex transition constraints. We defer further investigation of this intuition for now.
Similarly to the previous subsection, the table below provides equivalent parameters for Fibonacci in R1CS or AIR. For example, running
```
cargo run --release --package arithmetization_benchmarks --bin fractal-orchestrator -- -p=fib -s=20
```
proves and verifies the same size of Fibonacci as
```
cargo run --release --package arithmetization_benchmarks --bin stark-orchestrator fib -n=1048576
```
that is, both programs prove correct computation of the 2^21st multiplicative Fibonacci number.
| R1CS `-s` | 5 | 6 | 7 | 8 | ... | i | ... | 20 |
|---|---|---|---|---|---|---|---|---|
| AIR `-n` | 32 | 64 | 128 | 256 | ... | 2^i | ... | 1048576 |
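In both tables the correspondence is simply n = 2^s. The following hypothetical helper (not part of our CLI) makes the mapping explicit:

```rust
// Hypothetical helper illustrating the parameter correspondence between the
// two orchestrators: an R1CS size s corresponds to an AIR instance size 2^s.
fn air_instance_size(r1cs_size: u32) -> u64 {
    1u64 << r1cs_size
}

fn main() {
    assert_eq!(air_instance_size(7), 128); // -s=7 <-> -n=128
    assert_eq!(air_instance_size(20), 1_048_576); // -s=20 <-> -n=1048576
}
```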
In the same setting described above for FFT, we get the following preliminary measurements. At 2^20 iterations, proving time was 217118ms for R1CS and 8362ms for AIR, and verification time was 39ms for R1CS and 2.5ms for AIR.
We also have automated benchmarks implemented using Rust's `criterion` crate, to be found in `./benches`. Benchmarking reports appear under `target/criterion/report/index.html`.
The AIR benchmarks have the FFT example for 2^7 implemented, as well as three examples of Fibonacci: 2^15, 2^20 and 2^25. These can be found (and modified) in the `./benches/air_benchmark.rs` file. The R1CS benchmarks have FFTs for 2^7 and 2^10 implemented, as well as Fibonacci examples for 2^15 and 2^20, to compare with the AIR implementation. These benchmarks are in `./benches/r1cs_benchmark.rs`.
To run the AIR benchmarks, use the command `cargo bench --bench air_benchmark`. To run the R1CS benchmarks, correspondingly, run `cargo bench --bench r1cs_benchmark`. See the inline comments in the `benches` files for how to add more benchmarks.
One obvious question to ask is how we know that the numbers generated here are actually comparable across the two arithmetizations. What if our R1CS implementation is, for example, just bad?
Several kinds of optimizations still need to be applied to the `winter_fractal` implementation, and this work is ongoing. From a qualitative standpoint, the Winterfell repository has had much more effort put into its optimizations: our R1CS repository, `winter_fractal`, was made public at the initial submission to the Berkeley RDI hackathon, whereas the Winterfell repository has been public and open source for several years. Winterfell has 14 contributors, while `winter_fractal` has only 3; similarly, as of this writing, Winterfell has significantly more commits than the branch of `winter_fractal` we are building on. That said, our code now largely avoids extraneous polynomial or field-element evaluations, and all but one polynomial interpolation/evaluation operation is done using the same FFT operations used by the Winterfell code. We also batch polynomial commitments for IOP layers as much as possible, and are able to exploit this when querying IOPs later.
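For intuition, batching here means evaluating all of a layer's polynomials over the same domain and committing to them row-wise, so a single Merkle leaf (and a single authentication path) answers a query for every polynomial in the layer at once. A toy sketch, with `DefaultHasher` standing in for a real cryptographic hash and a flat list of leaves standing in for the Merkle tree:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for a cryptographic hash.
fn hash_row(row: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    row.hash(&mut h);
    h.finish()
}

/// `evaluations[i]` is the evaluation table of the i-th polynomial over a
/// shared domain. Returns one leaf hash per domain position; opening one leaf
/// reveals every polynomial's value at that position.
fn batched_leaves(evaluations: &[Vec<u64>]) -> Vec<u64> {
    let domain_size = evaluations[0].len();
    (0..domain_size)
        .map(|pos| {
            let row: Vec<u64> = evaluations.iter().map(|e| e[pos]).collect();
            hash_row(&row)
        })
        .collect()
}
```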
Our applications have been highly optimized and are logically equivalent to the algorithm specifications for both AIR and R1CS. A few caveats:
- The application implementations for R1CS have been done in jsnark, as mentioned above, and the corresponding files are read into the R1CS code. This means that for larger instances very large files need to be processed, which uses a lot of RAM and makes it hard to run larger experiments; finding a better way to do this would be worthwhile.
- A more difficult goal is to formally verify the implementations of our applications against the algorithm spec, a notoriously hard problem. One tool to do this for circom outputs is Ecne and it might be helpful to try and apply it to the jsnark output used here. There is also some work on formal verification of AIR, but no tool exists which can take a spec and check both implementations. This is a cool open problem.
- It would be good to further confirm our results with more applications, such as hash chains.
Winterfell already has an implementation of an AIR with RAM proving, using a technique called Randomized AIR with Preprocessing (RAPs). We have yet to apply a RAM model to `winter_fractal`, for example by using TinyRAM. Once this is complete, it would also be good to have examples that illustrate the usefulness of RAM models and show when their efficiency surpasses naive solutions.
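For background, the core of such RAM arguments is a permutation (multiset-equality) check between the trace's memory accesses in execution order and the same accesses sorted by address, which makes read/write consistency locally checkable. With a random challenge gamma, the check reduces to comparing grand products, as in this toy sketch over an illustrative field:

```rust
// Toy version of the randomized permutation check underlying RAM arguments
// such as RAPs: two lists are (with high probability over `gamma`) a
// permutation of one another iff their grand products agree.
const P: u64 = 2013265921; // illustrative prime field

fn grand_product(values: &[u64], gamma: u64) -> u64 {
    values.iter().fold(1u64, |acc, &v| {
        ((acc as u128 * ((v + gamma) % P) as u128) % P as u128) as u64
    })
}

fn permutation_check(execution_order: &[u64], address_order: &[u64], gamma: u64) -> bool {
    grand_product(execution_order, gamma) == grand_product(address_order, gamma)
}
```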
Many thanks to Don Beaver for writing the parser from jsnark's R1CS representation to our implementation's R1CS matrices, as well as for greatly beautifying our code.
We are also very grateful to Bobbin Threadbare for help with implementing the examples for Winterfell as well as general support while building on the Winterfell repository.
Thank you to Andrew Miller for extremely helpful guidance and for helping us evolve the ideas for this work.
This project is MIT licensed.