This library is an efficient GPU implementation of zkSNARKs. It contains the source code of the paper *cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs*, published at TCHES 2023.
Note: The contributions of this fork are:
- improved reproducibility and scalability: Docker-based builds, a reduced memory footprint, and tests on various GPUs
- profiling of time and energy performance
This library is licensed under the Apache License, Version 2.0 and the MIT license.
To compile and run the code you only need a GPU with the NVIDIA driver installed; the CUDA Toolkit (compiler and runtime tools) is supplied by the Docker image.
The original experiments were conducted in the following setup:
- Ubuntu 20.04
- CUDA 11.5
- gcc 7.5.0
- Nvidia V100 (32 GB)
The Docker image provides the necessary software by extending the appropriate NVIDIA CUDA image. Build it with:
```
docker build -t cuzk:dev .
```
Then, start the container, mounting the repo directory:
```
docker run -d \
  -it \
  --name cuzk \
  --runtime=nvidia \
  --mount type=bind,source=$(pwd),target=/home \
  --privileged \
  cuzk:dev
```
and enter the container (e.g. with the VS Code plugin or via the command line: `docker exec -it -w /home cuzk bash`).
Inside the `cuZK/test` directory, adjust the compilation scope in the header of the `Makefile`:
```makefile
# cuZK/test/Makefile
all: msmb # limit the compilation scope to selected files
```
and then run `make` (it will take a while!):
```
root@3ed8c7a4de3e:/home/test# make
...
```
Finally, run the benchmark (adjusting the scope of the script as needed):
```
./energy_benchmark.sh
...
```
NOTE: See the documentation for more on compute capability.
NOTE: The original code reserves too much RAM. This can be adjusted inside the function `multi_init_params`.
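As an illustration of the kind of adjustment involved (a sketch only, not cuZK's actual logic), allocations can be sized against the memory actually available by querying `cudaMemGetInfo`:
```c
#include <cuda_runtime.h>
#include <stdio.h>

int main() {
    // Query how much device memory is actually free, then reserve a
    // fraction of it instead of a fixed oversized amount. The 80%
    // ratio is an arbitrary example, not a value used by cuZK.
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);

    size_t reserve = (size_t)(0.8 * (double)free_bytes);
    void *buf = NULL;
    if (cudaMalloc(&buf, reserve) != cudaSuccess) {
        fprintf(stderr, "allocation of %zu bytes failed\n", reserve);
        return 1;
    }
    printf("reserved %zu of %zu free bytes\n", reserve, free_bytes);
    cudaFree(buf);
    return 0;
}
```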
Advanced profiling can be done with the NVIDIA Management Library (NVML). The querying API should be used around the code of interest; the example below measures energy consumption (NVML reports it in millijoules):
```c
nvmlDevice_t device;
nvmlInit();
nvmlDeviceGetHandleByIndex(0, &device); // handle for GPU 0

unsigned long long energy_start, energy_end, energy_elapsed;
nvmlDeviceGetTotalEnergyConsumption(device, &energy_start);
// code to profile ...
nvmlDeviceGetTotalEnergyConsumption(device, &energy_end);
energy_elapsed = energy_end - energy_start; // millijoules
nvmlShutdown();
```
The NVML header should be included in the source code with `#include <nvml.h>`, and the library linked at compile time with the `-lnvidia-ml` option (e.g. `nvcc prog.cu -o prog -lnvidia-ml`).
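Putting the pieces together, here is a minimal self-contained sketch that compiles standalone; the file name is hypothetical and the busy-loop kernel is only a placeholder for the code under test:
```c
// energy.cu — hypothetical example; compile with:
//   nvcc energy.cu -o energy -lnvidia-ml
#include <nvml.h>
#include <stdio.h>

// Placeholder workload standing in for the code to profile.
__global__ void busy_kernel(float *out) {
    float x = 0.0f;
    for (int i = 0; i < (1 << 20); ++i) x += 1e-6f * i;
    if (threadIdx.x == 0 && blockIdx.x == 0) *out = x;
}

int main() {
    nvmlDevice_t device;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDeviceGetHandleByIndex(0, &device);

    unsigned long long e0, e1;
    nvmlDeviceGetTotalEnergyConsumption(device, &e0);

    float *out;
    cudaMalloc(&out, sizeof(float));
    busy_kernel<<<1024, 256>>>(out);
    cudaDeviceSynchronize(); // ensure the work finished before the second reading
    cudaFree(out);

    nvmlDeviceGetTotalEnergyConsumption(device, &e1);
    printf("energy consumed: %llu mJ\n", e1 - e0);

    nvmlShutdown();
    return 0;
}
```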
Performance depends on the clock frequency; the range of allowed frequencies can be checked with `nvidia-smi -q -d SUPPORTED_CLOCKS`, and the memory/graphics clocks can be adjusted with `nvidia-smi -ac $mem,$freq`.
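Alternatively, the clocks can be pinned programmatically from the profiling harness via NVML. A sketch, assuming the process has the required privileges and with error handling omitted:
```c
#include <nvml.h>
#include <stdio.h>

int main() {
    nvmlDevice_t device;
    nvmlInit();
    nvmlDeviceGetHandleByIndex(0, &device);

    // Query supported memory clocks, then the graphics clocks that
    // are valid for the first reported memory clock.
    unsigned int mem_clocks[32], n_mem = 32;
    nvmlDeviceGetSupportedMemoryClocks(device, &n_mem, mem_clocks);
    unsigned int gfx_clocks[256], n_gfx = 256;
    nvmlDeviceGetSupportedGraphicsClocks(device, mem_clocks[0], &n_gfx, gfx_clocks);
    printf("mem %u MHz: %u supported graphics clocks\n", mem_clocks[0], n_gfx);

    // Pin the application clocks to the first reported pair (an
    // arbitrary example choice; typically requires root).
    nvmlDeviceSetApplicationsClocks(device, mem_clocks[0], gfx_clocks[0]);

    nvmlShutdown();
    return 0;
}
```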
See the script profiling the MSM algorithm under a range of frequencies, and a sample Python script to process results.
Here are results obtained on Tesla V100-SXM2-16GB:
To run a test of an MSM at 2^20 scale with EC points on the BLS12-381 curve, run:
```
## (It will take some time to run for the first time.)
./msmtestb 20
```
To run a test of the Groth protocol at 2^20 constraint scale with EC points on the BLS12-381 curve, run:
```
## (It will take some time to run for the first time.)
./testb 20
```
For EC points on the ALT_BN128 and MNT4 curves, run:
```
## ALT_BN128
./msmtesta 20
./testa 20
## MNT4
./msmtestm 20
./testm 20
```
In addition, our BLS12-377 curve implementation has a Rust binding built with the template from Sppark, developed by Supranational LLC. To install the latest version of Rust, first install rustup. Once rustup is installed, install the Rust toolchain by invoking:
```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install stable
```
To test the BLS12-377 curve implementation, run:
```
cd test/BLS377
cargo bench
```
Here is a selection of results obtained on an NVIDIA V100 GPU with the BLS12-381 curve. More results can be found in the cuZK paper.
For MSM computation:

Scale | Bellperson | cuZK | Speedup
---|---|---|---
2^19 | 0.23 s | 0.12 s | 2.08x
2^20 | 0.41 s | 0.19 s | 2.18x
2^21 | 0.73 s | 0.33 s | 2.20x
2^22 | 1.30 s | 0.58 s | 2.25x
2^23 | 2.64 s | 1.15 s | 2.29x
For Groth's protocol:

Scale | Bellperson | cuZK | Speedup
---|---|---|---
2^19 | 2.62 s | 0.98 s | 2.67x
2^20 | 4.45 s | 1.68 s | 2.65x
2^21 | 7.96 s | 2.76 s | 2.88x
2^22 | 14.20 s | 5.08 s | 2.80x
2^23 | 29.13 s | 9.91 s | 2.94x