feat: Changing finite field arithmetic in wasm to 29 bits for multiplications #5435
Merged
Commits (43, all by Rumata888):
98f44b9 Started on codegen
4bc5c8b Something works, but there is obviously some bug
032df47 Parallel build for benchmark
5ebec34 implemented 9-limb version (need to update tests)
89ebc39 wasm parameters and converter
9ea9202 minifix for tests
4fd4939 karatsuba is somewhat faster
14a82c8 add a wasmer option for running benchmarks
5088b75 add a version of cook
b161d6c add karatsuba that turned out to be useless
daa368a fixed grumpkin constants
030c35e fixed bn g1 constants
0f52199 fq2
03a6f9b Fixed constants
da19235 Precompute modulus
ff74b5a multiplication x2 speedup
18fa81f uint multiplication for wasm
d32c561 mul_512
f482d44 reduce lines
064b86a Remove python files
f1900c6 remove some lines
2aefba4 remove unnecessary reductions of last limb
cf45902 a bit prettier
0d4e57b define constants
aa7e83e add small check to detect issues in the future
656a0a1 add comments
73e67a5 comments
ad6e000 use_squares
33a61b4 delta fix
6fe900f fix
455823c Late reduce is more efficient
b1c3c24 Merge branch 'master' into my domain
d555fbe Merge branch 'master' into my domain
baf62db Some optimisations
5d83ce4 Addressing Mara's comments
25d7c61 Micoroptimisation
f10042f A bit of docs
5c12b14 wip
ad2849d More docs
172d371 Merge branch 'master' into my domain
4107791 Merge branch 'master' into my domain
d50dbce add one small snippet
b0717d8 Address Mara's comment
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
# This script automates the process of benchmarking WASM on a remote EC2 instance.
# Prerequisites:
# 1. Define the following environment variables:
#    - BB_SSH_KEY: SSH key for EC2 instance, e.g., '-i key.pem'
#    - BB_SSH_INSTANCE: EC2 instance URL
#    - BB_SSH_CPP_PATH: Path to barretenberg/cpp in a cloned repository on the EC2 instance
set -eu

BENCHMARK=${1:-goblin_bench}
COMMAND=${2:-./$BENCHMARK}
HARDWARE_CONCURRENCY=${HARDWARE_CONCURRENCY:-16}

# Move above script dir.
cd $(dirname $0)/..

# Configure and build.
cmake --preset wasm-threads
cmake --build --preset wasm-threads --parallel --target $BENCHMARK

source scripts/_benchmark_remote_lock.sh

cd build-wasm-threads
# ensure folder structure
ssh $BB_SSH_KEY $BB_SSH_INSTANCE "mkdir -p $BB_SSH_CPP_PATH/build-wasm-threads"
# copy build wasm threads
scp $BB_SSH_KEY ./bin/$BENCHMARK $BB_SSH_INSTANCE:$BB_SSH_CPP_PATH/build-wasm-threads
# run wasm benchmarking
ssh $BB_SSH_KEY $BB_SSH_INSTANCE \
    "cd $BB_SSH_CPP_PATH/build-wasm-threads ; /home/ubuntu/.wasmer/bin/wasmer run --dir=$BB_SSH_CPP_PATH --enable-threads --env HARDWARE_CONCURRENCY=$HARDWARE_CONCURRENCY $COMMAND"
@@ -364,6 +364,65 @@ void sequential_copy(State& state)
        }
    }
}

/**
 * @brief Evaluate how much uint256_t multiplication costs (in cache)
 *
 * @param state
 */
void uint_multiplication(State& state)
Review comment: Used to understand how much faster the 29-bit limb version is. In wasmer it is twice as fast.
{
    numeric::RNG& engine = numeric::get_debug_randomness();
    std::vector<uint256_t> copy_vector(2);
    for (size_t j = 0; j < 2; j++) {
        copy_vector.emplace_back(engine.get_random_uint256());
        copy_vector.emplace_back(engine.get_random_uint256());
        copy_vector[0] += (1 - copy_vector[0].get_bit(0));
        copy_vector[1] += (1 - copy_vector[1].get_bit(0));
    }

    for (auto _ : state) {
        state.PauseTiming();
        size_t num_cycles = 1 << static_cast<size_t>(state.range(0));
        state.ResumeTiming();
        for (size_t i = 0; i < num_cycles; i++) {
            copy_vector[i & 1] *= copy_vector[1 - (i & 1)];
        }
    }
}

/**
 * @brief Evaluate how much uint256_t extended multiplication costs (in cache)
 *
 * @param state
 */
void uint_extended_multiplication(State& state)
{
    numeric::RNG& engine = numeric::get_debug_randomness();
    std::vector<uint256_t> copy_vector(2);
    for (size_t j = 0; j < 2; j++) {
        copy_vector.emplace_back(engine.get_random_uint256());
        copy_vector.emplace_back(engine.get_random_uint256());
        copy_vector[0] += (1 - copy_vector[0].get_bit(0));
        copy_vector[1] += (1 - copy_vector[1].get_bit(0));
    }

    for (auto _ : state) {
        state.PauseTiming();
        size_t num_cycles = 1 << static_cast<size_t>(state.range(0));
        state.ResumeTiming();
        for (size_t i = 0; i < num_cycles; i++) {
            auto [r0, r1] = copy_vector[i & 1].mul_extended(copy_vector[1 - (i & 1)]);
            state.PauseTiming();
            copy_vector[i & 1] += r0;
            copy_vector[1 - (i & 1)] += r1;
            copy_vector[0] += (1 - copy_vector[0].get_bit(0));
            copy_vector[1] += (1 - copy_vector[1].get_bit(0));
            state.ResumeTiming();
        }
    }
}

} // namespace

BENCHMARK(parallel_for_field_element_addition)->Unit(kMicrosecond)->DenseRange(0, MAX_REPETITION_LOG);

@@ -380,4 +439,6 @@ BENCHMARK(projective_point_doubling)->Unit(kMicrosecond)->DenseRange(12, 22);
BENCHMARK(scalar_multiplication)->Unit(kMicrosecond)->DenseRange(12, 18);
BENCHMARK(cycle_waste)->Unit(kMicrosecond)->DenseRange(20, 30);
BENCHMARK(sequential_copy)->Unit(kMicrosecond)->DenseRange(20, 25);
BENCHMARK(uint_multiplication)->Unit(kMicrosecond)->DenseRange(12, 27);
BENCHMARK(uint_extended_multiplication)->Unit(kMicrosecond)->DenseRange(12, 27);
BENCHMARK_MAIN();
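For readers unfamiliar with the idea behind the title and the review note on `uint_multiplication`, here is a minimal sketch of what multiplying with 29-bit limbs can look like. It is not the barretenberg code: the names `U256`, `Limbs29`, `to_limbs29`, `from_limbs29`, `mul_low` and `mul_wrapping` are hypothetical, and it only computes the wrapping (mod 2^256) product. The assumption behind it is that core WebAssembly's `i64.mul` yields only the low 64 bits of a product, so full 64-bit limbs need extra splitting work, whereas a product of two 29-bit limbs is below 2^58 and nine such products plus a carry still fit in one `uint64_t`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical illustration only; not taken from the PR.
constexpr unsigned LIMB_BITS = 29;
constexpr uint64_t LIMB_MASK = (1ULL << LIMB_BITS) - 1;

using U256 = std::array<uint64_t, 4>;    // little-endian 64-bit words
using Limbs29 = std::array<uint64_t, 9>; // little-endian 29-bit limbs (9 * 29 = 261 bits)

// Split four 64-bit words into nine 29-bit limbs.
Limbs29 to_limbs29(const U256& x)
{
    Limbs29 out{};
    for (unsigned i = 0; i < 9; i++) {
        const unsigned bit = i * LIMB_BITS;
        const unsigned word = bit / 64;
        const unsigned shift = bit % 64;
        uint64_t v = x[word] >> shift;
        // The limb straddles two words: pull the remaining bits from the next one.
        if (shift + LIMB_BITS > 64 && word + 1 < 4) {
            v |= x[word + 1] << (64 - shift);
        }
        out[i] = v & LIMB_MASK;
    }
    return out;
}

// Pack nine 29-bit limbs back into four 64-bit words, dropping bits >= 256.
U256 from_limbs29(const Limbs29& l)
{
    U256 out{};
    for (unsigned i = 0; i < 9; i++) {
        const unsigned bit = i * LIMB_BITS;
        const unsigned word = bit / 64;
        const unsigned shift = bit % 64;
        out[word] |= l[i] << shift;
        if (shift + LIMB_BITS > 64 && word + 1 < 4) {
            out[word + 1] |= l[i] >> (64 - shift);
        }
    }
    return out;
}

// Schoolbook product, low nine columns only (result is a * b mod 2^261).
Limbs29 mul_low(const Limbs29& a, const Limbs29& b)
{
    Limbs29 r{};
    uint64_t carry = 0;
    for (unsigned k = 0; k < 9; k++) {
        // At most 9 products, each < 2^58, plus a carry < 2^33: no 64-bit overflow.
        uint64_t acc = carry;
        for (unsigned i = 0; i <= k; i++) {
            acc += a[i] * b[k - i];
        }
        r[k] = acc & LIMB_MASK;
        carry = acc >> LIMB_BITS;
    }
    return r;
}

// Wrapping 256-bit multiplication built from the pieces above.
U256 mul_wrapping(const U256& a, const U256& b)
{
    return from_limbs29(mul_low(to_limbs29(a), to_limbs29(b)));
}
```

In this sketch the carry is propagated once per column rather than once per partial product, which is the main saving the limb size buys; whether the PR's implementation is organised the same way is not something this page shows.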
I added this to run a different wasm runtime as an alternative source of truth for the speedups