Fix `parallelize` workload imbalance #186
I think `chunks_mut` and `chunks` take a `chunk_size` as input rather than a number of chunks (try this Rust playground to double-check), so I don't think it would incur load imbalance, though it might cause us to call `spawn` more times than we expected (ideally we'd want to call `spawn` as many times as the number of threads, right?).
Also, I think that when the input is large enough this PR would behave almost the same as before, but the approach in this PR does bring it closer to what we expected, so I think we can update the comments, remove the `unsafe`, and then merge it.
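To illustrate the point about `chunks` above, here is a minimal sketch with made-up sizes (it is not the playground example itself, and `naive_chunk_lens` is a hypothetical helper name). Because the argument is the length of each chunk, dividing the input length by the thread count can produce more chunks than there are threads:

```rust
/// Lengths of the chunks produced by `slice::chunks(len / num_threads)`.
/// `naive_chunk_lens` is a hypothetical helper used only for this illustration.
fn naive_chunk_lens(len: usize, num_threads: usize) -> Vec<usize> {
    let v = vec![0u8; len];
    // `chunks` takes the size of each chunk, not the number of chunks.
    // `max(1)` avoids the panic that `chunks(0)` would cause when len < num_threads.
    let chunk_size = (len / num_threads).max(1);
    v.chunks(chunk_size).map(|c| c.len()).collect()
}

fn main() {
    // 10 items on 4 threads: chunk_size = 2, so 5 chunks of 2 items each.
    println!("{:?}", naive_chunk_lens(10, 4));
    // 11 items on 4 threads: 5 chunks of 2 items plus a trailing chunk of 1.
    println!("{:?}", naive_chunk_lens(11, 4));
}
```

In both cases the number of chunks (5 and 6) exceeds the 4 threads, which is the "more `spawn` calls than expected" effect described above.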
A version without `unsafe` might look like:
pub fn parallelize<T: Send, F: Fn(&mut [T], usize) + Send + Sync + Clone>(v: &mut [T], f: F) {
    // Comments...
    let len = v.len();
    let num_threads = multicore::current_num_threads();
    // The first `len % num_threads` chunks get one extra element.
    let chunk_size_lo = len / num_threads;
    let chunk_size_hi = chunk_size_lo + 1;
    let len_hi = (len % num_threads) * chunk_size_hi;
    let (v_hi, v_lo) = v.split_at_mut(len_hi);
    let f = &f;

    multicore::scope(|scope| {
        // Larger chunks; the second argument to `f` is the chunk's start index in `v`.
        if chunk_size_hi > 0 {
            for (idx, v) in v_hi.chunks_exact_mut(chunk_size_hi).enumerate() {
                scope.spawn(move |_| f(v, idx * chunk_size_hi));
            }
        }
        // Smaller chunks; skipped when `len < num_threads` (chunk size would be 0).
        if chunk_size_lo > 0 {
            for (idx, v) in v_lo.chunks_exact_mut(chunk_size_lo).enumerate() {
                scope.spawn(move |_| f(v, len_hi + idx * chunk_size_lo));
            }
        }
    });
}
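To check the arithmetic of the version above, here is a standalone sketch that computes only the chunk lengths it would produce. `balanced_chunk_lens` is a hypothetical helper, and the thread count is passed in explicitly rather than read from `multicore::current_num_threads()`:

```rust
/// Chunk lengths produced by the hi/lo split: the first `len % num_threads`
/// chunks get one extra element, the remaining chunks get `len / num_threads`.
/// Hypothetical helper for illustration; it does not spawn any threads.
fn balanced_chunk_lens(len: usize, num_threads: usize) -> Vec<usize> {
    assert!(num_threads > 0, "need at least one thread");
    let chunk_size_lo = len / num_threads;
    let chunk_size_hi = chunk_size_lo + 1;
    let num_hi = len % num_threads;

    let mut lens = vec![chunk_size_hi; num_hi];
    // When len < num_threads the low chunk size is 0 and no low chunks exist.
    if chunk_size_lo > 0 {
        lens.extend(std::iter::repeat(chunk_size_lo).take(num_threads - num_hi));
    }
    lens
}

fn main() {
    let lens = balanced_chunk_lens(11, 4);
    println!("{:?}", lens); // chunk lengths differ by at most one

    let big = balanced_chunk_lens(1 << 27, 112);
    println!("chunks: {}, total: {}", big.len(), big.iter().sum::<usize>());
}
```

Under these assumptions the number of chunks never exceeds the number of threads, and the chunk lengths always sum to the input length.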
Oh indeed, good catch. In that case, for that example, one thread will do 6 iterations, one thread will do a single iteration, and the rest will do 3. So we can expect a 33% speedup with an optimal split (threads do either 4 or 3 iterations); there is still a noticeable load imbalance.
For example, if we have 2²⁷ = 134M items (the size of Filecoin's trusted setup, which AFAIK is the largest), a 112-way split (top-end Intel Xeon Platinum from 2019) would be 1198372*112 + 64:
fn main() {
    let v = [(); 1 << 27];
    let num_threads = 112;
    let chunk_size = v.len() / num_threads;
    println!("len: {}", v.chunks(chunk_size).count());
    println!("chunks: {:?}", v.chunks(chunk_size).map(<[_]>::len).collect::<Vec<_>>());
}
So indeed a large input will render these negligible, however
This is the difficult question, and the answer is "it depends". Ideally, you want exactly as many spawns as you have threads; this maximizes the time spent making progress and minimizes the overhead spent in the threadpool (the principle behind OpenMP static scheduling). However, in practice, there is other stuff running on the machine: networking, other programs, maybe nested parallelism, or an unbalanced workload. For example, you parallelize drawing a scene, but part of it is a white wall and part of it has someone with curly hair, so some threads may finish way earlier than others. In that case you want way more tasks than threads (fine-grained parallelism), so that if one thread finishes early, it finds something to do. Intel TBB uses a heuristic magic factor of 4 (which is perfect for some workloads and totally inadequate for many others, but that's another issue). The thing is, currently we have neither the exact number of spawns nor 4x more spawns than threads. We're in the always-less-efficient valley in between.
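The two ends of that trade-off can be made concrete with a small sketch. It assumes a hypothetical `tasks_per_thread` knob (1 for static scheduling, 4 to mirror the factor mentioned above) and only computes how many tasks of what size would be spawned; it does not model a real thread pool or any specific library's scheduler:

```rust
/// Returns `(chunk_size, num_tasks)` for splitting `len` items into at most
/// `num_threads * tasks_per_thread` tasks. `plan_tasks` and `tasks_per_thread`
/// are hypothetical names for this illustration only.
fn plan_tasks(len: usize, num_threads: usize, tasks_per_thread: usize) -> (usize, usize) {
    let target_tasks = (num_threads * tasks_per_thread).max(1);
    // Round the chunk size up so the task count never exceeds the target.
    let chunk_size = ((len + target_tasks - 1) / target_tasks).max(1);
    let num_tasks = (len + chunk_size - 1) / chunk_size;
    (chunk_size, num_tasks)
}

fn main() {
    let (len, threads) = (1usize << 27, 112);
    // Static scheduling: one task per thread.
    println!("static: {:?}", plan_tasks(len, threads, 1));
    // Over-decomposition: four tasks per thread, so idle threads can pick up more work.
    println!("4x:     {:?}", plan_tasks(len, threads, 4));
}
```

Rounding the chunk size up is a deliberate choice here: it keeps the task count at or below the target, at the cost of a slightly shorter final chunk.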
Ah indeed, didn't think of that, updating the PR.
LGTM! Thanks for sharing so much about parallelization, learned a lot!
* fix parallelize workload imbalance * remove the need of unsafe
…-hk/dev-feature/168-generalize-multiexponentiation-subroutines Generalize multiexponentiation subroutines to support larger curves
Currently, if we split 40 items across 12 threads (AMD Ryzen 7800X, Apple M2 Pro or Intel i5-12600), the partitioning gives 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 7 (3*11 + 7 = 40). The “remainder” thread has 2.33x as much work to do as each of the others.
This can be quite extreme: for example, with 351 items to split across 32 cores, 351/32 = 10.97, truncated to 10 by integer division. Then 10*31 = 310, so the last core needs to process the remaining 41 items, 4.1x as many as the others.
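To make the arithmetic above easy to check, here is a minimal sketch of the scheme as described in this PR: every thread gets `len / num_threads` items and the last thread also takes the remainder. The function name `remainder_last_sizes` is hypothetical and only models the description; it is not the code in the repository.

```rust
/// Chunk sizes when every thread gets `len / num_threads` items and the
/// last thread additionally absorbs the remainder.
/// Hypothetical helper, written only to illustrate the numbers above.
fn remainder_last_sizes(len: usize, num_threads: usize) -> Vec<usize> {
    assert!(num_threads > 0, "need at least one thread");
    let base = len / num_threads;
    let mut sizes = vec![base; num_threads];
    // The last thread picks up everything that integer division dropped.
    sizes[num_threads - 1] += len % num_threads;
    sizes
}
```

Calling `remainder_last_sizes(40, 12)` gives eleven chunks of 3 and one of 7, and `remainder_last_sizes(351, 32)` gives thirty-one chunks of 10 and one of 41.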
This PR ensures that, whatever the number of items and the number of threads, the per-thread workload differs by at most 1 item.
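A minimal sketch of such a balanced split, assuming the usual approach of giving the first `len % num_chunks` chunks one extra item. The name `balanced_ranges` is hypothetical and it returns index ranges rather than spawning work, so it illustrates the partitioning only, not the actual `parallelize` implementation.

```rust
use std::ops::Range;

/// Split `0..len` into `num_chunks` contiguous ranges whose lengths
/// differ by at most 1. Hypothetical helper for illustration.
fn balanced_ranges(len: usize, num_chunks: usize) -> Vec<Range<usize>> {
    assert!(num_chunks > 0, "need at least one chunk");
    let base = len / num_chunks; // size of the smaller chunks
    let extra = len % num_chunks; // how many chunks get one more item
    let mut ranges = Vec::with_capacity(num_chunks);
    let mut start = 0;
    for i in 0..num_chunks {
        // The first `extra` chunks are one item longer than the rest.
        let size = base + usize::from(i < extra);
        ranges.push(start..start + size);
        start += size;
    }
    ranges
}
```

For 40 items and 12 chunks this yields four ranges of 4 items followed by eight ranges of 3; for 351 items and 32 chunks it yields thirty-one ranges of 11 and one of 10.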
The imbalance probably explains why not all cores are fully used while benchmarking (benchmarks TODO). See also Amdahl's law.
Unsafe
Note: I haven't followed Rust standard library and parallelism developments since October 2016 (my last foray into writing Rust programs).
I've looked into:
- `chunks_mut`
- `array_chunks` and `as_chunks`, which require a separate `remainder` call

None of the chunking routines provides balanced partitioning of a range.
I might very well have missed something though.
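As a small sketch of what the standard slice chunking does, the helpers below (hypothetical names) report the chunk lengths produced by `chunks` and by `chunks_exact` plus its `remainder`. Both take a chunk size rather than a number of chunks, so the leftover items end up in a short trailing chunk or in a separate remainder slice.

```rust
/// Lengths of the chunks produced by `slice::chunks` for a slice of `len`
/// items. Hypothetical helper; `chunks` panics if `chunk_size` is 0.
fn fixed_size_chunk_lens(len: usize, chunk_size: usize) -> Vec<usize> {
    let data = vec![0u8; len];
    data.chunks(chunk_size).map(|c| c.len()).collect()
}

/// Number of full chunks from `slice::chunks_exact` and the length of the
/// leftover slice that has to be fetched separately via `remainder()`.
fn exact_chunks_and_remainder(len: usize, chunk_size: usize) -> (usize, usize) {
    let data = vec![0u8; len];
    let chunks = data.chunks_exact(chunk_size);
    // `remainder()` holds the items that do not fill a whole chunk.
    let leftover = chunks.remainder().len();
    (chunks.count(), leftover)
}
```

With 40 items and a chunk size of 3, `fixed_size_chunk_lens` returns thirteen chunks of 3 and a final chunk of 1, and `exact_chunks_and_remainder` returns `(13, 1)`.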