Batch affine msm + generic in msm #261
Conversation
Just FYI, this is the part in our paper where we essentially said "we don't know how to build an efficient + compact scheduler", as the 5% slowdown for small MSMs defeats batch affine (in G1, on Intel -- I'm glad you guys found wider applications!). The issue is that keeping track of which buckets are used costs 1 memory write for each iteration of the scheduler, and that's surprisingly non-trivial (you're storing a bool; we also tried with a uint to avoid the reset, but got similar results). For what it's worth, since you can only have at most 100 bools set to true, another approach is to just keep the ids in the queue and compare against those... in hardware we're doing that, because you can run all the compares in parallel. In software, again, we weren't able to do it fast enough (faster than this current method, or faster than letting the queue grow without memory writes). Anyway, I just wanted to remark, either for you or for anyone else lurking, that there might be an interesting trick here to get an "easy" 5% gain :)
@gbotrel This looks great!

Regarding the TODO on `double()` and `unsafeFromJacExtended()` when `p=0`: normally the point at infinity lies in `Z=0`. In Jacobian coordinates it is any `(t^2:t^3:0)`, so the convention is to take `(1:1:0)` (as we do in gnark-crypto); in projective coordinates it is any `(0:t:0)`, and the convention is to take `(0:1:0)`. In cryptography, since we don't need this point to be on the curve (because we don't use it in formulas), we just check that `Z=0`. Some references take `(0:1:0)` for both coordinate systems.

Because of this, the formulas in `double()` always output `Z=0` when fed a point at infinity. Same for `unsafeFromJacExtended()` (we can even rename it `FromJacExtended()` since the result is always `(0:0:0)`).
Derived from #249 (from @0x0ece). Strategy is similar, but implementation details are a bit different.

This PR:
- defines the buckets as concrete array types (e.g. `type bucketG1AffineC4 [1 << (4 - 1)]G1Affine`), and the `innerMSM` functions are parametrized with that type. This allows the buckets to be allocated on the stack, which is critical for perf.
- `partitionScalars` returns a list of digits as a `[]uint16` slice (instead of being packed into field elements).
- `fillBenchScalars` was not uniformly distributed; fixed.
- uses `fr.Bits` instead of `fr.Limbs * 64`.

TODO:
- remove `partitionScalarsOld`; will create a separate issue.
- `batchAffineAdd` methods

Some remarks on the msm-affine
Roughly speaking, the idea is, as in the previous bucket method, to process "chunks" of the scalars (think: columns) with a c-bit window size.
We can do an efficient `batchAddition` (as in: compute n point-to-point additions, NOT the sum of n points) in affine coordinates, but the n additions must be independent. In our case, since we are adding points from the input vector into a smaller set of buckets, all we need to ensure is that during a single call to `batchAddition` we don't add twice to the same bucket. The larger the batch size, the fewer (costly) inversions we do, but we potentially put more memory/cache pressure on (if it's too large) and, most importantly, increase the chance of conflicting additions (2 points going to the same bucket).
The idea is that if we consider uniformly random scalars and take, say, a batchSize of 100 with 32000 buckets, the probability of hitting the same bucket twice in a "batch window" of 100 is very low. If that happens, we append the conflicting point to a queue and try to reprocess it later.
new: with our current parameters, the queue should stay mostly empty, and if it becomes full, we are hitting an input vector that's unfriendly to the msm-affine. This can happen in a SNARK context, for example, if many of the inputs have the same value, or if we keep finding m consecutive identical values, with m roughly the same order as the batchSize. This would force us to process batch additions of very small (not full) sizes and make the algorithm perform terribly. To deal with that and other edge cases, when the queue is full we use another set of buckets, in extended Jacobian coordinates, to flush the queue. In practice (for uniformly distributed points), the slowdown is ~5%, but it's worth it to avoid too many code paths / edge cases.
Benchmarks

On AWS hpc6a.48xlarge. `develop` branch against `feat/msm-affine` (both generate uniformly distributed scalars).

without split logic (we only use as many cores as nbChunks)
TL;DR: from 30 to 60% speedup 😲. Need to benchmark on a low-cost device.
bls12-377
bls12-381
bn254
bw6-761
with split logic (more cores == we split the msm)
TL;DR: the advantage is good for most sizes (10% to 50% perf gain), and decreases with large MSMs, probably because we now stop at c=16. Some small sizes on G2 show a significant decrease; need to tune the batchSize / choice of c for those.
bls12-377
bls12-378
bls12-381
bn254
bw6-633
bw6-761