Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test and bench for eval_polynomial and compute_inner_product #559

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ashWhiteHat
Copy link
Contributor

@ashWhiteHat ashWhiteHat commented Apr 25, 2022

No description provided.

@ashWhiteHat
Copy link
Contributor Author

It seems there are more efficient optimization in following comment.
#548 (comment)
I reverted my change so this is just bench and test for eval_polynomial and compute_inner_product.

@daira daira changed the title Parallelize compute_inner_product Test and bench for eval_polynomial and compute_inner_product Apr 26, 2022
@daira
Copy link
Contributor

daira commented Apr 26, 2022

Quote-replying just to preserve the original comment, which does not describe the current PR contents.

Test and bench for eval_polynomial and compute_inner_product

Found more efficient solution and reverted Prev PR. This is just test and bench for eval_polynomial and compute_inner_product

Prev PR

# Parallelize `compute_inner_product`

Hi there.
I introduced parallelize to `compute_inner_product`.

## What I did

### Introduce parallelize

Introduce parallelize using rayon `join` and recursive function.

### Test

Write the test which checks `compute_inner_product` works fine.

### Bench

Add `compute_inner_product` bench which checks performance.
I would like to check exact performance so I set the `profile.bench` and closer to production environment.

### Other

I told that `all recursive` is the best option but it seems wrong.
The `rayon::join` uses the [`work-stealing`](https://en.wikipedia.org/wiki/Work_stealing#:~:text=In%20parallel%20computing%2C%20work%20stealing,of%20processors%20(or%20cores).) and the efficiency is related to `task size` and `multiplicity`.
For small task size, the shallower multiplicity, the faster performance.
For larger task size, the deeper multiplicity, the faster performance.

The `mul` operation on Field is small task so I bundle some tasks to out at a higher n as following.
https://github.com/NoCtrlZ/halo2/blob/feat/coeffs-arithmetic-optimization/halo2_proofs/src/arithmetic.rs#L287

## Bench

The optimized `compute_inner_product` us faster for almost all k except `12` and `13`.
I benched it locally with thread num = 8 and following is the result.

| k | default | recursive |
| ---- | ---- | ---- |
|3| 91.496 us 92.714 us 94.384 us | 88.904 us 89.356 us 89.869 us |
|4| 182.11 us 184.99 us 188.17 us | 176.85 us 177.94 us 179.15 us |
|5| 365.83 us 372.23 us 379.90 us | 351.58 us 352.67 us 353.86 us |
|6| 706.55 us 712.62 us 719.54 us | 701.19 us 702.64 us 704.16 us |
|7| 1.4597 ms 1.4862 ms 1.5121 ms | 1.4050 ms 1.4095 ms 1.4144 ms |
|8| 2.9451 ms 3.0102 ms 3.1018 ms | 2.8082 ms 2.8175 ms 2.8274 ms |
|9| 5.7104 ms 5.7810 ms 5.8706 ms | 5.7026 ms 5.7158 ms 5.7300 ms |
|10| 11.682 ms 11.904 ms 12.171 ms | 11.401 ms 11.439 ms 11.484 ms |
|11| 22.588 ms 22.683 ms 22.778 ms | 22.631 ms 22.682 ms 22.737 ms |
|12| 44.691 ms 44.839 ms 44.992 ms | 45.028 ms 45.111 ms 45.196 ms |
|13| 90.120 ms 90.559 ms 91.058 ms | 89.995 ms 90.650 ms 91.797 ms |
|14| 182.29 ms 186.16 ms 191.74 ms | 179.63 ms 179.96 ms 180.33 ms |
|15| 381.50 ms 387.02 ms 393.56 ms | 359.97 ms 360.73 ms 361.59 ms | 
|16| 761.80 ms 769.39 ms 777.99 ms | 718.81 ms 720.44 ms 722.32 ms |
|17| 1.5566 s 1.5844 s 1.6162 s | 1.4663 s 1.4800 s 1.4959 s |
|18| 3.0584 s 3.0874 s 3.1217 s | 2.9104 s 2.9301 s 2.9554 s |

## Uncertainty

I think `best_fft` perform more complicated operation than `compute_inner_product` but `best_fft` is far [faster](https://github.com/zcash/halo2/pull/543#issuecomment-1104646161) even same k.

I would appreciate it if you confirm. Thank you!

@daira
Copy link
Contributor

daira commented Apr 26, 2022

Please force-push this PR so that it only touches the test and benchmarking code.

@ashWhiteHat ashWhiteHat force-pushed the feat/coeffs-arithmetic-optimization branch from f71c276 to 8544f14 Compare April 26, 2022 10:57
@ashWhiteHat
Copy link
Contributor Author

Hi @daira
I fixed the PR according to your review.
I would appreciate it if you would confirm.
Thank you!

Cargo.toml Outdated Show resolved Hide resolved
halo2_proofs/benches/poly.rs Outdated Show resolved Hide resolved
halo2_proofs/benches/poly.rs Outdated Show resolved Hide resolved
halo2_proofs/benches/poly.rs Outdated Show resolved Hide resolved
halo2_proofs/benches/poly.rs Outdated Show resolved Hide resolved
@ashWhiteHat ashWhiteHat requested a review from str4d May 12, 2022 04:33
@ashWhiteHat ashWhiteHat force-pushed the feat/coeffs-arithmetic-optimization branch from 6f17780 to 2508407 Compare May 12, 2022 04:50
Copy link
Collaborator

@therealyingtong therealyingtong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

halo2_proofs/src/arithmetic.rs Outdated Show resolved Hide resolved
halo2_proofs/src/arithmetic.rs Outdated Show resolved Hide resolved
@ashWhiteHat ashWhiteHat force-pushed the feat/coeffs-arithmetic-optimization branch from 2508407 to f6ee02e Compare May 31, 2022 15:13
@ashWhiteHat ashWhiteHat force-pushed the feat/coeffs-arithmetic-optimization branch from f6ee02e to 647d61c Compare May 31, 2022 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants