Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] Optimize FriReducedOpeningChip #1248

Merged
merged 6 commits into from
Jan 23, 2025
Merged

[perf] Optimize FriReducedOpeningChip #1248

merged 6 commits into from
Jan 23, 2025

Conversation

nyunyunyunyu
Copy link
Contributor

@nyunyunyunyu nyunyunyunyu commented Jan 22, 2025

closes INT-3055

Reduced # of columns of FriReducedOpeningChip from 64 to 26. A side effect is that we use a more efficient way to compute alpha_pow.

Notes for reviewers:

  • All meaningful changes are in extensions/native/circuit/src/fri/mod.rs. I rewrite everything so diff view is not helpful.
  • The rows of the new implementation has 3 variants: WorkloadCols/ Instruction1Cols/ Instruction2Cols. I use lots of static assertions in order to guarantee that some columns in these variants are aligned.

WorkloadCols:

  • 1 instruction has length rows of WorkloadCols.
  • Each row reads a/b from memory and write local.result * local.alpha + b - a into next.result

Instruction1Cols:

  • 1 instruction has only 1 row of Instruction1Cols after all WorkloadCols.
  • This rows reads a_ptr/b_ptr/length/alpha and write local.result into local.result_ptr.
  • We constraint a_ptr/b_ptr/alpha in WorkloadCols above are from this row.
  • This rows also handle the PC interaction of the instruction.

Instruction2Cols:

  • 1 instruction has only 1 row of Instruction2Cols after Instruction1Cols.
  • This is just for some memory aux columns to reduce the overall width.

fib_e2e before this PR(ref):

Summary Proof Time (s) Parallel Proof Time (s)
Total 455.70 316.94
fib_e2e 41.24 6.33
leaf 55.94 8.36
internal.0 56.76 16.20
internal.1 31.91 16.20
internal.2 16.26 16.26
root 39.22 39.22
halo2_outer 168.55 168.55
halo2_wrapper 45.82 45.82

fib_e2e after this PR(ref):

Summary Proof Time (s) Parallel Proof Time (s)
Total 433.68 298.21
fib_e2e 40.31 6.25
leaf 54.28 8.29
internal.0 55.42 15.67
internal.1 31.41 15.73
internal.2 15.91 15.91
root 39.70 39.70
halo2_outer 155.79 155.79
halo2_wrapper 40.86 40.86

Proof time of some verifier drops ~30% but others keep same. I'm surprised halo2_outer proof time increases because the number column decreases.

@nyunyunyunyu nyunyunyunyu added run-benchmark triggers benchmark workflows on the pr run-benchmark-e2e labels Jan 22, 2025

This comment has been minimized.

@nyunyunyunyu nyunyunyunyu force-pushed the feat/opt-fri-reduce branch 2 times, most recently from edc6db6 to 259a25f Compare January 22, 2025 16:24

This comment has been minimized.

@nyunyunyunyu nyunyunyunyu force-pushed the feat/opt-fri-reduce branch 2 times, most recently from 6258838 to ffff287 Compare January 22, 2025 17:13

This comment has been minimized.

@nyunyunyunyu nyunyunyunyu marked this pull request as ready for review January 22, 2025 17:48

This comment has been minimized.

@yi-sun
Copy link
Collaborator

yi-sun commented Jan 23, 2025

Can you also update the ISA documentation?

extensions/native/circuit/src/utils.rs Outdated Show resolved Hide resolved
extensions/native/compiler/tests/fri_ro_eval.rs Outdated Show resolved Hide resolved
extensions/native/compiler/tests/fri_ro_eval.rs Outdated Show resolved Hide resolved
@jonathanpwang jonathanpwang changed the title [chore] Optimize FriReducedOpeningChip [perf] Optimize FriReducedOpeningChip Jan 23, 2025
@nyunyunyunyu nyunyunyunyu requested a review from yi-sun January 23, 2025 06:58

This comment has been minimized.

docs/specs/ISA.md Outdated Show resolved Hide resolved
Copy link

group app.proof_time_ms app.cycles app.cells_used leaf.proof_time_ms leaf.cycles leaf.cells_used
verify_fibair (+113 [+5.0%]) 2,361 513,309 (-28071 [-0.1%]) 19,357,682 - - -
fibonacci_program (+38 [+0.6%]) 6,109 1,500,137 51,487,838 (-94 [-1.2%]) 7,966 (-16002 [-0.9%]) 1,817,697 (-5225391 [-6.9%]) 70,261,826
regex_program (-703 [-3.8%]) 17,821 4,190,904 165,010,909 (-2029 [-10.6%]) 17,127 (-21360 [-0.7%]) 3,007,473 (-21669219 [-13.3%]) 141,617,522
fib_e2e 40,273 12,000,137 410,699,582 54,976 11,373,351 435,143,222
ecrecover_program (+5 [+0.2%]) 2,616 285,401 15,075,033 (-1145 [-5.1%]) 21,260 (-28047 [-0.7%]) 4,138,626 (-38606313 [-16.0%]) 202,803,516

Commit: 0ff18c5

Benchmark Workflow

) {
let local: &Instruction2Cols<AB::Var> = local_slice[..INS_2_WIDTH].borrow();
let next: &WorkloadCols<AB::Var> = next_slice[..WL_WIDTH].borrow();
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is necessary, shouldn't it already be covered by lines 302-303?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the last row is a WorkloadCols with idx = -1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this case will be avoided by other constraints. But this sets an explicit boundary condition just for more safety

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case that the last row is a workload row with index -1 is prevented by the fact that an instruction1 row is always followed by an instruction2 row (lines 302-303), an instruction2 row is always followed by a workload row with index 0 (line 403), and a disabled row other than the last is always followed by another disabled row (line 193), so a workload row with index -1 can only be preceded by a workload row with index -2, and so on.

The condition you have here is anyway not preventing workload rows with index -1 that appear in places other than the last row. I guess we can keep it just in case but if you can verify that it's not needed I think it's better to remove to make it less confusing later

)
.eval(builder, multiplicity);
{
let mut when_transition = builder.when_transition();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why it's necessary to restrict to when_transition here -- didn't we already restrict that first row has to be the first workload row of a FriReducedOpening, so last row necessary cannot be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you are right

a: &[F],
b: &[[F; EXT_DEG]],
) -> ([F; EXT_DEG], [F; EXT_DEG]) {
) -> [F; EXT_DEG] {
let mut alpha_pow: [F; EXT_DEG] = elem_to_ext(F::ONE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Shouldn't we be using EF in general here rather than [F; EXT_DEG]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is EF?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic for field extension

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm how should I change? [F; EXT_DEG] is not a generic

Copy link
Contributor

@zlangley zlangley Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to change it in this PR. But we would make FriReducedOpeningChip also generic over EF, which would have the trait bound EF: ExtensionField<F>. Anyway, I'm not sure we want to do it; I suppose the AIR stills needs to constrain field extension multiplication for a very specific field extension, not a generic one.

@TlatoaniHJ
Copy link
Contributor

I looked through the AIR constraints and it seems to be sound -- I noted two places above where I think there are unnecessary constraints but they are not harmful.

@TlatoaniHJ TlatoaniHJ self-requested a review January 23, 2025 18:29
Copy link
Contributor

@TlatoaniHJ TlatoaniHJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge, above comments can be addressed in future

@TlatoaniHJ TlatoaniHJ merged commit 1ee9fba into main Jan 23, 2025
23 checks passed
@TlatoaniHJ TlatoaniHJ deleted the feat/opt-fri-reduce branch January 23, 2025 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-benchmark triggers benchmark workflows on the pr run-benchmark-e2e
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants