Simplify auxiliary columns build #1140

Al-Kindi-0 · 2023-11-07T19:20:41Z

Describe your changes

This PR proposes a simplification to the auxiliary columns building procedure.
This is a wip/poc version without any optimizations.

Checklist before requesting a review

Repo forked and branch created from next according to naming convention.
Commit messages and codestyle follow conventions.
Relevant issues are linked in the PR description.
Tests added for new functionality.
Documentation/comments updated according to changes.

bobbinth

Looks good! Thank you! I left a few comments inline - mostly about improving code readability/modularity.

processor/src/chiplets/aux_trace/mod.rs

bobbinth

Looks good! Thank you! I left a few small comments inline. A couple of additional thoughts:

It seems like this change results in roughly doubling of the time it takes to build the auxiliary trace (tested by running the Fibonacci example for $2^20$ cycles). This is not a huge deal as overall contribution to proof generation time is pretty small, but I'd like to understand which parts of the new approach contribute the most to the runtime.
I'm wondering if it would make sense to combine some of the functions. For example, the logic in build_mloadw_request and build_mstorew_request is nearly identical - so, combining these two should be pretty straight-forward. Similar approach can probably be applied to some other functions as well.

Also, we should probably remove the old "hint" structure as I don't think it is needed for anything chiplet-related any more (this can be done in a separate commit so that we can roll things back if needed).

Overall, I'd probably make this PR about chiplet bus only and then address other buses/virtual tables in different PRs (which build on this PR). This way, we have relatively small PRs for each change, and then we can merge them together once the whole structure has been migrated.

processor/src/chiplets/aux_trace/mod.rs

Al-Kindi-0 · 2023-11-26T11:25:20Z

* It seems like this change results in roughly doubling of the time it takes to build the auxiliary trace (tested by running the Fibonacci example for 220 cycles). This is not a huge deal as overall contribution to proof generation time is pretty small, but I'd like to understand which parts of the new approach contribute the most to the runtime.

Added a description of the timings for the Fibonacci example above.

* I'm wondering if it would make sense to combine some of the functions. For example, the logic in `build_mloadw_request` and `build_mstorew_request` is nearly identical - so, combining these two should be pretty straight-forward. Similar approach can probably be applied to some other functions as well.

I have now combined a few additional functions. Let me know if you see further simplifications.

Also, we should probably remove the old "hint" structure as I don't think it is needed for anything chiplet-related any more (this can be done in a separate commit so that we can roll things back if needed).

Will remove them in at the end in a separate commit, if that is OK.

Overall, I'd probably make this PR about chiplet bus only and then address other buses/virtual tables in different PRs (which build on this PR). This way, we have relatively small PRs for each change, and then we can merge them together once the whole structure has been migrated.

Makes sense.

bobbinth

Looks good! Thank you! I left a few more small comments inline - after these, we are good to merge. But as mentioned in the previous review, let's hold off on merging until the whole structure is refactored.

processor/src/chiplets/aux_trace/mod.rs

bobbinth · 2023-12-13T02:10:31Z

I've cleaned up commit history a bit and merged all downstream PRs into this one (and also rebased this PR from the latest next).

Let's not merge it yet though. We can probably do a bit more cleanup - and I'd like to do it on top of this PR before we merge into next.

bobbinth · 2023-12-13T02:18:11Z

Overall, with the changes as they stand, the time to build auxiliary trace increased about 10x (from ~25ms to ~250ms). But we haven't optimized this at all, and nothing is parallelized. So, overall, I think we should be able to get this down to 25 ms (or maybe even less).

And if GKR changes work, we may be able to reduce it even more (though, I think we'll need to understand GKR proving time a bit more before we can make this claim).

bobbinth · 2023-12-13T02:35:32Z

One of the things we can do in terms of code improvements is to consolidate all MainTrace structs and move them into the air crate (under air/src/trace).

Al-Kindi-0 · 2023-12-20T10:50:50Z

One of the things we can do in terms of code improvements is to consolidate all MainTrace structs and move them into the air crate (under air/src/trace).

Makes sense!

Al-Kindi-0 · 2023-12-20T10:54:26Z

Overall, with the changes as they stand, the time to build auxiliary trace increased about 10x (from ~25ms to ~250ms). But we haven't optimized this at all, and nothing is parallelized. So, overall, I think we should be able to get this down to 25 ms (or maybe even less).

That's plausible. Will try to optimize it after refactoring MainTrace.

And if GKR changes work, we may be able to reduce it even more (though, I think we'll need to understand GKR proving time a bit more before we can make this claim).

I would bet yes since as far as I can see there are no inversions when evaluating the GKR circuit. However, there will be some extra costs with the GKR prover that are non-negligible.

Al-Kindi-0 · 2023-12-21T13:18:38Z

I have now consolidated the helper struct and methods, improved documentation further and did some minor cleanups. Managed to find a nice way to build the column and batch invert at the same time, this should hopefully reduce the build times.

Al-Kindi-0 · 2023-12-21T17:39:36Z

I have also looked at parallelizing the auxiliary columns build and I don't see a way to do it without making some changes to Winterfell. I propose that we tackle this in a separate PR as it is a self-contained task.

bobbinth · 2023-12-22T07:58:04Z

I have also looked at parallelizing the auxiliary columns build and I don't see a way to do it without making some changes to Winterfell. I propose that we tackle this in a separate PR as it is a self-contained task.

The place to parallelize is roughly here - so, I think we can avoid changes in Winterfell. I do agree that it should be a separate PR - but probably based on this PR before we merge.

plafer

Removing the hints stuff is very refreshing! However, I think we lost some good abstractions that were present before (as mentioned in the review).

Not a full review yet; I still need to think more about how to bring some of these abstractions back. Suggestions obviously welcome :)

plafer · 2024-01-08T13:06:56Z

processor/src/chiplets/aux_trace/mod.rs

+{
+    let mut multiplicand = E::ONE;
+    if i % 8 == 7 {
+        let op_label = get_op_label(ONE, ZERO, is_xor, ZERO);


Should we use the column values from the trace itself (selector0, etc) instead of hardcoding the column values? Harcoding values like this makes it all the more difficult to change the layout of the chiplets in the future.

I would personally change get_op_label() to

fn get_op_label(main_trace: &MainTrace, row_idx: usize) -> Felt { // fetch selector columns from main trace and compute label }

and let the compiler optimize the memory accesses

Yes, we can. Some of the "hardcoding" is due to there being only one possible value of the flag given some previous check.
I have now factored the computation of the flag outside of the if blocks.

plafer · 2024-01-08T13:19:05Z

processor/src/chiplets/aux_trace/mod.rs

+    E: FieldElement<BaseField = Felt>,
+{
+    let mut multiplicand = E::ONE;
+    if i % 8 == 7 {


Suggested change

if i % 8 == 7 {

if i % bitwise::OP_CYCLE_LEN == bitwise::OP_CYCLE_LEN - 1 {

Can we limit the use of magic numbers elsewhere too (e.g. in build_hasher_chiplet_responses)?

processor/src/chiplets/aux_trace/mod.rs

plafer · 2024-01-08T14:11:30Z

processor/src/chiplets/aux_trace/mod.rs

+pub struct BusTraceBuilder {}
+
+/// Builds the chiplets bus auxiliary trace column.
+fn build_bus_aux_column<E>(main_trace: &ColMatrix<Felt>, alphas: &[E]) -> Vec<E>


What was nice about the previous implementation is that we only had one function to build all aux columns (AuxColumnBuilder::build_aux_column()). Now this logic is duplicated for every column (build_bus_aux_column(), build_vtable_aux_column(), stack's AuxTraceBuilder::build_aux_column(), etc.).

We could bring back a different AuxColumnBuilder that would look something like

trait AuxColumnBuilder { // except with (much better) naming fn init_result_1(main_trace, alphas); fn init_result_2(main_trace, alphas); fn result_1(prev_result, main_trace, alphas, row_idx); fn result_2(prev_result, main_trace, alphas, row_idx); // provided method fn build_aux_column(main_trace: &ColMatrix<Felt>, alphas: &[E]) -> Vec<E> where E: FieldElement<BaseField = Felt>, { let main_trace = MainTrace::new(main_trace); let mut result_1: Vec<E> = Self::init_result_1(&main_trace, alphas); let mut result_2: Vec<E> = Self::init_result_2(&main_trace, alphas) let mut result_2_acc = E::ONE; for i in 0..main_trace.num_rows() - 1 { result_1[i + 1] = Self::result_1(result_1[i], &main_trace, alphas, row_idx); result_2[i + 1] = Self::result_2(result_2[i], &main_trace, alphas, row_idx); result_2_acc *= result_2[i + 1]; } let mut acc_inv = result_2_acc.inv(); for i in (0..main_trace.num_rows()).rev() { result_1[i] *= acc_inv; acc_inv *= result_2[i]; } result_1 } }

The only thing left is to find better names than the ones I used here. And potentially making main_trace and alphas fields in the structs to avoiding passing them around all the time.

I have now factored the logic above into a trait for aux column building. I have used the self. instead of Self:: as it makes implementing aux column building for the stack overflow table less cumbersome.

processor/src/chiplets/aux_trace/mod.rs

plafer · 2024-01-08T14:48:47Z

processor/src/chiplets/aux_trace/mod.rs

+                + alphas[3].mul_base(node_index);
+
+            let bit = node_index.as_int() & 1;
+            let left_word = build_value(&alphas[8..12], &state[4..8]);


Can we use constants here, instead of 8..12 and 4..8? It makes it hard to understand why these specific values were chosen

Done! Although for the alphas it is less straightforward as the values don't have a special meaning, except for the alphas used in computing the header values.

plafer · 2024-01-08T14:56:02Z

processor/src/chiplets/aux_trace/mod.rs

+    alphas[0]
+        + alphas[1].mul_base(op_label)
+        + alphas[2].mul_base(ctx)
+        + alphas[3].mul_base(addr)
+        + alphas[4].mul_base(clk)
+        + alphas[5].mul_base(value0)
+        + alphas[6].mul_base(value1)
+        + alphas[7].mul_base(value2)
+        + alphas[8].mul_base(value3)


Can we use build_value() here? This applies to all other places in this file where we reimplement build_value() by hand.

I'll have to think about it some more, but it'd be nice to have a trait similar to the previous LookupTableRow, adapted to the changes in this PR, which would replace build_value().

processor/src/chiplets/aux_trace/mod.rs

plafer

I think it's best to hold off to make big refactors on top of this, as the final shape of the aux column building code will probably require all chiplets to be refactored first (see last comment in review). So I would address the points I brought up in my reviews, and then the PR would be in a good state to merge IMO 🙂

processor/src/decoder/auxiliary.rs

processor/src/trace/mod.rs

plafer · 2024-01-08T20:55:55Z

processor/src/chiplets/aux_trace/mod.rs

+    result_2[0] = E::ONE;
+
+    let mut result_2_acc = E::ONE;
+    for i in 0..main_trace.num_rows() - 1 {


Currently, every chiplet performs its own loop over the entire trace. It would probably be more cache-efficient to have just one loop, in which we every chiplet performs its work in the body - but we'd need to measure the performance delta to be sure.

It would also make further optimizations easier, such as parallelizing the loop (as opposed to parallelize n loops for n chiplets).

bobbinth · 2024-01-08T21:04:29Z

I think it's best to hold off to make big refactors on top of this, as the final shape of the aux column building code will probably require all chiplets to be refactored first (see last comment in review).

I like the idea of standardizing the code a bit more - but I wouldn't do it in this PR directly. Instead, we could do it in PRs built on top of this one, and then merge into this PR. This would make incremental reviews easer.

The goal here is to get the structure in the PR to be relatively clean, and only then merge into next.

…ttern

Al-Kindi-0 requested a review from bobbinth November 7, 2023 19:20

Al-Kindi-0 marked this pull request as draft November 7, 2023 19:22

bobbinth reviewed Nov 8, 2023

View reviewed changes

Al-Kindi-0 marked this pull request as ready for review November 23, 2023 15:54

bobbinth reviewed Nov 26, 2023

View reviewed changes

Al-Kindi-0 mentioned this pull request Nov 27, 2023

Simplify the chiplet virtual table build #1161

Merged

bobbinth approved these changes Nov 29, 2023

View reviewed changes

bobbinth force-pushed the al-simplify-aux branch from 7ca5384 to 7aabb8f Compare December 5, 2023 08:57

bobbinth force-pushed the al-simplify-aux branch 5 times, most recently from 2eb75d6 to cf6db3a Compare December 13, 2023 02:08

plafer reviewed Jan 8, 2024

View reviewed changes

Al-Kindi-0 force-pushed the al-simplify-aux branch from 7a0a206 to ee93c6d Compare January 15, 2024 11:43

plafer mentioned this pull request Jan 17, 2024

ExecutionTrace.main_trace: change type to MainTrace #1201

Closed

bobbinth force-pushed the al-simplify-aux branch from f7916ac to 5919612 Compare January 24, 2024 08:16

bobbinth mentioned this pull request Jan 29, 2024

Adds the RCOMB1 instruction #1216

Merged

Al-Kindi-0 mentioned this pull request Jan 30, 2024

Handle extra memory requests made by new RCombBase op #1222

Closed

bobbinth force-pushed the al-simplify-aux branch 2 times, most recently from 3d06daa to 4c963d2 Compare February 8, 2024 08:29

Al-Kindi-0 and others added 16 commits February 8, 2024 15:46

feat: add simplified chiplet bus column builder

138fa0a

feat: simplify build of chiplet vtable column

b516e51

feat: simplify the 3 decoder auxiliary columns

84af58e

feat: simplify the 3 decoder auxiliary columns

cac7f1a

feat: simplify stack overflow auxiliary column build

1118760

chore: remove decoder aux-hints code

359a4a2

refactor: decoder auxiliary builder and uniformize the aux builder pa…

48b882a

…ttern

feat: simplify auxiliary trace building logic

e374d30

feat: consolidate MainTrace and optimize column build

3e9b799

feat: consolidate MainTrace and optimize column build

946682b

Trait for auxiliary column build (#1195)

97551fa

refactor: clean up aux column generator for stack overflow table (#1213)

33fed30

refactor: Deref MainTrace type to ColMatrix (#1214)

dbd132d

fix: metal prover compilation

0df5861

chore: split up decoder aux column builders into separate files (#1233)

a7b5fc3

Handle additional memory requests made by RCOMBBASE (#1229)

156a257

bobbinth force-pushed the al-simplify-aux branch from d8ae2e5 to 156a257 Compare February 8, 2024 23:49

Adds Kernel procedure responses (#1246)

620cc6e

bobbinth merged commit d1aa43c into next Feb 13, 2024
15 checks passed

bobbinth deleted the al-simplify-aux branch February 13, 2024 22:40

This was referenced Feb 13, 2024

Auxiliary trace column builder simplification #989

Closed

Remove queued_requests from the hasher #446

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify auxiliary columns build #1140

Simplify auxiliary columns build #1140

Al-Kindi-0 commented Nov 7, 2023

bobbinth left a comment

bobbinth left a comment

Al-Kindi-0 commented Nov 26, 2023

bobbinth left a comment

bobbinth commented Dec 13, 2023

bobbinth commented Dec 13, 2023 •

edited

Loading

bobbinth commented Dec 13, 2023

Al-Kindi-0 commented Dec 20, 2023

Al-Kindi-0 commented Dec 20, 2023

Al-Kindi-0 commented Dec 21, 2023

Al-Kindi-0 commented Dec 21, 2023

bobbinth commented Dec 22, 2023

plafer left a comment

plafer Jan 8, 2024

Al-Kindi-0 Jan 15, 2024

plafer Jan 8, 2024

Al-Kindi-0 Jan 15, 2024

plafer Jan 8, 2024

Al-Kindi-0 Jan 15, 2024

plafer Jan 8, 2024

Al-Kindi-0 Jan 15, 2024

plafer Jan 8, 2024

plafer left a comment

plafer Jan 8, 2024

bobbinth commented Jan 8, 2024

	if i % 8 == 7 {
	if i % bitwise::OP_CYCLE_LEN == bitwise::OP_CYCLE_LEN - 1 {

Simplify auxiliary columns build #1140

Simplify auxiliary columns build #1140

Conversation

Al-Kindi-0 commented Nov 7, 2023

Describe your changes

Checklist before requesting a review

bobbinth left a comment

Choose a reason for hiding this comment

bobbinth left a comment

Choose a reason for hiding this comment

Al-Kindi-0 commented Nov 26, 2023

bobbinth left a comment

Choose a reason for hiding this comment

bobbinth commented Dec 13, 2023

bobbinth commented Dec 13, 2023 • edited Loading

bobbinth commented Dec 13, 2023

Al-Kindi-0 commented Dec 20, 2023

Al-Kindi-0 commented Dec 20, 2023

Al-Kindi-0 commented Dec 21, 2023

Al-Kindi-0 commented Dec 21, 2023

bobbinth commented Dec 22, 2023

plafer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

plafer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bobbinth commented Jan 8, 2024

bobbinth commented Dec 13, 2023 •

edited

Loading