-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify auxiliary columns build #1140
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left a few comments inline - mostly about improving code readability/modularity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left a few small comments inline. A couple of additional thoughts:
- It seems like this change results in roughly doubling of the time it takes to build the auxiliary trace (tested by running the Fibonacci example for
$2^20$ cycles). This is not a huge deal as overall contribution to proof generation time is pretty small, but I'd like to understand which parts of the new approach contribute the most to the runtime. - I'm wondering if it would make sense to combine some of the functions. For example, the logic in
build_mloadw_request
andbuild_mstorew_request
is nearly identical - so, combining these two should be pretty straight-forward. Similar approach can probably be applied to some other functions as well.
Also, we should probably remove the old "hint" structure as I don't think it is needed for anything chiplet-related any more (this can be done in a separate commit so that we can roll things back if needed).
Overall, I'd probably make this PR about chiplet bus only and then address other buses/virtual tables in different PRs (which build on this PR). This way, we have relatively small PRs for each change, and then we can merge them together once the whole structure has been migrated.
Added a description of the timings for the Fibonacci example above.
I have now combined a few additional functions. Let me know if you see further simplifications.
Will remove them in at the end in a separate commit, if that is OK.
Makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left a few more small comments inline - after these, we are good to merge. But as mentioned in the previous review, let's hold off on merging until the whole structure is refactored.
7ca5384
to
7aabb8f
Compare
2eb75d6
to
cf6db3a
Compare
I've cleaned up commit history a bit and merged all downstream PRs into this one (and also rebased this PR from the latest Let's not merge it yet though. We can probably do a bit more cleanup - and I'd like to do it on top of this PR before we merge into |
Overall, with the changes as they stand, the time to build auxiliary trace increased about 10x (from ~25ms to ~250ms). But we haven't optimized this at all, and nothing is parallelized. So, overall, I think we should be able to get this down to 25 ms (or maybe even less). And if GKR changes work, we may be able to reduce it even more (though, I think we'll need to understand GKR proving time a bit more before we can make this claim). |
One of the things we can do in terms of code improvements is to consolidate all |
Makes sense! |
That's plausible. Will try to optimize it after refactoring
I would bet yes since as far as I can see there are no inversions when evaluating the GKR circuit. However, there will be some extra costs with the GKR prover that are non-negligible. |
I have now consolidated the helper struct and methods, improved documentation further and did some minor cleanups. Managed to find a nice way to build the column and batch invert at the same time, this should hopefully reduce the build times. |
I have also looked at parallelizing the auxiliary columns build and I don't see a way to do it without making some changes to Winterfell. I propose that we tackle this in a separate PR as it is a self-contained task. |
The place to parallelize is roughly here - so, I think we can avoid changes in Winterfell. I do agree that it should be a separate PR - but probably based on this PR before we merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the hints stuff is very refreshing! However, I think we lost some good abstractions that were present before (as mentioned in the review).
Not a full review yet; I still need to think more about how to bring some of these abstractions back. Suggestions obviously welcome :)
{ | ||
let mut multiplicand = E::ONE; | ||
if i % 8 == 7 { | ||
let op_label = get_op_label(ONE, ZERO, is_xor, ZERO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use the column values from the trace itself (selector0
, etc) instead of hardcoding the column values? Harcoding values like this makes it all the more difficult to change the layout of the chiplets in the future.
I would personally change get_op_label()
to
fn get_op_label(main_trace: &MainTrace, row_idx: usize) -> Felt {
// fetch selector columns from main trace and compute label
}
and let the compiler optimize the memory accesses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can. Some of the "hardcoding" is due to there being only one possible value of the flag given some previous check.
I have now factored the computation of the flag outside of the if blocks.
E: FieldElement<BaseField = Felt>, | ||
{ | ||
let mut multiplicand = E::ONE; | ||
if i % 8 == 7 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i % 8 == 7 { | |
if i % bitwise::OP_CYCLE_LEN == bitwise::OP_CYCLE_LEN - 1 { |
Can we limit the use of magic numbers elsewhere too (e.g. in build_hasher_chiplet_responses
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pub struct BusTraceBuilder {} | ||
|
||
/// Builds the chiplets bus auxiliary trace column. | ||
fn build_bus_aux_column<E>(main_trace: &ColMatrix<Felt>, alphas: &[E]) -> Vec<E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was nice about the previous implementation is that we only had one function to build all aux columns (AuxColumnBuilder::build_aux_column()
). Now this logic is duplicated for every column (build_bus_aux_column()
, build_vtable_aux_column()
, stack's AuxTraceBuilder::build_aux_column()
, etc.).
We could bring back a different AuxColumnBuilder
that would look something like
trait AuxColumnBuilder {
// except with (much better) naming
fn init_result_1(main_trace, alphas);
fn init_result_2(main_trace, alphas);
fn result_1(prev_result, main_trace, alphas, row_idx);
fn result_2(prev_result, main_trace, alphas, row_idx);
// provided method
fn build_aux_column(main_trace: &ColMatrix<Felt>, alphas: &[E]) -> Vec<E>
where
E: FieldElement<BaseField = Felt>,
{
let main_trace = MainTrace::new(main_trace);
let mut result_1: Vec<E> = Self::init_result_1(&main_trace, alphas);
let mut result_2: Vec<E> = Self::init_result_2(&main_trace, alphas)
let mut result_2_acc = E::ONE;
for i in 0..main_trace.num_rows() - 1 {
result_1[i + 1] = Self::result_1(result_1[i], &main_trace, alphas, row_idx);
result_2[i + 1] = Self::result_2(result_2[i], &main_trace, alphas, row_idx);
result_2_acc *= result_2[i + 1];
}
let mut acc_inv = result_2_acc.inv();
for i in (0..main_trace.num_rows()).rev() {
result_1[i] *= acc_inv;
acc_inv *= result_2[i];
}
result_1
}
}
The only thing left is to find better names than the ones I used here. And potentially making main_trace
and alphas
fields in the structs to avoiding passing them around all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now factored the logic above into a trait for aux column building. I have used the self.
instead of Self::
as it makes implementing aux column building for the stack overflow table less cumbersome.
+ alphas[3].mul_base(node_index); | ||
|
||
let bit = node_index.as_int() & 1; | ||
let left_word = build_value(&alphas[8..12], &state[4..8]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use constants here, instead of 8..12
and 4..8
? It makes it hard to understand why these specific values were chosen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Although for the alphas it is less straightforward as the values don't have a special meaning, except for the alphas used in computing the header values.
alphas[0] | ||
+ alphas[1].mul_base(op_label) | ||
+ alphas[2].mul_base(ctx) | ||
+ alphas[3].mul_base(addr) | ||
+ alphas[4].mul_base(clk) | ||
+ alphas[5].mul_base(value0) | ||
+ alphas[6].mul_base(value1) | ||
+ alphas[7].mul_base(value2) | ||
+ alphas[8].mul_base(value3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use build_value()
here? This applies to all other places in this file where we reimplement build_value()
by hand.
I'll have to think about it some more, but it'd be nice to have a trait similar to the previous LookupTableRow
, adapted to the changes in this PR, which would replace build_value()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to hold off to make big refactors on top of this, as the final shape of the aux column building code will probably require all chiplets to be refactored first (see last comment in review). So I would address the points I brought up in my reviews, and then the PR would be in a good state to merge IMO 🙂
result_2[0] = E::ONE; | ||
|
||
let mut result_2_acc = E::ONE; | ||
for i in 0..main_trace.num_rows() - 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, every chiplet performs its own loop over the entire trace. It would probably be more cache-efficient to have just one loop, in which we every chiplet performs its work in the body - but we'd need to measure the performance delta to be sure.
It would also make further optimizations easier, such as parallelizing the loop (as opposed to parallelize n
loops for n
chiplets).
I like the idea of standardizing the code a bit more - but I wouldn't do it in this PR directly. Instead, we could do it in PRs built on top of this one, and then merge into this PR. This would make incremental reviews easer. The goal here is to get the structure in the PR to be relatively clean, and only then merge into |
7a0a206
to
ee93c6d
Compare
f7916ac
to
5919612
Compare
3d06daa
to
4c963d2
Compare
d8ae2e5
to
156a257
Compare
Describe your changes
This PR proposes a simplification to the auxiliary columns building procedure.
This is a wip/poc version without any optimizations.
Checklist before requesting a review
next
according to naming convention.