Assembler: merge adjacent basic blocks #1454

plafer · 2024-08-14T21:17:03Z

~~The current solution leaves the old unused basic blocks in the MastForest. I could not think of a simple/clean way to avoid this.~~ We now run dead code elimination as well.

bobbinth · 2024-08-15T10:43:33Z

Thank you! Not a review yet - but one thought that came to mind: I wonder if it may be better to do this as a separate pass once MastForest has been fully built. It would be less efficient, but this is probably not super important as compilation is not on a "critical path".

At the same time, it would make the code more modular and could potentially yield better results. Also, we can extend this approach to the unused node elimination. That is, we first build the MAST, then do a separate pass to merge contiguous basic blocks, then do another pass to eliminate unused nodes. In the future, this can be extended with additional passes for various optimization purposes.
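A minimal sketch of the pass-based design described above. `Forest`, `Node`, the `Pass` trait, `StripNoops` and `run_pipeline` are hypothetical stand-ins, not types from the Miden codebase. The point is only that each optimization runs as an independent step over an already-built forest, so new passes can be appended without touching the builder.

```rust
/// Hypothetical toy forest: a node is either a basic block of op names
/// or a JOIN of two child node indices.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Block(Vec<&'static str>),
    Join(usize, usize),
}

#[derive(Debug, Default)]
struct Forest {
    nodes: Vec<Node>,
}

/// A hypothetical optimization pass over a fully built forest.
trait Pass {
    fn name(&self) -> &'static str;
    fn run(&self, forest: &mut Forest);
}

/// Example pass: drop `noop` operations from every basic block.
/// It does not add or remove nodes, so no node IDs need remapping.
struct StripNoops;

impl Pass for StripNoops {
    fn name(&self) -> &'static str {
        "strip_noops"
    }

    fn run(&self, forest: &mut Forest) {
        for node in forest.nodes.iter_mut() {
            if let Node::Block(ops) = node {
                ops.retain(|op| *op != "noop");
            }
        }
    }
}

/// Runs each pass in order and returns the names of the passes that ran.
fn run_pipeline(forest: &mut Forest, passes: &[Box<dyn Pass>]) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for pass in passes {
        pass.run(forest);
        ran.push(pass.name());
    }
    ran
}
```

Passes that do add or remove nodes (like block merging or unused-node removal) would also have to rewrite child references, which is the complexity raised in the reply below.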

plafer · 2024-08-15T12:40:08Z

I thought about it a bit more, and it's simpler to keep the current implementation and add a dead code elimination pass that removes all unused nodes. Otherwise, merging basic blocks once you have the MastForest is quite complex: you need to undo the tree of JOINs that we create to merge basic blocks, create a new one from the new set of basic blocks, and then correctly update all references to the old MastNodeIds.

I'll put this back into draft to implement a dead code elimination pass.
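For illustration, a rough sketch of the approach kept in this PR: coalescing adjacent basic blocks while the node sequence is still being assembled, before any JOIN tree exists, so no node IDs have to be rewritten afterwards. The `Node` enum and `merge_contiguous_blocks` are hypothetical simplifications; the real assembler works with `MastNodeId`s and richer basic-block contents.

```rust
/// Hypothetical simplified node: either a basic block holding its operations,
/// or some other non-mergeable node (a call, a loop, ...).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Block(Vec<String>),
    Other(String),
}

/// Coalesces each run of adjacent basic blocks into one basic block,
/// leaving non-block nodes in place. Execution order is preserved.
fn merge_contiguous_blocks(nodes: Vec<Node>) -> Vec<Node> {
    let mut merged: Vec<Node> = Vec::with_capacity(nodes.len());
    for node in nodes {
        match node {
            Node::Block(ops) => match merged.last_mut() {
                // The previous node is also a block: append to it.
                Some(Node::Block(prev_ops)) => prev_ops.extend(ops),
                // Otherwise this block starts a new run.
                _ => merged.push(Node::Block(ops)),
            },
            other => merged.push(other),
        }
    }
    merged
}
```

The merged sequence would then be combined with JOINs as before; the old, now-unused blocks are what the dead code elimination pass is meant to clean up.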

miden/tests/integration/exec_iters.rs

plafer · 2024-08-15T17:44:38Z

Added dead code elimination, which makes the assembler 5x slower (or more). I will investigate whether there are some easy optimizations I can make, but we should also consider making dead code elimination optional (if only to make tests run faster).

UPDATE: Actually, the program_compilation benchmark is virtually unchanged. I observed the ~5x slowdown when running proptests, and I'm no longer sure why this is happening.

UPDATE: Pushed an optimization where we only run dead code elimination if we detect any dead code. Now the program_compilation benchmark is exactly unchanged. I also flamegraphed the clz_proptest test: 80% of the time is spent hashing as a result of deserialization (#1391).
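A minimal sketch of the "only run dead code elimination if we detect any dead code" idea, assuming a toy forest where nodes live in a `Vec` and JOINs reference their children by index. `compute_live_ids` borrows its name from the commit log (where it was made iterative), but this body, `Node` and `eliminate_dead_nodes` are hypothetical. Liveness is computed with an explicit stack rather than recursion; if every node is live the forest is returned untouched, otherwise the surviving nodes get compacted IDs and every child reference and root is remapped.

```rust
/// Hypothetical simplified MAST node.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Leaf basic block; the payload stands in for its operations.
    Block(u32),
    /// JOIN of two children, referenced by index into the node list.
    Join(usize, usize),
}

/// Marks every node reachable from `roots`, using an explicit stack.
fn compute_live_ids(nodes: &[Node], roots: &[usize]) -> Vec<bool> {
    let mut live = vec![false; nodes.len()];
    let mut stack: Vec<usize> = roots.to_vec();
    while let Some(id) = stack.pop() {
        if live[id] {
            continue;
        }
        live[id] = true;
        if let Node::Join(left, right) = &nodes[id] {
            stack.push(*left);
            stack.push(*right);
        }
    }
    live
}

/// Removes unreachable nodes only when some exist. Returns the new node list,
/// the remapped roots, and whether anything was removed.
fn eliminate_dead_nodes(nodes: Vec<Node>, roots: &[usize]) -> (Vec<Node>, Vec<usize>, bool) {
    let live = compute_live_ids(&nodes, roots);
    // Fast path: no dead code, so skip the rebuild entirely.
    if live.iter().all(|&is_live| is_live) {
        return (nodes, roots.to_vec(), false);
    }
    // Assign compacted IDs to surviving nodes, preserving their order.
    let mut remap: Vec<Option<usize>> = vec![None; nodes.len()];
    let mut next_id = 0;
    for (old_id, &is_live) in live.iter().enumerate() {
        if is_live {
            remap[old_id] = Some(next_id);
            next_id += 1;
        }
    }
    let new_id = |old: usize| remap[old].expect("children and roots of live nodes are live");
    // Rebuild the node list, rewriting child references to the new IDs.
    let new_nodes: Vec<Node> = nodes
        .into_iter()
        .enumerate()
        .filter(|(old_id, _)| live[*old_id])
        .map(|(_, node)| match node {
            Node::Join(left, right) => Node::Join(new_id(left), new_id(right)),
            block => block,
        })
        .collect();
    let new_roots: Vec<usize> = roots.iter().map(|&r| new_id(r)).collect();
    (new_nodes, new_roots, true)
}
```

The remapping step is where the cost lies, which is why skipping it in the common case (no dead nodes) can leave compilation time unchanged.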

assembly/src/assembler/dead_code_elimination.rs

assembly/src/tests.rs

bobbinth

Thank you! Looks good! I left some comments inline, but also several general comments:

It seems like the size of the serialized standard library went up from about 250KB to 1.2MB. I was expecting it to go down, so I'm curious why that happened.
Do you know how this affected the runtime of our usual BLAKE3 example? I'm mostly curious how the cycle counts changed.
I think we probably should update the program_compilation benchmark. I think right now it benchmarks primarily the Assembler::add_library() method since the program we are compiling is very simple (basically, it results in an empty MAST forest). I think instead, we should benchmark compilation of a specific module in the standard library (e.g., maybe the same sha256 module) - or maybe we can benchmark compilation of the entire standard library.

assembly/src/assembler/basic_block_builder.rs

assembly/src/assembler/instruction/mod.rs

core/src/mast/node/mod.rs

assembly/src/assembler/mod.rs

assembly/src/assembler/dead_code_elimination.rs

bobbinth · 2024-08-16T19:40:04Z

Also - let's rebase this on the main branch. We can merge it there and then release it as v0.10.4 - this way, we'll realize the benefit of this PR in miden-base immediately.

assembly/src/assembler/dead_code_elimination.rs

bobbinth · 2024-08-17T18:44:18Z

It seems like the size of the serialized standard library went up from about 250KB to 1.2MB. I was expecting it to go down, so I'm curious why that happened.

Actually, thinking about this more - the reason for the size increase could be the inlining of procedures. Though a 5x increase seems quite high.

plafer · 2024-08-19T20:15:29Z

Benchmark of MIDEN_LOG=info RAYON_NUM_THREADS=16 ./target/optimized/miden example --recursive blake3 -n 100:

branch: next

proving time: 12470 ms

============================================================
Generated a program to compute 100-th iteration of BLAKE3 1-to-1 hash; expected result: [4210649924, 4239425932, 2583891669, 2278324621, 1697424527, 1323302812, 3062448259, 2695025053]
--------------------------------
INFO     prove_program [ 12.5s | 3.77% / 100.00% ]
INFO     ┝━ execute_program [ 335ms | 2.69% ]
INFO     ┝━ ｉ [info]: Generated execution trace of 70 columns and 1048576 steps (49% padded) in 334 ms
INFO     ┝━ build_domain [ 10.6ms | 0.08% ] trace_length: 1048576 | lde_domain_size: 8388608
INFO     ┝━ commit_to_main_trace_segment [ 6.64s | 0.00% / 53.25% ]
INFO     │  ┝━ extend_execution_trace [ 5.51s | 44.19% ] num_cols: 70 | blowup: 8
INFO     │  ┕━ compute_execution_trace_commitment [ 1.13s | 9.07% ] tree_depth: 23
INFO     ┝━ commit_to_aux_trace_segment [ 836ms | 0.00% / 6.70% ]
INFO     │  ┝━ extend_execution_trace [ 497ms | 3.99% ] num_cols: 7 | blowup: 8
INFO     │  ┕━ compute_execution_trace_commitment [ 338ms | 2.71% ] tree_depth: 23
INFO     ┝━ evaluate_constraints [ 2.42s | 19.41% ] ce_domain_size: 8388608
INFO     ┝━ commit_to_constraint_evaluations [ 1.19s | 0.00% / 9.58% ]
INFO     │  ┝━ build_composition_poly_columns [ 124ms | 0.99% ] num_columns: 8
INFO     │  ┝━ evaluate_composition_poly_columns [ 782ms | 6.27% ]
INFO     │  ┕━ compute_constraint_evaluation_commitment [ 289ms | 2.31% ] tree_depth: 23
INFO     ┝━ build_deep_composition_poly [ 371ms | 2.97% ]
INFO     ┝━ evaluate_deep_composition_poly [ 71.1ms | 0.57% ]
INFO     ┝━ compute_fri_layers [ 95.9ms | 0.77% ] num_layers: 4
INFO     ┝━ determine_query_positions [ 2.11ms | 0.02% ] grinding_factor: 16 | num_positions: 27
INFO     ┕━ build_proof_object [ 23.6ms | 0.19% ]
--------------------------------
Executed program in 12470 ms
Stack outputs: [4210649924, 4239425932, 2583891669, 2278324621, 1697424527, 1323302812, 3062448259, 2695025053]
Execution proof size: 100 KB
Execution proof security: 96 bits
--------------------------------
INFO     verify_program [ 1.05ms | 100.00% ]
Execution verified in 1 ms

branch: this one

proving time: 5593 ms

Generated a program to compute 100-th iteration of BLAKE3 1-to-1 hash; expected result: [4210649924, 4239425932, 2583891669, 2278324621, 1697424527, 1323302812, 3062448259, 2695025053]
--------------------------------
INFO     prove_program [ 5.59s | 4.10% / 100.00% ]
INFO     ┝━ execute_program [ 234ms | 4.18% ]
INFO     ┝━ ｉ [info]: Generated execution trace of 70 columns and 524288 steps (8% padded) in 233 ms
INFO     ┝━ build_domain [ 5.24ms | 0.09% ] trace_length: 524288 | lde_domain_size: 4194304
INFO     ┝━ commit_to_main_trace_segment [ 2.68s | 0.00% / 47.91% ]
INFO     │  ┝━ extend_execution_trace [ 2.15s | 38.49% ] num_cols: 70 | blowup: 8
INFO     │  ┕━ compute_execution_trace_commitment [ 527ms | 9.42% ] tree_depth: 22
INFO     ┝━ commit_to_aux_trace_segment [ 413ms | 0.00% / 7.38% ]
INFO     │  ┝━ extend_execution_trace [ 253ms | 4.52% ] num_cols: 7 | blowup: 8
INFO     │  ┕━ compute_execution_trace_commitment [ 160ms | 2.86% ] tree_depth: 22
INFO     ┝━ evaluate_constraints [ 1.18s | 21.07% ] ce_domain_size: 4194304
INFO     ┝━ commit_to_constraint_evaluations [ 564ms | 0.00% / 10.09% ]
INFO     │  ┝━ build_composition_poly_columns [ 64.8ms | 1.16% ] num_columns: 8
INFO     │  ┝━ evaluate_composition_poly_columns [ 352ms | 6.30% ]
INFO     │  ┕━ compute_constraint_evaluation_commitment [ 147ms | 2.63% ] tree_depth: 22
INFO     ┝━ build_deep_composition_poly [ 201ms | 3.60% ]
INFO     ┝━ evaluate_deep_composition_poly [ 33.2ms | 0.59% ]
INFO     ┝━ compute_fri_layers [ 39.9ms | 0.71% ] num_layers: 4
INFO     ┝━ determine_query_positions [ 271µs | 0.00% ] grinding_factor: 16 | num_positions: 27
INFO     ┕━ build_proof_object [ 15.0ms | 0.27% ]
--------------------------------
Executed program in 5593 ms
Stack outputs: [4210649924, 4239425932, 2583891669, 2278324621, 1697424527, 1323302812, 3062448259, 2695025053]
Execution proof size: 93 KB
Execution proof security: 96 bits
--------------------------------
INFO     verify_program [ 1.01ms | 100.00% ]
Execution verified in 1 ms
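A rough worked estimate of why proving time roughly halved, assuming "N% padded" means N% of the final trace rows are padding and that the trace is padded up to the next power of two. The helper names are hypothetical, and since the logged percentages are rounded, the row counts are approximate.

```rust
/// Estimates unpadded trace rows from the padded trace length and the
/// reported padding percentage (assumed to be padding rows / total rows).
fn estimate_unpadded_rows(trace_len: u64, padded_pct: u64) -> u64 {
    trace_len * (100 - padded_pct) / 100
}

/// The execution trace length is assumed to be padded to a power of two.
fn padded_len(rows: u64) -> u64 {
    rows.next_power_of_two()
}
```

With the numbers from the two logs, `estimate_unpadded_rows(1_048_576, 49)` gives about 534,773 rows on next and `estimate_unpadded_rows(524_288, 8)` about 482,344 rows on this branch. Under these assumptions, merging cut the executed steps by only about 10%, but that was enough to drop below 2^19 = 524288 rows, halving the trace length, which lines up with proving time going from 12.5s to 5.6s.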

bobbinth

Looks good! Thank you! I left some comments inline - but they are all pretty small.

assembly/src/assembler/mast_forest_builder.rs

core/src/mast/mod.rs

bobbinth · 2024-08-19T21:10:26Z

Oh - I think we should also update the program_compilation benchmark to see how this affected the assembler performance.

plafer · 2024-08-20T12:50:02Z

Oh - I think we should also update the program_compilation benchmark to see how this affected the assembler performance.

I ended up creating a new benchmark under the miden-stdlib package instead. After looking more into it, I think we should keep benchmarks in the package they benchmark. Otherwise, I would have had to resort to some hacks in this case: e.g., I use the CARGO_MANIFEST_DIR environment variable to find the directory that contains the masm files for the standard library (under asm/), but in the miden-vm package this environment variable points to the wrong directory.

And also I think it makes things cleaner - we will find all the benchmarks related to the standard library in the standard library's package, as opposed to having all the miden benchmarks in the same package.

So IMO, in another PR, we should move the current benchmarks in the miden-vm package into the packages they benchmark.

Benchmark results

On my machine, it took 115ms to compile the standard library.
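A minimal sketch of locating the standard library's `.masm` sources relative to the crate root, as described above for the new miden-stdlib benchmark. `collect_masm_files` and `manifest_dir` are hypothetical helpers; `option_env!` is used only so the sketch compiles outside Cargo (inside a Cargo benchmark, `env!("CARGO_MANIFEST_DIR")` is the usual choice). How the files are then handed to the assembler is not shown in this thread and is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Root directory of the crate being compiled, if built through Cargo.
/// For a benchmark in miden-stdlib this is the stdlib crate root; for one
/// in miden-vm it would point at the miden-vm crate instead.
fn manifest_dir() -> Option<PathBuf> {
    option_env!("CARGO_MANIFEST_DIR").map(PathBuf::from)
}

/// Recursively collects all `.masm` files under `dir`, sorted for determinism.
fn collect_masm_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    let mut stack = vec![dir.to_path_buf()];
    while let Some(current) = stack.pop() {
        for entry in fs::read_dir(&current)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().map_or(false, |ext| ext == "masm") {
                out.push(path);
            }
        }
    }
    out.sort();
    Ok(out)
}
```

In the benchmark crate, the source directory would be something like `manifest_dir().map(|d| d.join("asm"))`, which is exactly the path that would resolve incorrectly if the benchmark lived in miden-vm.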

bitwalker

LGTM!

bobbinth

Looks good! Thank you! I left two tiny nits inline.

Regarding benchmark locations: I actually think having all of them in a single crate is convenient (we always know where to look for them). The miden-vm crate works well for this because all crates "come together" there.

But let's keep the new benchmark as you have it, and we can later decide whether we move it into miden-vm or move others out of there.

bobbinth · 2024-08-20T17:37:45Z

assembly/src/assembler/mast_forest_builder.rs

+    /// Builds a tree of `JOIN` operations to combine the provided MAST node IDs.
+    pub fn join_nodes(
+        &mut self,
+        mast_node_ids: Vec<MastNodeId>,


minor nit: here and in other functions, I'd probably name this parameter just node_ids as the mast part is kind of implied.
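A hedged sketch of what a `join_nodes`-style helper can look like, using the `node_ids` naming suggested above. `ForestBuilder`, `Node` and `NodeId` are hypothetical simplifications of `MastForestBuilder` and `MastNodeId`, and the PR's real implementation may build the tree differently. Adjacent IDs are joined pairwise, level by level, which preserves execution order and gives a tree of depth about log2(n).

```rust
/// Hypothetical stand-in for `MastNodeId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u32);

#[derive(Debug, Clone, PartialEq)]
enum Node {
    Leaf(&'static str),
    /// Executes the left child, then the right child.
    Join(NodeId, NodeId),
}

#[derive(Debug, Default)]
struct ForestBuilder {
    nodes: Vec<Node>,
}

impl ForestBuilder {
    fn add(&mut self, node: Node) -> NodeId {
        self.nodes.push(node);
        NodeId((self.nodes.len() - 1) as u32)
    }

    /// Combines `node_ids` (executed left to right) into a tree of JOINs by
    /// repeatedly joining adjacent pairs. Returns `None` for empty input.
    fn join_nodes(&mut self, mut node_ids: Vec<NodeId>) -> Option<NodeId> {
        while node_ids.len() > 1 {
            let mut next = Vec::with_capacity((node_ids.len() + 1) / 2);
            for pair in node_ids.chunks(2) {
                match pair {
                    [left, right] => next.push(self.add(Node::Join(*left, *right))),
                    // An odd node out is carried up to the next level unchanged.
                    [single] => next.push(*single),
                    _ => unreachable!("chunks(2) yields one or two items"),
                }
            }
            node_ids = next;
        }
        node_ids.pop()
    }
}
```

For three leaves `a`, `b`, `c`, this produces `JOIN(JOIN(a, b), c)`.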

bobbinth · 2024-08-20T17:57:06Z

stdlib/benches/compilation.rs

+use criterion::{criterion_group, criterion_main, Criterion};
+
+fn stdlib_compilation(c: &mut Criterion) {
+    let mut group = c.benchmark_group("stdlib");


nit: I would rename the group name to "compile_stdlib".

bobbinth · 2024-08-20T18:04:08Z

Oh - and a couple of last things to check:

What is the size of the serialized stdlib after this PR?
How did this PR affect stdlib deserialization time? (We have a benchmark for this in miden-vm.)

plafer · 2024-08-20T18:39:54Z

What is the size of the serialized stdlib after this PR?

next: 250825 bytes (or 0.24 MB)
Now: 1198781 bytes (or 1.14 MB)

How did this PR affect stdlib deserialization time? (We have a benchmark for this in miden-vm.)

next: 2.2ms
Now: 14.3ms

A ~6.5x decrease in performance, which is directly related to the file size being ~4.8x larger. So it seems like adding a heuristic for when to stop merging blocks would be very beneficial for deserialization performance.
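From the numbers above, the serialized size grew about 4.8x (1198781 / 250825 bytes) and deserialization time about 6.5x (14.3 / 2.2 ms). One possible shape for the suggested heuristic, purely hypothetical: merge adjacent blocks greedily, but start a new block whenever the merged one would exceed an operation cap. Blocks are represented only by their operation counts, and `merge_with_cap` and its `max_ops` parameter are illustrative names, not part of the assembler.

```rust
/// Greedily merges adjacent blocks (given as op counts, in execution order),
/// starting a new block whenever appending the next one would exceed `max_ops`.
/// A single block already larger than `max_ops` is kept as-is, not split.
/// Returns the op counts of the resulting blocks.
fn merge_with_cap(block_sizes: &[usize], max_ops: usize) -> Vec<usize> {
    let mut merged: Vec<usize> = Vec::new();
    for &size in block_sizes {
        let fits = merged.last().map_or(false, |&last| last + size <= max_ops);
        if fits {
            *merged.last_mut().expect("non-empty: checked by `fits`") += size;
        } else {
            merged.push(size);
        }
    }
    merged
}
```

For example, `merge_with_cap(&[1, 2, 3, 1], 4)` yields blocks of 3 and 4 ops. Picking the cap is the open tuning question: a larger cap means fewer JOIN nodes at execution time, a smaller one bounds how much serialized size each merged block can add.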

bobbinth

All looks good! Thank you!

plafer added 4 commits August 14, 2024 15:28

add failing test

4eed616

chore: Remove old mentions "span" instead of "basic block"

6284166

feat(assembler): merge contiguous basic blocks

5e903d1

chore: fix tests that didn't assume merging

e3bcda6

plafer requested review from bitwalker and bobbinth August 14, 2024 21:17

plafer added 2 commits August 14, 2024 17:19

Changelog

bbb4313

fix falcon test

8d3a98b

plafer marked this pull request as draft August 15, 2024 12:40

plafer commented Aug 15, 2024

miden/tests/integration/exec_iters.rs (outdated, resolved)

plafer marked this pull request as ready for review August 15, 2024 17:41

plafer force-pushed the plafer-1429-merge-adjacent-blocks branch from e38cde2 to f9c14cc August 15, 2024 17:43

plafer force-pushed the plafer-1429-merge-adjacent-blocks branch 2 times, most recently from c33a67a to c00fef2 August 15, 2024 18:35

plafer commented Aug 15, 2024

assembly/src/assembler/dead_code_elimination.rs (outdated, resolved)

bobbinth mentioned this pull request Aug 16, 2024

Migrate to miden-vm v0.10 0xPolygonMiden/miden-base#826

Merged

bobbinth reviewed Aug 16, 2024

assembly/src/tests.rs (resolved)

plafer added 3 commits August 16, 2024 07:02

fix integration tests

f0fff24

disable exec_iter test

dceb4c5

feat: implement dead code elimination

aaf0212

plafer force-pushed the plafer-1429-merge-adjacent-blocks branch from c00fef2 to aaf0212 August 16, 2024 11:04

bobbinth requested changes Aug 16, 2024

bitwalker requested changes Aug 17, 2024

assembly/src/assembler/dead_code_elimination.rs (outdated, resolved)

plafer added 2 commits August 19, 2024 09:18

make compute_live_ids iterative

7113e23

comment nit

244aaee

Move join_mast_node_ids in MastForestBuilder

aefaa5f

plafer force-pushed the plafer-1429-merge-adjacent-blocks branch from 15ec461 to 2e4b889 August 19, 2024 19:55

Replace "dead code elimination" with removal of unused basic blocks only

a71c792

plafer force-pushed the plafer-1429-merge-adjacent-blocks branch from 2e4b889 to a71c792 August 19, 2024 20:03

bobbinth approved these changes Aug 19, 2024

plafer added 9 commits August 20, 2024 07:00

Rename MastForestBuilder::prune_and_build to build

8c8ef39

rename join_nodes()

18d26da

check for procedure roots in get_nodes_to_remove

ba9e864

MastNodeId: Remove conversion from u32

0c7635f

rename variable pruned_nodes

b64e6e3

update comment

0f7beae

rename var pruned_nodes in other context

dcd8922

clarify docstring for MastForest::remove_nodes

c652420

add stdlib compilation benchmark

622ce45

plafer requested a review from bobbinth August 20, 2024 12:50

bitwalker approved these changes Aug 20, 2024

bobbinth approved these changes Aug 20, 2024

plafer added 2 commits August 20, 2024 14:25

Rename all mast_node_ids -> node_ids

d8f664c

rename stdlib benchmark group

df15c30

plafer requested a review from bobbinth August 20, 2024 18:40

bobbinth approved these changes Aug 20, 2024

bobbinth merged commit 62a49fd into next Aug 20, 2024
9 checks passed

bobbinth deleted the plafer-1429-merge-adjacent-blocks branch August 20, 2024 19:30

This was referenced Aug 20, 2024

Reinstate inlining of repeat statements #1429

Closed

Add no_std once primitive for stdlib deserialization #1463

Merged
