Assembler: merge adjacent basic blocks #1454
Thank you! Not a review yet - but one thought that came to mind: I wonder if it may be better to do this as a separate pass once [...] At the same time, it would make the code more modular and may yield better results. Also, we can extend this approach to unused node elimination. That is, we first build the MAST, then do a separate pass to merge contiguous basic blocks, then do another pass to eliminate unused nodes. In the future, this can be extended with additional passes for various optimization purposes.
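To make the "merge contiguous basic blocks" step above concrete, here is a minimal sketch of such a pass over a flat list of nodes. The `Node` enum and the `merge_adjacent_blocks` function are hypothetical and only model the idea; they are not the assembler's actual types or API.

```rust
/// Hypothetical, simplified node model (not the real MAST types).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// A basic block: a straight-line sequence of operations (modeled as bytes).
    Block(Vec<u8>),
    /// Any non-block node, e.g. a call to the node with the given ID.
    Call(usize),
}

/// Merges every run of adjacent basic blocks into a single basic block.
/// Non-block nodes are left as-is and act as merge boundaries.
fn merge_adjacent_blocks(nodes: Vec<Node>) -> Vec<Node> {
    let mut merged: Vec<Node> = Vec::with_capacity(nodes.len());
    for node in nodes {
        match node {
            Node::Block(ops) => match merged.last_mut() {
                // The previous node is also a block: append the operations to it.
                Some(Node::Block(prev)) => prev.extend(ops),
                // Otherwise start a new block.
                _ => merged.push(Node::Block(ops)),
            },
            other => merged.push(other),
        }
    }
    merged
}
```

Running this on the sequence block, block, call, block, block yields block, call, block: each run of blocks collapses into one.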
I thought about it a bit more, and it's simpler to keep the current implementation and add a dead code elimination pass that will remove all unused nodes. Otherwise, it's quite complex to merge basic blocks once you have the [...] I'll put this back into draft to implement a dead code elimination pass.
Added dead code elimination, which makes the assembler 5x slower (or more). I will investigate if there are some easy optimizations I can make, but we should also consider making dead code elimination optional (if only to make tests run faster).
UPDATE: Actually the [...]
UPDATE: pushed an optimization where we only run dead code elimination if we detect any dead code. Now the [...]
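As an illustration of the approach described above, the following is a minimal sketch of a dead code elimination pass that first checks whether any node is unreachable and only rebuilds the forest when it finds one. The `Node` and `Forest` types and the function names are assumptions made for this example, not the assembler's real data structures; node IDs are assumed to be valid indices into `nodes`.

```rust
/// Hypothetical, simplified node model (not the real MAST types).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Block(Vec<u8>),
    /// Joins two child nodes, referenced by index into `Forest::nodes`.
    Join(usize, usize),
}

/// Hypothetical forest: a flat node list plus the IDs of the root nodes.
#[derive(Debug, Clone, PartialEq)]
struct Forest {
    nodes: Vec<Node>,
    roots: Vec<usize>,
}

/// Marks every node reachable from a root.
fn reachable(forest: &Forest) -> Vec<bool> {
    let mut live = vec![false; forest.nodes.len()];
    let mut stack = forest.roots.clone();
    while let Some(id) = stack.pop() {
        if live[id] {
            continue;
        }
        live[id] = true;
        if let Node::Join(left, right) = &forest.nodes[id] {
            stack.push(*left);
            stack.push(*right);
        }
    }
    live
}

/// Removes unreachable nodes and remaps the surviving IDs.
/// If no dead code is detected, the forest is returned untouched.
fn eliminate_dead_nodes(forest: Forest) -> Forest {
    let live = reachable(&forest);
    if live.iter().all(|&is_live| is_live) {
        return forest; // fast path: nothing to remove
    }
    // Assign new, compacted IDs to the live nodes.
    let mut remap = vec![usize::MAX; live.len()];
    let mut next_id = 0;
    for (old_id, &is_live) in live.iter().enumerate() {
        if is_live {
            remap[old_id] = next_id;
            next_id += 1;
        }
    }
    let nodes = forest
        .nodes
        .into_iter()
        .enumerate()
        .filter(|(old_id, _)| live[*old_id])
        .map(|(_, node)| match node {
            Node::Join(left, right) => Node::Join(remap[left], remap[right]),
            block => block,
        })
        .collect();
    let roots = forest.roots.iter().map(|&root| remap[root]).collect();
    Forest { nodes, roots }
}
```

In this model the reachability scan is linear in the number of nodes, so the fast path avoids only the rebuild and ID remapping when nothing is dead.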
Thank you! Looks good! I left some comments inline, but also several general comments:
- It seems like the size of the serialized standard library went up from about 250KB to 1.2MB. I was expecting it to go down - so, curious why that happened.
- Do you know how this affected the runtime of our usual BLAKE3 example? I'm mostly curious how the cycle counts changed.
- I think we probably should update the program_compilation benchmark. I think right now it benchmarks primarily the Assembler::add_library() method since the program we are compiling is very simple (basically, it results in an empty MAST forest). I think instead, we should benchmark compilation of a specific module in the standard library (e.g., maybe the same sha256 module) - or maybe we can benchmark compilation of the entire standard library.
Also - let's rebase this on the [...]
Actually, thinking about this more - the reason for the size increase could be the inlining of procedures. Though, a 5x increase seems quite high.
Benchmark:
- branch: next, proving time: 12470 ms
- branch: this one, proving time: 5593 ms
Looks good! Thank you! I left some comments inline - but they are all pretty small.
Oh - I think we should also update [...]
I ended up creating a new benchmark under the [...] And also I think it makes things cleaner - we will find all the benchmarks related to the standard library in the standard library's package, as opposed to having all the miden benchmarks in the same package. So IMO, in another PR, we should move the current benchmarks in the [...]
Benchmark results: On my machine, it took 115ms to compile the standard library.
LGTM!
Looks good! Thank you! I left two tiny nits inline.
Regarding benchmark locations: I actually think having all of them in a single crate is convenient (we always know where to look for them). The miden-vm crate works well for this because all crates "come together" there.
But let's keep the new benchmark as you have it, and we can later decide whether we move it into miden-vm or move the others out of there.
/// Builds a tree of `JOIN` operations to combine the provided MAST node IDs.
pub fn join_nodes(
    &mut self,
    mast_node_ids: Vec<MastNodeId>,
minor nit: here and in other functions, I'd probably name this parameter just node_ids as the mast part is kind of implied.
done
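For readers unfamiliar with what "a tree of `JOIN` operations" means in the doc comment above, here is a minimal sketch that pairs up node IDs level by level until a single root remains. It is written as a free function over a hypothetical `Node` enum with plain `usize` IDs; the real `join_nodes` method, its full signature, and `MastNodeId` are not reproduced here.

```rust
/// Hypothetical, simplified node model (not the real MAST types).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Block(Vec<u8>),
    /// Joins two child nodes, referenced by index into the node list.
    Join(usize, usize),
}

/// Combines `node_ids` into a tree of `Join` nodes appended to `nodes`,
/// returning the ID of the root (or `None` if `node_ids` is empty).
fn join_nodes(nodes: &mut Vec<Node>, mut node_ids: Vec<usize>) -> Option<usize> {
    while node_ids.len() > 1 {
        let mut next_level = Vec::with_capacity((node_ids.len() + 1) / 2);
        for pair in node_ids.chunks(2) {
            match pair {
                // Two neighbours: create a join node over them.
                &[left, right] => {
                    nodes.push(Node::Join(left, right));
                    next_level.push(nodes.len() - 1);
                }
                // Odd element at the end: carry it up to the next level.
                &[single] => next_level.push(single),
                _ => unreachable!("chunks(2) yields slices of length 1 or 2"),
            }
        }
        node_ids = next_level;
    }
    node_ids.pop()
}
```

Pairing neighbours at each level keeps the tree depth logarithmic in the number of nodes, whereas folding from the left would give a chain whose depth grows linearly.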
stdlib/benches/compilation.rs
use criterion::{criterion_group, criterion_main, Criterion};

fn stdlib_compilation(c: &mut Criterion) {
    let mut group = c.benchmark_group("stdlib");
nit: I would rename the group name to "compile_stdlib".
done
Oh - and a couple of last things to check: [...]
A ~5x decrease in performance, which is directly related to the file size, which is ~5x larger. So it seems like adding a heuristic for when to stop merging blocks would be very beneficial for deserialization performance.
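One possible shape for such a heuristic, sketched under the assumption that a simple cap on the size of a merged block is acceptable: stop extending a block once it would exceed `max_ops` operations. The function name, the `max_ops` parameter, and the choice of a size cap are all hypothetical; the thread does not specify which heuristic (if any) was adopted.

```rust
/// Merges adjacent basic blocks (modeled as op lists), but never lets a
/// merged block grow beyond `max_ops` operations. Hypothetical heuristic.
fn merge_blocks_capped(blocks: Vec<Vec<u8>>, max_ops: usize) -> Vec<Vec<u8>> {
    let mut merged: Vec<Vec<u8>> = Vec::new();
    for ops in blocks {
        if let Some(prev) = merged.last_mut() {
            // Merge only while the combined block stays within the cap.
            if prev.len() + ops.len() <= max_ops {
                prev.extend(ops);
                continue;
            }
        }
        // Either there is no previous block or merging would exceed the cap.
        merged.push(ops);
    }
    merged
}
```

A cap of this kind trades fewer, larger blocks against smaller ones that are more likely to stay identical across procedures. Whether that trade-off would actually reduce the serialized size here would need to be measured.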
All looks good! Thank you!
Closes #1429
The current solution leaves the old unused basic blocks in the MastForest. I could not think of a simple/clean way to avoid this.
Update: We now run dead code elimination as well.