Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add unsafe constructors to nodes for deserialization #1453

Merged

Conversation

sergerad
Copy link
Contributor

@sergerad sergerad commented Aug 14, 2024

Closes #1391

Optimization that avoids recomputing MastNode hashes during deserialization. Seems like a 10x improvement, see results below.

Adds benchmarks for standard library deserialization.

Updates BasicBlockNode constructors and converts an assert to an error flow instead.


Latest commit:

deserialize_std_lib/read_from_bytes
                        time:   [2.2618 ms 2.2777 ms 2.2996 ms]
                        change: [-2.8312% -0.7465% +1.7381%] (p = 0.62 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

Head of next banch:

deserialize_std_lib/read_from_bytes
                        time:   [26.605 ms 26.933 ms 27.356 ms]
                        change: [+1022.4% +1058.5% +1088.1%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high severe

@sergerad
Copy link
Contributor Author

@bobbinth I have a draft PR here for feedback.

Main question is how best to organize the deserialization for benchmarking. I have just put deserialization code from the library_serialization unit test there for now. I can also see stdlib/build.rs approach which might be relevant.

LMK if anything doesn't make sense with the node constructors themselves. I split up the batch_ops fn so that the trusted flow doesn't perform hashing. I couldn't see a way of deserializing straight to batch ops in decode_operations_and_decorators rather than operations first, but perhaps that is possible.

@sergerad
Copy link
Contributor Author

I've had a pass at the deserialization benchmark. My understanding is that I need to provide a path to a .masl file bundled from the stdlib .masm in order to drive Library::deserialize_from_file(path) for the benchmark.

Will look at how to bundle the .masl next. LMK if I'm heading in the wrong direction with that 🙏

@plafer
Copy link
Contributor

plafer commented Aug 16, 2024

Thank you for taking this one on! Haven't had a chance to look at the PR yet.

My understanding is that I need to provide a path to a .masl file bundled from the stdlib .masm in order to drive Library::deserialize_from_file(path) for the benchmark.

I would run the benchmark on <Library as Deserializable>::read_from as opposed to Library::deserialize_from_file(path) so as to not measure the interactions with the file system.

So the simplest benchmark I see has similar steps to some of our tests (e.g. here)

  • define source code strings
  • parse them into modules
  • call library = Assembler::assemble_library()
  • serialize library into a Vec<u8> using <Library as Serializable>::to_bytes()
  • run the benchmark bench.iter() over Library::read_from()

You could also put the library source code in separate files, but that's up to you - and I'd get the above working first regardless.

Also caveat: we found a bug recently. TLDR: Make sure that you don't have 2 functions that differ only by the decorators they call. An example failing test case is:

proc.foo
    push.1
end

proc.bar
    adv.push_mapval
    push.1
end

begin
    exec.foo
    exec.bar
end

@bobbinth
Copy link
Contributor

I would probably run the deserialization benchmark on something sizable - so, miden-stdlib is a good candidate in my opinion. But I agree with @plafer that we shouldn't include reading file from the disc into this.

One way to do this is to embed the binary of miden-stdlib into the benchmark file using include_bytes! macro (we do something similar here). And then just run Library::read_from_bytes() on it.

assembly/Cargo.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you for tackling this! In addition to the comments about the benchmark above, I left a couple of minor comments inline.

core/src/mast/node/join_node.rs Outdated Show resolved Hide resolved
core/src/mast/node/basic_block_node/mod.rs Show resolved Hide resolved
@sergerad sergerad marked this pull request as ready for review August 18, 2024 09:05
@sergerad sergerad requested a review from bobbinth August 18, 2024 09:05
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you! I left a few non-blocking comments inline. This is in addition to #1453 (comment), which is also non-blocking.

assembly/Cargo.toml Outdated Show resolved Hide resolved
core/src/mast/mod.rs Outdated Show resolved Hide resolved
core/src/mast/node/basic_block_node/mod.rs Show resolved Hide resolved
@sergerad sergerad changed the title add unsafe constructors to individual nodes add unsafe constructors to nodes for deserialization Aug 18, 2024
@sergerad
Copy link
Contributor Author

Have followed up with the latest suggested changes. Updated description with benchmark results for this branch vs next. Seems like 10x as you suspected @bobbinth.

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks great! Thank you again for working on this! I'll merge as soon as CI is green.

One thing I've been thinking about is how to make it so that we deserialize things like miden-stdlib only once. We could have a static variable and use LazyLock in std context, but not sure whats the best way to handle it in no-std context.

@bobbinth bobbinth merged commit fd32a69 into 0xPolygonMiden:next Aug 18, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants