perf!: refactor Program to reduce clone time #999

Merged
merged 1 commit into from
Apr 18, 2023


Oppen
Contributor

@Oppen Oppen commented Apr 17, 2023

There was a report that CairoRunner::new took a significant share (~6%) of the runtime in a program that reuses the same Program structure for many short executions. The regular cairo-vm-cli workflow couldn't catch this before: it runs a program only once, so construction cost would only show up if it were awfully slow.
This change should help any kind of sequencers, so I think it's worth it.

The fields that were extracted are either used only on initialization or on error paths. The rest are still being copied, as a previous approach of passing Arc<Program> instead of Program to CairoRunner::new led to a big performance hit.
The change is meant to be non-breaking because it's kept internal to Program: there's a new pub(crate) field that holds an Arc<SharedProgramData>, so no public function changes. However, previous releases overshared internal data, so it's technically breaking: the fields we moved are no longer visible, and wouldn't be accessible through the same paths anyway.
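The layout described above can be sketched roughly as follows. This is a minimal illustration, not the actual cairo-rs definitions: only `Program`, `SharedProgramData`, and the `Arc<SharedProgramData>` field come from this PR, while the individual fields (`identifiers`, `error_messages`, `data`) and the `make_program` helper are hypothetical stand-ins.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the rarely used data (read on initialization
// or on error paths). The field names are assumptions for illustration.
#[derive(Debug)]
struct SharedProgramData {
    identifiers: HashMap<String, usize>,
    error_messages: Vec<String>,
}

// Hypothetical stand-in for `Program`: the cold data sits behind an `Arc`,
// while a frequently accessed field is still owned and deep-copied on clone.
#[derive(Clone, Debug)]
struct Program {
    shared_program_data: Arc<SharedProgramData>,
    data: Vec<u64>,
}

// Illustrative constructor; not part of the real API.
fn make_program() -> Program {
    let mut identifiers = HashMap::new();
    identifiers.insert("main".to_string(), 0);
    Program {
        shared_program_data: Arc::new(SharedProgramData {
            identifiers,
            error_messages: vec!["assertion failed".to_string()],
        }),
        data: vec![1, 2, 3],
    }
}

fn main() {
    let program = make_program();
    let copy = program.clone();

    // Cloning only bumps the reference count for the shared part...
    assert!(Arc::ptr_eq(
        &program.shared_program_data,
        &copy.shared_program_data
    ));
    // ...while the hot field gets its own allocation.
    assert_ne!(program.data.as_ptr(), copy.data.as_ptr());

    println!(
        "{} identifiers and {} error messages shared by {} handles",
        copy.shared_program_data.identifiers.len(),
        copy.shared_program_data.error_messages.len(),
        Arc::strong_count(&program.shared_program_data)
    );
}
```

With this shape, the cost of cloning no longer depends on the size of the shared part, which is the effect the PR is after for callers that build many runners from one program.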

Two other approaches we tried first:

  • Store a reference: this forced an explicit lifetime parameter on CairoRunner, which broke cairo-rs-py in an unfixable way: PyO3 can't deal with structs whose lifetimes are managed by Rust.
  • Directly store an Arc<Program> in CairoRunner: as mentioned, this made execution much slower (10-20%).
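For readers unfamiliar with the trade-off, the two rejected shapes look roughly like this. The types below (`BorrowingRunner`, `ArcRunner`, and this simplified `Program`) are hypothetical and only illustrate the difference in signatures; they are not the cairo-rs types, and the sketch says nothing about the measured slowdown.

```rust
use std::sync::Arc;

// Simplified, hypothetical program type used only for this sketch.
struct Program {
    data: Vec<u64>,
}

// Rejected approach 1: borrowing the program puts a lifetime parameter on
// the runner type, which every user of the type then has to carry around.
struct BorrowingRunner<'a> {
    program: &'a Program,
}

impl<'a> BorrowingRunner<'a> {
    fn new(program: &'a Program) -> Self {
        Self { program }
    }

    fn first_word(&self) -> Option<u64> {
        self.program.data.first().copied()
    }
}

// Rejected approach 2: the runner owns an `Arc<Program>`, so there is no
// lifetime parameter, but every field access goes through the pointer.
struct ArcRunner {
    program: Arc<Program>,
}

impl ArcRunner {
    fn new(program: Arc<Program>) -> Self {
        Self { program }
    }

    fn first_word(&self) -> Option<u64> {
        self.program.data.first().copied()
    }
}

fn main() {
    let program = Arc::new(Program { data: vec![7, 8, 9] });

    let owning = ArcRunner::new(Arc::clone(&program));
    // `&Arc<Program>` coerces to `&Program` at the call site.
    let borrowing = BorrowingRunner::new(&program);

    assert_eq!(owning.first_word(), Some(7));
    assert_eq!(borrowing.first_word(), Some(7));
}
```

The merged change sits between the two: the runner still owns its program without a lifetime parameter, and only the rarely read fields pay for the extra indirection.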

Checklist

  • Linked to GitHub Issue
  • Unit tests added/updated
  • Integration tests added.
  • This change requires new documentation.
    • Documentation has been added/updated.
    • CHANGELOG has been updated.

@Oppen Oppen force-pushed the partial_arc_program branch from 651fda1 to ea75559 Compare April 17, 2023 20:03
@github-actions

github-actions bot commented Apr 17, 2023

Benchmark Results for unmodified programs 🚀

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base blake2s_integration_benchmark | 3.320 ± 0.022 | 3.294 | 3.354 | 1.00 ± 0.01 |
| head blake2s_integration_benchmark | 3.308 ± 0.042 | 3.276 | 3.421 | 1.00 |
| base compare_arrays_200000 | 4.137 ± 0.046 | 4.092 | 4.218 | 1.01 ± 0.01 |
| head compare_arrays_200000 | 4.085 ± 0.021 | 4.065 | 4.141 | 1.00 |
| base dict_integration_benchmark | 3.125 ± 0.018 | 3.104 | 3.167 | 1.01 ± 0.01 |
| head dict_integration_benchmark | 3.079 ± 0.023 | 3.053 | 3.129 | 1.00 |
| base factorial_multirun | 5.071 ± 0.036 | 5.035 | 5.157 | 1.01 ± 0.01 |
| head factorial_multirun | 4.999 ± 0.059 | 4.948 | 5.122 | 1.00 |
| base fibonacci_1000_multirun | 4.013 ± 0.052 | 3.980 | 4.157 | 1.01 ± 0.01 |
| head fibonacci_1000_multirun | 3.991 ± 0.022 | 3.971 | 4.044 | 1.00 |
| base integration_builtins | 3.997 ± 0.033 | 3.959 | 4.082 | 1.00 ± 0.01 |
| head integration_builtins | 3.992 ± 0.015 | 3.970 | 4.008 | 1.00 |
| base keccak_integration_benchmark | 3.618 ± 0.016 | 3.600 | 3.642 | 1.00 ± 0.01 |
| head keccak_integration_benchmark | 3.602 ± 0.025 | 3.575 | 3.664 | 1.00 |
| base linear_search | 4.253 ± 0.013 | 4.232 | 4.272 | 1.00 |
| head linear_search | 4.267 ± 0.025 | 4.237 | 4.314 | 1.00 ± 0.01 |
| base math_cmp_and_pow_integration_benchmark | 3.560 ± 0.033 | 3.530 | 3.637 | 1.01 ± 0.01 |
| head math_cmp_and_pow_integration_benchmark | 3.525 ± 0.024 | 3.501 | 3.581 | 1.00 |
| base math_integration_benchmark | 3.241 ± 0.034 | 3.206 | 3.314 | 1.01 ± 0.01 |
| head math_integration_benchmark | 3.212 ± 0.021 | 3.191 | 3.265 | 1.00 |
| base memory_integration_benchmark | 2.850 ± 0.015 | 2.826 | 2.881 | 1.00 ± 0.01 |
| head memory_integration_benchmark | 2.849 ± 0.027 | 2.819 | 2.897 | 1.00 |
| base operations_with_data_structures_benchmarks | 2.848 ± 0.018 | 2.828 | 2.877 | 1.00 ± 0.01 |
| head operations_with_data_structures_benchmarks | 2.840 ± 0.015 | 2.810 | 2.861 | 1.00 |
| base pedersen | 3.990 ± 0.011 | 3.977 | 4.006 | 1.00 |
| head pedersen | 4.003 ± 0.018 | 3.984 | 4.039 | 1.00 ± 0.01 |
| base poseidon_integration_benchmark | 1.777 ± 0.011 | 1.756 | 1.795 | 1.00 ± 0.01 |
| head poseidon_integration_benchmark | 1.775 ± 0.010 | 1.762 | 1.795 | 1.00 |
| base secp_integration_benchmark | 3.173 ± 0.015 | 3.155 | 3.209 | 1.01 ± 0.01 |
| head secp_integration_benchmark | 3.150 ± 0.008 | 3.139 | 3.159 | 1.00 |
| base set_integration_benchmark | 2.126 ± 0.014 | 2.112 | 2.155 | 1.00 |
| head set_integration_benchmark | 2.179 ± 0.011 | 2.165 | 2.196 | 1.02 ± 0.01 |
| base uint256_integration_benchmark | 4.986 ± 0.057 | 4.943 | 5.124 | 1.00 ± 0.01 |
| head uint256_integration_benchmark | 4.985 ± 0.019 | 4.952 | 5.013 | 1.00 |

@Oppen Oppen force-pushed the partial_arc_program branch from ea75559 to 5bec08b Compare April 17, 2023 21:15
@Oppen
Contributor Author

Oppen commented Apr 17, 2023

No slowdown above uncertainty was detected, so this solution is acceptable in principle.
These are the results for the microbenchmarks with IAI:

parse_program
  Instructions:           190723153 (+0.025154%)
  L1 Accesses:            258756581 (+0.023427%)
  L2 Accesses:                51852 (+0.104251%)
  RAM Accesses:              302252 (-0.017532%)
  Estimated Cycles:       269594661 (+0.021897%)

build_many_runners
  Instructions:           256939374 (-20.51740%)
  L1 Accesses:            353011124 (-21.13548%)
  L2 Accesses:              2520321 (-48.57583%)
  RAM Accesses:              314302 (-3.150143%)
  Estimated Cycles:       376613299 (-22.10376%)

22% improvement on creation of 100 runners from a single program at virtually no extra cost.
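As a sanity check on the IAI output, the Estimated Cycles line appears to be a weighted sum of the three access counters. The weights below (1 for L1, 5 for L2, 35 for RAM) are an assumption taken from IAI's documented cost model rather than anything stated in this PR, and `estimated_cycles` is a hypothetical helper; with those weights the parse_program counters reproduce the reported total.

```rust
// Assumed IAI cost model: an L1 access costs 1 cycle, an L2 access 5,
// and a RAM access 35. The weights are an assumption, see the text above.
fn estimated_cycles(l1_accesses: u64, l2_accesses: u64, ram_accesses: u64) -> u64 {
    l1_accesses + 5 * l2_accesses + 35 * ram_accesses
}

fn main() {
    // Counters reported for `parse_program` in the comment above.
    let cycles = estimated_cycles(258_756_581, 51_852, 302_252);

    // 258756581 + 259260 + 10578820 = 269594661, the reported estimate.
    assert_eq!(cycles, 269_594_661);
    println!("parse_program estimated cycles: {cycles}");
}
```

Because RAM accesses carry the largest weight, a drop in L1 and L2 accesses matters most when the RAM count stays roughly flat, as it does in build_many_runners.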

@Oppen Oppen changed the title from "Partial arc program" to "Refactor Program so less used fields aren't copied on CairoRunner construction" Apr 18, 2023
@Oppen Oppen marked this pull request as ready for review April 18, 2023 15:06
@Oppen Oppen force-pushed the partial_arc_program branch from 5bec08b to f7e90ef Compare April 18, 2023 15:09
@Oppen Oppen changed the title from "Refactor Program so less used fields aren't copied on CairoRunner construction" to "perf!: refactor Program to reduce clone time" Apr 18, 2023
@Oppen Oppen force-pushed the partial_arc_program branch from f7e90ef to 2fc2649 Compare April 18, 2023 15:13
@codecov

codecov bot commented Apr 18, 2023

Codecov Report

Merging #999 (08df889) into main (426d656) will decrease coverage by 0.01%.
The diff coverage is 97.91%.

❗ Current head 08df889 differs from pull request most recent head aaccaf1. Consider uploading reports for the commit aaccaf1 to get more accurate results

@@            Coverage Diff             @@
##             main     #999      +/-   ##
==========================================
- Coverage   98.02%   98.01%   -0.01%     
==========================================
  Files          76       76              
  Lines       31621    31646      +25     
==========================================
+ Hits        30997    31019      +22     
- Misses        624      627       +3     
| Impacted Files | Coverage Δ |
|---|---|
| src/types/program.rs | 98.46% <92.85%> (-0.67%) ⬇️ |
| src/serde/deserialize_program.rs | 97.48% <100.00%> (-0.02%) ⬇️ |
| src/utils.rs | 99.43% <100.00%> (+<0.01%) ⬆️ |
| src/vm/runners/cairo_runner.rs | 98.08% <100.00%> (+0.01%) ⬆️ |
| src/vm/security.rs | 98.07% <100.00%> (+<0.01%) ⬆️ |

@Oppen Oppen force-pushed the partial_arc_program branch 2 times, most recently from ca78d89 to 4baa387 Compare April 18, 2023 18:48
Extract most fields (see the code for the details) in `Program` into a
new `SharedProgramData` structure, then add a field
`shared_program_data: Arc<SharedProgramData>` to `Program`, so cloning
doesn't deep copy them. These were selected based on how often the
runner needed to access them directly, as the indirection and heap
access proved to come with a runtime cost. Frequently accessed fields
are still copied because of that.

The break comes from hiding some symbols (as they were moved to the new
structure), but those shouldn't have been exposed in the first place, so
I expect no breakage for real-world programs (cue Hyrum's law).
@Oppen Oppen force-pushed the partial_arc_program branch from 4baa387 to aaccaf1 Compare April 18, 2023 19:16
@Oppen Oppen enabled auto-merge April 18, 2023 19:29
@Oppen Oppen added this pull request to the merge queue Apr 18, 2023
Merged via the queue into main with commit d545372 Apr 18, 2023
@Oppen Oppen deleted the partial_arc_program branch April 18, 2023 20:12
kariy pushed a commit to dojoengine/cairo-rs that referenced this pull request Jun 23, 2023