perf: Change `CairoRunError::VmException` to `Box<VmException>` #1756

fmoletta · 2024-05-08T21:18:54Z

PR #1720 added a small error variant to `CairoRunError`, which caused a large performance regression. The cause is the size of the `VmException` variant: a Rust enum is as large as its largest variant (plus any discriminant), so every `CairoRunError` value became as large as a `VmException`. This PR fixes the issue by boxing the `VmException` held in its variant, and adds a test that ensures the size of `CairoRunError` does not exceed 32 bytes.
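A minimal Rust sketch of the layout effect described above. `LargeException`, `RunErrorInline`, and `RunErrorBoxed` are hypothetical stand-ins, not cairo-vm's real `VmException` or `CairoRunError` (whose fields differ). The `const` assertion is one possible way to encode a 32-byte guard; the PR's actual test may be written differently.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large error payload such as VmException;
// the field list is illustrative only.
#[allow(dead_code)]
struct LargeException {
    pc: usize,
    inst_location: Option<String>,
    traceback: Option<String>,
    error_attr_value: Option<String>,
}

// Every value of this enum reserves room for its largest variant,
// even when it holds the small one.
#[allow(dead_code)]
enum RunErrorInline {
    Small(u32),
    Exception(LargeException),
}

// Boxing moves the payload to the heap; the variant now holds one pointer.
#[allow(dead_code)]
enum RunErrorBoxed {
    Small(u32),
    Exception(Box<LargeException>),
}

// Compile-time size guard in the spirit of the PR's 32-byte test.
const _: () = assert!(size_of::<RunErrorBoxed>() <= 32);

fn main() {
    println!("inline: {} bytes", size_of::<RunErrorInline>());
    println!("boxed:  {} bytes", size_of::<RunErrorBoxed>());
}
```

The trade-off is one heap allocation when the boxed variant is actually constructed, in exchange for a smaller value on every path that only moves the error type around (including `Ok` results).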

github-actions · 2024-05-08T21:28:57Z

**Hyper Threading Benchmark results**

hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
  Time (mean ± σ):     27.212 s ±  0.002 s    [User: 26.331 s, System: 0.879 s]
  Range (min … max):   27.211 s … 27.214 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 1
  Time (mean ± σ):     26.887 s ±  0.077 s    [User: 26.107 s, System: 0.778 s]
  Range (min … max):   26.833 s … 26.942 s    2 runs
 
Summary
  'hyper_threading_pr threads: 1' ran
    1.01 ± 0.00 times faster than 'hyper_threading_main threads: 1'




hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
  Time (mean ± σ):     14.597 s ±  0.013 s    [User: 26.937 s, System: 0.829 s]
  Range (min … max):   14.587 s … 14.606 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 2
  Time (mean ± σ):     14.787 s ±  0.017 s    [User: 26.764 s, System: 0.793 s]
  Range (min … max):   14.776 s … 14.799 s    2 runs
 
Summary
  'hyper_threading_main threads: 2' ran
    1.01 ± 0.00 times faster than 'hyper_threading_pr threads: 2'




hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
  Time (mean ± σ):     11.112 s ±  0.007 s    [User: 38.620 s, System: 0.992 s]
  Range (min … max):   11.107 s … 11.117 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 4
  Time (mean ± σ):     10.602 s ±  0.413 s    [User: 38.032 s, System: 0.933 s]
  Range (min … max):   10.310 s … 10.893 s    2 runs
 
Summary
  'hyper_threading_pr threads: 4' ran
    1.05 ± 0.04 times faster than 'hyper_threading_main threads: 4'




hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
  Time (mean ± σ):     10.734 s ±  0.229 s    [User: 39.018 s, System: 0.998 s]
  Range (min … max):   10.572 s … 10.896 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 6
  Time (mean ± σ):     10.630 s ±  0.346 s    [User: 38.248 s, System: 0.970 s]
  Range (min … max):   10.385 s … 10.875 s    2 runs
 
Summary
  'hyper_threading_pr threads: 6' ran
    1.01 ± 0.04 times faster than 'hyper_threading_main threads: 6'




hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
  Time (mean ± σ):     10.591 s ±  0.118 s    [User: 39.393 s, System: 1.006 s]
  Range (min … max):   10.508 s … 10.674 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 8
  Time (mean ± σ):     10.379 s ±  0.040 s    [User: 38.488 s, System: 1.041 s]
  Range (min … max):   10.351 s … 10.407 s    2 runs
 
Summary
  'hyper_threading_pr threads: 8' ran
    1.02 ± 0.01 times faster than 'hyper_threading_main threads: 8'




hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
  Time (mean ± σ):     10.662 s ±  0.041 s    [User: 39.583 s, System: 1.014 s]
  Range (min … max):   10.633 s … 10.691 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 16
  Time (mean ± σ):     10.320 s ±  0.090 s    [User: 38.833 s, System: 1.086 s]
  Range (min … max):   10.257 s … 10.384 s    2 runs
 
Summary
  'hyper_threading_pr threads: 16' ran
    1.03 ± 0.01 times faster than 'hyper_threading_main threads: 16'
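Each "times faster" figure above is the ratio of the two mean times, with an uncertainty propagated from both standard deviations (treated as independent). The Rust sketch below assumes that first-order propagation formula; `relative_speed` is a hypothetical helper, not part of hyperfine's public interface. It reproduces the 4-thread summary:

```rust
/// Ratio of the slower mean to the faster mean, plus its propagated
/// standard deviation (independent errors, first-order propagation).
fn relative_speed(fast_mean: f64, fast_sd: f64, slow_mean: f64, slow_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    let rel_err = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // 4 threads: PR = 10.602 ± 0.413 s, main = 11.112 ± 0.007 s
    let (ratio, sd) = relative_speed(10.602, 0.413, 11.112, 0.007);
    println!("{:.2} ± {:.2}", ratio, sd); // prints "1.05 ± 0.04"
}
```

Plugging in the 1-thread numbers gives 1.01 ± 0.00 in the same way. With only two runs per command (`-r 2`), the standard deviations are themselves rough estimates.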

github-actions · 2024-05-08T21:37:13Z

Benchmark Results for unmodified programs 🚀

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base big_factorial` | 2.041 ± 0.010 | 2.030 | 2.060 | 1.00 |
| `head big_factorial` | 2.058 ± 0.059 | 2.027 | 2.225 | 1.01 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base big_fibonacci` | 1.993 ± 0.014 | 1.976 | 2.020 | 1.00 |
| `head big_fibonacci` | 2.000 ± 0.018 | 1.979 | 2.031 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base blake2s_integration_benchmark` | 7.597 ± 0.076 | 7.490 | 7.721 | 1.00 |
| `head blake2s_integration_benchmark` | 7.619 ± 0.151 | 7.457 | 7.952 | 1.00 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base compare_arrays_200000` | 2.120 ± 0.029 | 2.094 | 2.175 | 1.01 ± 0.02 |
| `head compare_arrays_200000` | 2.107 ± 0.018 | 2.086 | 2.138 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base dict_integration_benchmark` | 1.422 ± 0.020 | 1.407 | 1.478 | 1.01 ± 0.02 |
| `head dict_integration_benchmark` | 1.402 ± 0.006 | 1.394 | 1.414 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base field_arithmetic_get_square_benchmark` | 1.291 ± 0.017 | 1.276 | 1.336 | 1.00 ± 0.02 |
| `head field_arithmetic_get_square_benchmark` | 1.289 ± 0.013 | 1.275 | 1.316 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base integration_builtins` | 7.680 ± 0.134 | 7.520 | 7.985 | 1.01 ± 0.02 |
| `head integration_builtins` | 7.624 ± 0.077 | 7.481 | 7.722 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base keccak_integration_benchmark` | 7.873 ± 0.188 | 7.725 | 8.367 | 1.00 |
| `head keccak_integration_benchmark` | 7.895 ± 0.100 | 7.710 | 8.026 | 1.00 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base linear_search` | 2.065 ± 0.011 | 2.051 | 2.086 | 1.00 |
| `head linear_search` | 2.087 ± 0.031 | 2.053 | 2.144 | 1.01 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base math_cmp_and_pow_integration_benchmark` | 1.693 ± 0.006 | 1.681 | 1.701 | 1.01 ± 0.01 |
| `head math_cmp_and_pow_integration_benchmark` | 1.675 ± 0.010 | 1.663 | 1.694 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base math_integration_benchmark` | 1.598 ± 0.019 | 1.584 | 1.650 | 1.01 ± 0.02 |
| `head math_integration_benchmark` | 1.585 ± 0.016 | 1.563 | 1.621 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base memory_integration_benchmark` | 1.191 ± 0.004 | 1.184 | 1.200 | 1.00 ± 0.01 |
| `head memory_integration_benchmark` | 1.188 ± 0.008 | 1.175 | 1.196 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base operations_with_data_structures_benchmarks` | 1.828 ± 0.043 | 1.799 | 1.945 | 1.02 ± 0.02 |
| `head operations_with_data_structures_benchmarks` | 1.798 ± 0.006 | 1.790 | 1.809 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `base pedersen` | 524.8 ± 4.8 | 519.0 | 535.5 | 1.02 ± 0.01 |
| `head pedersen` | 514.6 ± 5.7 | 511.8 | 530.7 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `base poseidon_integration_benchmark` | 964.9 ± 4.3 | 957.5 | 971.3 | 1.00 |
| `head poseidon_integration_benchmark` | 965.9 ± 6.2 | 959.2 | 979.2 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base secp_integration_benchmark` | 1.857 ± 0.020 | 1.838 | 1.898 | 1.01 ± 0.01 |
| `head secp_integration_benchmark` | 1.848 ± 0.015 | 1.830 | 1.873 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `base set_integration_benchmark` | 643.8 ± 5.4 | 639.3 | 657.7 | 1.00 |
| `head set_integration_benchmark` | 660.0 ± 2.2 | 658.2 | 665.7 | 1.03 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base uint256_integration_benchmark` | 4.206 ± 0.037 | 4.168 | 4.291 | 1.00 |
| `head uint256_integration_benchmark` | 4.245 ± 0.064 | 4.140 | 4.338 | 1.01 ± 0.02 |

Oppen · 2024-05-09T12:45:57Z

vm/src/cairo_run.rs

@@ -153,7 +159,13 @@ pub fn cairo_run_pie(

    cairo_runner
        .run_until_pc(end, &mut vm, hint_processor)


I think I've identified why this doesn't work. By this point the `Result<_, VirtualMachineError>` has already been returned, so what we need to shrink is `VirtualMachineError` itself.
`ProgramError` is also potentially large and contributes to `CairoRunError`'s size, because of its `IO` and `Parse` variants and all of the variants that contain a `String`.
That shouldn't matter in the common case, since it only happens once at the beginning and once at the end. `VirtualMachineError`, however, is likely returned from many places.

Oppen · 2024-05-09

`RunnerError::PageNotOnSegment(Relocatable, usize)` is one candidate for boxing.

fmoletta · 2024-05-10

`VirtualMachineError` is already 32 bits; should we make it even smaller?

Oppen · 2024-05-10

Bytes, but yes, it's used in many places. I'm not sure the error is the culprit, though.
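A minimal Rust sketch of the two points raised in this thread: the size of a hot-path `Result<_, E>` is governed by `E` itself, so boxing at an outer level does not shrink it, while boxing a large payload inside `E` does. `Relocatable`, `InnerInline`, `InnerBoxed`, and `Outer` are hypothetical stand-ins; cairo-vm's real `Relocatable`, `RunnerError`, and `VirtualMachineError` have different definitions and many more variants.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a (segment index, offset) address.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Relocatable {
    segment_index: isize,
    offset: usize,
}

// Hypothetical inner error with a 24-byte payload stored inline.
#[allow(dead_code)]
enum InnerInline {
    Other,
    PageNotOnSegment(Relocatable, usize),
}

// Same error with that payload boxed: the variant holds a single pointer.
#[allow(dead_code)]
enum InnerBoxed {
    Other,
    PageNotOnSegment(Box<(Relocatable, usize)>),
}

// Boxing only at an outer level shrinks the outer enum, but functions
// that return Result<_, InnerInline> still carry the full inner error.
#[allow(dead_code)]
enum Outer {
    Inner(Box<InnerInline>),
    Other,
}

fn main() {
    println!("Result<(), InnerInline>: {} bytes", size_of::<Result<(), InnerInline>>());
    println!("Result<(), InnerBoxed>:  {} bytes", size_of::<Result<(), InnerBoxed>>());
    println!("Outer:                   {} bytes", size_of::<Outer>());
}
```

Whether this translates into a measurable speedup depends on how often such results are moved on the hot path, which is why the benchmarks above remain the deciding evidence.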

fmoletta added 2 commits May 8, 2024 18:15

Box VmException

830b2e2

Clippy

7ca0d28

Oppen reviewed May 9, 2024


fmoletta closed this May 14, 2024

fmoletta deleted the box-vm-exception branch May 14, 2024 13:46
