feat(nightly): execution `become`s faster #2013

jonathanpwang · 2025-08-20T06:41:41Z

Uses the nightly `become` keyword to tell LLVM to emit `musttail` calls. This did not work at first while the `Handler` type returned `Result`, but it works now that the handler returns nothing.

Ideas taken from https://github.com/xacrimon/tcvm/blob/main/src/interp.rs
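For readers unfamiliar with the pattern, here is a minimal sketch of handler-to-handler dispatch on a toy VM. Everything in it (`State`, `Op`, `exec_add`, `dispatch_next`, `run`) is hypothetical and not taken from this PR. It compiles on stable Rust, so the tail call is written as an ordinary call; the comment marks where `become` would go on nightly with `#![feature(explicit_tail_calls)]`.

```rust
// Hypothetical toy VM; none of these names come from the PR.
struct State {
    pc: usize,
    acc: u64,
}

#[derive(Clone, Copy)]
enum Op {
    Add(u64),
    Double,
    Halt,
}

// Handlers return nothing, mirroring the simplified handler type described above.
type Handler = fn(&[Op], &mut State);

// Map an instruction to the function that executes it.
fn handler_for(op: Op) -> Handler {
    match op {
        Op::Add(_) => exec_add,
        Op::Double => exec_double,
        Op::Halt => exec_halt,
    }
}

// Look up the handler for the current pc and jump to it.
// Panics if the program runs off the end without a `Halt`.
fn dispatch_next(program: &[Op], state: &mut State) {
    let next = handler_for(program[state.pc]);
    // On nightly this would be written `become next(program, state)` so that
    // the call is guaranteed not to grow the stack. On stable it is a plain
    // call that the optimizer may or may not turn into a jump.
    next(program, state)
}

fn exec_add(program: &[Op], state: &mut State) {
    if let Op::Add(n) = program[state.pc] {
        state.acc += n;
    }
    state.pc += 1;
    dispatch_next(program, state)
}

fn exec_double(program: &[Op], state: &mut State) {
    state.acc *= 2;
    state.pc += 1;
    dispatch_next(program, state)
}

// Halting simply returns, which unwinds the whole chain of handlers.
fn exec_halt(_program: &[Op], _state: &mut State) {}

fn run(program: &[Op]) -> u64 {
    let mut state = State { pc: 0, acc: 0 };
    dispatch_next(program, &mut state);
    state.acc
}
```

Calling `run(&[Op::Add(3), Op::Double, Op::Add(1), Op::Halt])` walks the four handlers in sequence and returns the accumulator.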

To remove code duplication, I created local declarative `dispatch!` macros for any "enum dispatch" logic over function pointers that we were doing. The benefit is that each macro is now shared across four functions: {e1,e2} x {pre_compute,handler}. I did not make a single general-purpose macro because Rust's macro rules make this hard: a macro cannot reference identifiers it does not declare, and the identifiers used vary from function to function.
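A rough sketch of the enum-dispatch idea, under assumed names: `AluOp`, `Operation`, `E1Fn`, `E2Fn`, and the two `execute_*_impl` signatures below are invented for illustration and are much simpler than the real executors. The macro takes the generic function to monomorphize as an identifier, so the same match arms serve both execution modes.

```rust
// Hypothetical opcode enum and per-opcode marker types.
#[derive(Clone, Copy)]
enum AluOp {
    Add,
    Sub,
    Xor,
}

trait Operation {
    fn apply(a: u32, b: u32) -> u32;
}

struct AddOp;
struct SubOp;
struct XorOp;

impl Operation for AddOp {
    fn apply(a: u32, b: u32) -> u32 {
        a.wrapping_add(b)
    }
}
impl Operation for SubOp {
    fn apply(a: u32, b: u32) -> u32 {
        a.wrapping_sub(b)
    }
}
impl Operation for XorOp {
    fn apply(a: u32, b: u32) -> u32 {
        a ^ b
    }
}

// One match over the opcode, reusable for any generic function passed in.
macro_rules! dispatch {
    ($execute:ident, $opcode:expr) => {
        match $opcode {
            AluOp::Add => $execute::<AddOp>,
            AluOp::Sub => $execute::<SubOp>,
            AluOp::Xor => $execute::<XorOp>,
        }
    };
}

// Assumed signatures: the "e2" variant also tracks a cost counter.
type E1Fn = fn(u32, u32) -> u32;
type E2Fn = fn(u32, u32, &mut u64) -> u32;

fn execute_e1_impl<O: Operation>(a: u32, b: u32) -> u32 {
    O::apply(a, b)
}

fn execute_e2_impl<O: Operation>(a: u32, b: u32, cost: &mut u64) -> u32 {
    *cost += 1;
    O::apply(a, b)
}

// Each caller picks the function family; the return type coerces every
// match arm to the matching fn pointer type.
fn pre_compute_e1(op: AluOp) -> E1Fn {
    dispatch!(execute_e1_impl, op)
}

fn pre_compute_e2(op: AluOp) -> E2Fn {
    dispatch!(execute_e2_impl, op)
}
```

The opcode match then runs once per instruction at pre-compute time, and execution afterwards goes through the stored function pointer.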

Removed `pc_base` and instead leave some empty space at the start of the table to avoid the `pc - pc_base` calculation. `self.pc_base` is a runtime variable, so we would otherwise have to use a register (or worse, a load/store) to access it.
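The padding trick can be sketched as follows. This is an illustration only: the value `4` for `DEFAULT_PC_STEP`, the string entries, and the helper names `build_table` and `lookup` are assumptions, not the interpreter's actual types.

```rust
// Assumed step size for illustration.
const DEFAULT_PC_STEP: u32 = 4;

// Build a table indexed directly by `pc / DEFAULT_PC_STEP`.
// The first `pc_base / DEFAULT_PC_STEP` slots are padding that is never reached.
fn build_table(pc_base: u32, program: &[&'static str]) -> Vec<Option<&'static str>> {
    let padding = (pc_base / DEFAULT_PC_STEP) as usize;
    let mut table = vec![None; padding];
    table.extend(program.iter().copied().map(Some));
    table
}

// No subtraction of `pc_base` is needed on the lookup path.
fn lookup(table: &[Option<&'static str>], pc: u32) -> Option<&'static str> {
    table.get((pc / DEFAULT_PC_STEP) as usize).copied().flatten()
}
```

With `pc_base = 0x100` and a two-instruction program, the table has 64 padding slots followed by 2 real entries. The cost is memory proportional to `pc_base` in exchange for one fewer operation per dispatch.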

To complete this:

- For each crate, add a "tco" feature, then add the `#[create_tco_handler]` attribute above the `execute_e1_impl` functions. Then, for each Executor implementation, copy the `pre_compute` function implementation verbatim, but switch to the `handler` function signature and return the tco handler fn pointer instead.
- Do the same for metered execution.
- ~~Switch to x86 global asm instead of relying on LLVM if we want to be extra safe.~~ This seemed complicated and hard to do fully properly, so I prefer `become`.

Closes INT-4309

`become` keyword was causing some corruption and not passing references properly between calls.

codspeed-hq · 2025-08-20T18:42:36Z

CodSpeed WallTime Performance Report

Merging #2013 will improve performances by 95.21%

Comparing feat/tco (aee66b1) with main (4852493)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 18 improvements
✅ 12 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|
| ⚡ | `benchmark_execute[bubblesort]` | 33.1 ms | 21.6 ms | +53.62% |
| ⚡ | `benchmark_execute[factorial_iterative_u256]` | 150.6 ms | 84.6 ms | +78.03% |
| ⚡ | `benchmark_execute[fibonacci_iterative]` | 48.8 ms | 26.5 ms | +84.06% |
| ⚡ | `benchmark_execute[fibonacci_recursive]` | 75.7 ms | 38.8 ms | +95.21% |
| ⚡ | `benchmark_execute[keccak256]` | 38.3 ms | 24.7 ms | +55.16% |
| ⚡ | `benchmark_execute[keccak256_iter]` | 126.4 ms | 83.7 ms | +51.09% |
| ⚡ | `benchmark_execute[pairing]` | 148.9 ms | 134.8 ms | +10.45% |
| ⚡ | `benchmark_execute[quicksort]` | 37.4 ms | 23.6 ms | +58.44% |
| ⚡ | `benchmark_execute[revm_transfer]` | 53.6 ms | 39.2 ms | +36.76% |
| ⚡ | `benchmark_execute[sha256]` | 37.7 ms | 23.2 ms | +62.73% |
| ⚡ | `benchmark_execute[sha256_iter]` | 109.6 ms | 60.1 ms | +82.4% |
| ⚡ | `benchmark_execute_metered[bubblesort]` | 67.7 ms | 44.4 ms | +52.41% |
| ⚡ | `benchmark_execute_metered[factorial_iterative_u256]` | 252.8 ms | 210.8 ms | +19.91% |
| ⚡ | `benchmark_execute_metered[fibonacci_recursive]` | 109.1 ms | 91.6 ms | +19.09% |
| ⚡ | `benchmark_execute_metered[keccak256_iter]` | 179.6 ms | 147.9 ms | +21.4% |
| ⚡ | `benchmark_execute_metered[quicksort]` | 71.4 ms | 50.1 ms | +42.53% |
| ⚡ | `benchmark_execute_metered[revm_transfer]` | 66.7 ms | 60.1 ms | +11.06% |
| ⚡ | `benchmark_internal_verifier_execute[fibonacci]` | 46.4 ms | 40.4 ms | +14.68% |

.github/workflows/benchmarks-execute.yml

shuklaayush · 2025-08-22T14:58:41Z

crates/vm/derive/src/tco.rs

+                return;
+            }
+            // exec_state.pc should have been updated by execute_impl at this point
+            let next_handler = interpreter.get_handler(exec_state.vm_state.pc);


both get_handler and get_pre_compute call get_pc_index. maybe we can calculate the index once and reuse it
or add a get_pre_handler_and_pre_compute function

the compiler should optimize this out

oh actually this is the new pc

still think there's duplicate work happening here that won't be optimized out

oh you mean like because new pc doesn't get passed across function boundary
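The suggestion in this thread, computing the pc index once and reusing it for both lookups, could look roughly like the sketch below. The `Table` struct, its fields, the `u64` stand-in for pre-compute data, and the step size of `4` are all assumptions made for illustration; the real `get_handler`, `get_pre_compute`, and `get_pc_index` are not reproduced here.

```rust
// Assumed step size for illustration.
const DEFAULT_PC_STEP: u32 = 4;

type Handler = fn(&mut u64);

// Hypothetical stand-in for the interpreter's per-instruction tables.
struct Table {
    handlers: Vec<Handler>,
    pre_compute: Vec<u64>,
}

impl Table {
    fn pc_index(&self, pc: u32) -> usize {
        (pc / DEFAULT_PC_STEP) as usize
    }

    // Combined lookup: the index is computed once and used for both tables,
    // instead of each getter recomputing it from the pc.
    fn get_handler_and_pre_compute(&self, pc: u32) -> Option<(Handler, u64)> {
        let idx = self.pc_index(pc);
        Some((*self.handlers.get(idx)?, *self.pre_compute.get(idx)?))
    }
}

// Two toy handlers used to exercise the table.
fn inc(x: &mut u64) {
    *x += 1;
}

fn dbl(x: &mut u64) {
    *x *= 2;
}
```

Whether this saves anything in practice depends on what the optimizer can prove about the pc after the previous handler returns, which is the open question in the thread.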

shuklaayush · 2025-08-22T15:02:24Z

crates/vm/derive/src/tco.rs

+            }
+            // exec_state.pc should have been updated by execute_impl at this point
+            let next_handler = interpreter.get_handler(exec_state.vm_state.pc);
+            if next_handler.is_none() {


not required rn but maybe we can just return a null pointer instead of Option

again I hope the compiler just optimizes this out
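On the representation side of this question, Rust already stores `Option` of a function pointer in a single pointer-sized word, using the null value for `None`. The small check below illustrates that; the `Handler` alias and the `step` and `bump` helpers are hypothetical. It says nothing about whether the `is_none()` branch itself is optimized away.

```rust
use std::mem::size_of;

// Hypothetical handler signature for illustration.
type Handler = fn(&mut u64);

// `Option<fn(..)>` uses the null-pointer niche, so it is the same size as
// the bare function pointer.
fn option_is_pointer_sized() -> bool {
    size_of::<Option<Handler>>() == size_of::<Handler>()
}

// Run the next handler if there is one; report whether anything ran.
fn step(next: Option<Handler>, acc: &mut u64) -> bool {
    match next {
        Some(handler) => {
            handler(acc);
            true
        }
        None => false,
    }
}

fn bump(acc: &mut u64) {
    *acc += 1;
}
```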

shuklaayush · 2025-08-22T15:04:07Z

crates/vm/derive/src/tco.rs

+    let handler_fn = quote! {
+        #[inline(never)]
+        unsafe fn #handler_name #handler_generics (
+            interpreter: &::openvm_circuit::arch::interpreter::InterpretedInstance<#f_type, #ctx_type>,


not sure if it matters but maybe we just pass like a slimmed down instruction table that contains a function/mapping from pc -> (data, handler) instead of a reference to this whole struct

yea will do with the register pinning

shuklaayush

looks good. there's an argument that we can have a single handler function instead of separate precompute and handler functions but i think the current approach with using macros to avoid duplication is fine too

benchmarks/prove/Cargo.toml

crates/vm/src/arch/execution.rs

shuklaayush · 2025-08-22T15:25:04Z

crates/vm/src/arch/interpreter.rs

-    /// `pc_index = (pc - pc_base) / DEFAULT_PC_STEP`.
+    /// `pc_index = pc / DEFAULT_PC_STEP`.
+    /// SAFETY: The first `pc_base / DEFAULT_PC_STEP` entries will be unreachable. We do this to
+    /// avoid needing to subtract `pc_base` during runtime.


is pc_base guaranteed to be bounded?

well it's bounded by the ELF size

shuklaayush · 2025-08-22T15:31:38Z

crates/vm/src/arch/interpreter.rs

+            }
+            #[cfg(feature = "tco")]
+            {
+                tracing::debug!("execute_tco");


slightly pedantic, but i feel the tail-recursive function is more like a "trampoline"

hmm I thought (but didn't validate) that other interpreters refer to trampoline as the standard one?
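For reference on the terminology, here is a sketch of what is often called a trampoline: a driver loop that repeatedly calls whatever handler the previous one returned, so the stack stays flat without relying on tail calls. All names (`State`, `Op`, `Next`, `StepFn`, `step`, `run_trampoline`) are hypothetical, and this is not a claim about how either execution mode in this PR is structured.

```rust
// Hypothetical toy VM for illustration.
struct State {
    pc: usize,
    acc: u64,
}

#[derive(Clone, Copy)]
enum Op {
    Add(u64),
    Halt,
}

type StepFn = fn(&[Op], &mut State) -> Next;

// Wrapper struct so a handler can return "the next handler, if any".
struct Next(Option<StepFn>);

// Execute one instruction and hand back the next handler instead of calling it.
fn step(program: &[Op], state: &mut State) -> Next {
    match program[state.pc] {
        Op::Add(n) => {
            state.acc += n;
            state.pc += 1;
            Next(Some(step as StepFn))
        }
        Op::Halt => Next(None),
    }
}

// The trampoline: control bounces back to this loop after every handler.
fn run_trampoline(program: &[Op]) -> u64 {
    let mut state = State { pc: 0, acc: 0 };
    let mut next = Next(Some(step as StepFn));
    while let Next(Some(handler)) = next {
        next = handler(program, &mut state);
    }
    state.acc
}
```

The contrast with the tail-call style is that each handler here returns to the loop, whereas with `become` each handler jumps directly to its successor.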

crates/vm/src/arch/interpreter.rs

shuklaayush · 2025-08-22T15:40:35Z

crates/vm/derive/src/tco.rs

+                return;
+            }
+            // exec_state.pc should have been updated by execute_impl at this point
+            let next_handler = interpreter.get_handler(exec_state.vm_state.pc);


still think there's duplicate work happening here that won't be optimized out

crates/vm/src/arch/state.rs

extensions/native/circuit/src/poseidon2/execution.rs

extensions/native/circuit/src/loadstore/execution.rs

extensions/native/circuit/src/jal_rangecheck/execution.rs

nyunyunyunyu

Nothing else besides Ayush's comments. LGTM if the comments are addressed

github-actions · 2025-08-22T21:59:17Z

| group | app.proof_time_ms | app.cycles | app.cells_used | leaf.proof_time_ms | leaf.cycles | leaf.cells_used |
|---|---|---|---|---|---|---|
| verify_fibair | 2,078 | 322,700 | 18,750,324 | - | - | - |
| fibonacci | (-36 [-1.5%]) 2,379 | 1,500,210 | 51,504,507 | - | - | - |
| regex | (-111 [-1.5%]) 7,525 | 4,108,597 | 164,734,992 | - | - | - |
| ecrecover | (-9 [-0.6%]) 1,377 | 140,487 | 8,866,654 | - | - | - |
| pairing | (-41 [-1.1%]) 3,855 | 1,882,939 | 98,834,293 | - | - | - |

Commit: aee66b1

Benchmark Workflow

jonathanpwang added 4 commits August 19, 2025 16:20

chore: add placeholder tco feature

4d9e261

feat: add macro to generate tco handler and update interpreter for tco

13cc6f2

feat: rv32im tco without become keyword

fb3d997

`become` keyword was causing some corruption and not passing references properly between calls.

fmt

0c93530

jonathanpwang added 3 commits August 20, 2025 10:47

feat: tco for other extensions

f7fd1d7

chore: update feature deps

f01ccd2

fixes

5a193a0

jonathanpwang added 2 commits August 20, 2025 14:47

feat: simplify the handler type without Result

e6affb6

feat: try become keyword again

9d9aa8a

jonathanpwang force-pushed the feat/tco branch from 98988d9 to 9d9aa8a Compare August 20, 2025 21:57

jonathanpwang changed the title ~~feat: tail call elimination~~ feat(nightly): tail call elimination Aug 20, 2025

jonathanpwang added 6 commits August 20, 2025 16:09

chore: propagate tco feature

4c1de9d

feat: use custom macros to reduce code in ecc execution

76aa23d

refactor: use local dispatch! macros to reduce code duplication

a82dfb2

refactor: fp2 dispatch

5c7b832

chore: update feature comment

32c2dbd

feat: metered handler for algebra extension

eee6856

jonathanpwang changed the title ~~feat(nightly): tail call elimination~~ feat(nightly): execution `become`s faster Aug 21, 2025

jonathanpwang added 7 commits August 20, 2025 18:41

refactor: use local dispatch! macros to reduce code duplication

c0d4d83

feat: run! macro for tco on pure+metered execution

bae6afe

chore: fmt

6e2b0a2

fix: missing handler for is_eq

714c97a

feat: bigint metered handler

8d3d06e

refactor: use dispatch! for rv32 executors

312f6d5

cleanup: turn off tco feature

9a2df6e

chore: cargo shear

d265fe9

jonathanpwang commented Aug 21, 2025

.github/workflows/benchmarks-execute.yml (resolved)

shuklaayush reviewed Aug 21, 2025

.github/workflows/benchmarks-execute.yml (outdated, resolved)

fix: ci

64c0571

jonathanpwang force-pushed the feat/tco branch from 3505ba2 to 64c0571 Compare August 21, 2025 23:07

jonathanpwang force-pushed the feat/tco branch 3 times, most recently from 4a1e5e8 to e066de4 Compare August 22, 2025 03:55

perf: remove pc_base from pc_idx calc

ce7c037

jonathanpwang force-pushed the feat/tco branch from e066de4 to ce7c037 Compare August 22, 2025 04:28

shuklaayush reviewed Aug 22, 2025

nyunyunyunyu approved these changes Aug 22, 2025

jonathanpwang added 5 commits August 22, 2025 14:26

fix: remove unused error

63b013a

chore: don't keep pre_compute_insns when tco

5f5138c

chore: remove unused derive

f46bf70

chore: dispatch! for poseidon2

8a51077

chore: phantom lifetime

aee66b1

jonathanpwang added this pull request to the merge queue Aug 22, 2025

Merged via the queue into main with commit aad0172 Aug 22, 2025
34 checks passed

jonathanpwang deleted the feat/tco branch August 22, 2025 23:28
