perf: More efficient touch_range #1322
base: main
Can you add an explanation of what the optimization is, and summarize the perf gain for the microbenchmarks as well as the reth benchmark?
Seems to have no perf diff.
let first_address_label = address / CHUNK as u32;
let last_address_label = (address + len - 1) / CHUNK as u32;
for address_label in first_address_label..=last_address_label {
    self.touch_node(0, as_label, address_label);
this is still internally calling the same function, I'm guessing the compiler just optimizes it out?
How would the compiler optimize it out? Previously this function would have been called N times, now it is called N/8 times.
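To make the call-count claim concrete, here is a minimal sketch of how many aligned blocks a range spans. The helper name `blocks_spanned` and the standalone `CHUNK` constant are hypothetical and chosen for illustration; only the label arithmetic mirrors the diff excerpt above. Note that "N/8" is exact only for aligned ranges: an unaligned range can span one extra block.

```rust
// Block size of 8, as discussed in this PR (assumed to match CHUNK in the diff).
const CHUNK: u32 = 8;

/// Hypothetical helper: number of aligned blocks of size CHUNK that overlap
/// the address range `address..address + len`.
fn blocks_spanned(address: u32, len: u32) -> u32 {
    if len == 0 {
        return 0; // guard: `len - 1` would underflow for an empty range
    }
    let first = address / CHUNK;
    let last = (address + len - 1) / CHUNK;
    last - first + 1
}

fn main() {
    // Aligned range of 64 addresses: exactly 64 / 8 blocks.
    assert_eq!(blocks_spanned(0, 64), 8);
    // Unaligned range of the same length: one extra block.
    assert_eq!(blocks_spanned(5, 64), 9);
    // Short range inside a single block.
    assert_eq!(blocks_spanned(3, 4), 1);
    println!("ok");
}
```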
Closing for now because it doesn't seem to have a real perf impact, and introduces more code.
@jonathanpwang I have a hard time believing this has no real perf impact. Local microbenchmark shows that |
still not seeing a lot of perf diff: https://github.com/axiom-crypto/openvm-reth-benchmark/blob/gh-pages/benchmarks-dispatch/refs/heads/main/reth-e9fe5226fd30a7646b0e44d3c1d0681ea961dcd5deeb55e4fd95bf88dd4dfd0f.md I only expect reth.prove_e2e.block_21000000 |
Do we ever call |
Commit: d49e76a
On my machine, trace gen for the reth benchmark for block 21000000 seems to decrease by 45s (~9%). Not sure why this is not reflected on the reth benchmark (not sure how memory allocator/machine architecture is really relevant here; this PR should just be strictly decreasing the number of calls to HashMap::insert).
Previously `touch_range` was implemented by calling `touch_address` on each address in `pointer..pointer + len`. But for persistent memory, in both `MemoryMerkleChip` and `PersistentBoundaryChip`, we actually care about touched aligned blocks of size 8. So most of the calls to `touch_address` triggered by a single `touch_range` were redundant, repeatedly querying the same hashmap at the same block index.
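The redundancy can be illustrated with a small standalone sketch. It is not the PR's actual code: the function names `touch_range_per_address` and `touch_range_per_block` are hypothetical, and a plain `HashSet` of block labels stands in for the chips' internal maps. It shows that both strategies mark the same set of blocks while the per-block version performs far fewer insertions.

```rust
use std::collections::HashSet;

// Block size of 8, as stated in the description above.
const CHUNK: u32 = 8;

/// Hypothetical per-address strategy: one set insertion per address.
/// Returns the number of insertions performed.
fn touch_range_per_address(touched: &mut HashSet<u32>, address: u32, len: u32) -> u32 {
    let mut inserts = 0;
    for a in address..address + len {
        touched.insert(a / CHUNK); // most of these hit an already-present label
        inserts += 1;
    }
    inserts
}

/// Hypothetical per-block strategy: one insertion per aligned block
/// overlapping the range. Returns the number of insertions performed.
fn touch_range_per_block(touched: &mut HashSet<u32>, address: u32, len: u32) -> u32 {
    if len == 0 {
        return 0; // guard: `len - 1` would underflow for an empty range
    }
    let first = address / CHUNK;
    let last = (address + len - 1) / CHUNK;
    let mut inserts = 0;
    for label in first..=last {
        touched.insert(label);
        inserts += 1;
    }
    inserts
}

fn main() {
    let (mut a, mut b) = (HashSet::new(), HashSet::new());
    let n_a = touch_range_per_address(&mut a, 5, 32);
    let n_b = touch_range_per_block(&mut b, 5, 32);
    // Both strategies mark exactly the same blocks.
    assert_eq!(a, b);
    println!("per-address inserts: {n_a}, per-block inserts: {n_b}");
}
```

For the unaligned range starting at address 5 with length 32, the per-address version performs 32 insertions while the per-block version performs 5, with identical resulting sets.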