Profile the performance of in memory trie for shard 2 #10877

Closed
Tracked by #46
bowenwang1996 opened this issue Mar 25, 2024 · 10 comments
Labels
A-benchmark Area: performance benchmarks A-stateless-validation Area: stateless validation

Comments

@bowenwang1996
Collaborator

Previously, we concluded that with the runtime optimizations in place, the bottleneck of apply is mostly storage operations, which the in-memory trie should help with significantly. It would be good to understand how much we gain by enabling the in-memory trie for a shard and what the remaining performance bottlenecks are.

@tayfunelmas
Contributor

Performed an initial set of measurements of apply_chunk latency. See the related dashboard here.
Enabled memtrie for shards 2 and 3; these spans indicate the time during which memtrie was enabled for these shards:
Screenshot

Observed a 3x-10x reduction (will repeat with a longer time window and more shards later).

I also observed long load times for memtrie:

Done loading trie from flat state, took 55.47568243s shard_uid=s3.v3
Done loading trie from flat state, took 207.43764391s shard_uid=s2.v3

Will look at what is going on there.
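(For reference, a minimal sketch of the kind of wall-clock measurement behind per-call latency numbers like the ones above; the `apply_chunk` function here is just a placeholder stub, not nearcore's code, and the real dashboard is driven by the node's metrics rather than a timer like this.)

```rust
// Minimal sketch, assuming a placeholder `apply_chunk`: time a single call with a
// wall-clock timer, the simplest way to get per-call latency numbers.
use std::time::{Duration, Instant};

fn apply_chunk() {
    // Stand-in for the real chunk-application work.
    std::thread::sleep(Duration::from_millis(5));
}

fn main() {
    let start = Instant::now();
    apply_chunk();
    println!("apply_chunk took {:?}", start.elapsed());
}
```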

@bowenwang1996
Collaborator Author

I also observed long load times for memtrie:

cc @robin-near

@tayfunelmas tayfunelmas added the A-stateless-validation Area: stateless validation label Apr 1, 2024
@tayfunelmas
Contributor

tayfunelmas commented Apr 2, 2024

Measured the different phases of constructing the memory trie. It looks like most of the time is spent reading the flat state values from the database (not constructing the trie in memory).
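(A minimal sketch of this kind of per-phase timing, with hypothetical `read_flat_state` and `build_trie` placeholders standing in for the real loader code; this is not nearcore's implementation.)

```rust
// Minimal sketch, assuming hypothetical `read_flat_state` / `build_trie` placeholders:
// timing each phase separately is what distinguishes "reading flat state values from
// the database" from "constructing the trie in memory".
use std::time::Instant;

// Placeholder: iterate the flat state and collect key/value pairs.
fn read_flat_state() -> Vec<(Vec<u8>, Vec<u8>)> {
    Vec::new()
}

// Placeholder: build the in-memory trie from the collected entries.
fn build_trie(_entries: &[(Vec<u8>, Vec<u8>)]) {}

fn main() {
    let t0 = Instant::now();
    let entries = read_flat_state();
    let read_elapsed = t0.elapsed();

    let t1 = Instant::now();
    build_trie(&entries);
    let build_elapsed = t1.elapsed();

    println!("read flat state: {read_elapsed:?}, build trie: {build_elapsed:?}");
}
```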

@walnut-the-cat
Contributor

@tayfunelmas, does the long trie loading (from flat storage) and construction time impact performance? I was assuming this would be done behind the scenes before the next epoch.

@tayfunelmas
Contributor

tayfunelmas commented Apr 4, 2024

It is called in two cases:

  1. On startup (e.g. after a restart of the node). code This load blocks the overall startup (the state sync/catchup happens after it).
  2. After catchup of a new shard. code This is done as part of catching up for the next epoch, so latency in this case should be less of a problem than in the former case.

Besides, I found out that the latency does not come only from iterating over the flat state in RocksDB (example profiling view). It also comes from the construction and hash computation of the memtrie; we encode/decode nodes (serialize/deserialize) between constructing the trie and computing the hashes.
The part we need to investigate is whether the mechanism causing this slowdown also contributes to latency after the memtrie is constructed. Currently I am trying to find a way to speed up loading the memtrie "without" changing this mechanism (which requires understanding why it was designed this way in the first place).

@walnut-the-cat
Contributor

On startup (e.g. after a restart of the node). code This load blocks the overall startup (the state sync/catchup happens after it).

To make sure I understand the 'latency' here correctly: this latency is about when a validator can start participating in the consensus mechanism, right? It shouldn't have anything to do with how long it takes a validator to perform chunk generation/validation.

@tayfunelmas
Contributor

Yes, once the memtrie is loaded, this latency does not contribute to later operations such as block/chunk production or validation. In fact, this load code is specific to the one-off loading of the state and is separate from the rest of the memtrie operations performed during block/chunk generation or validation.

@staffik
Contributor

staffik commented Apr 5, 2024

We can move the memtrie loading part to a separate thread. I have a draft implementation. Will do it next week.

@tayfunelmas
Contributor

Do you want to move the entire load operation to a separate thread? How would that help? Assuming we are talking about node startup, the rest of the functionality needs to wait for the memtrie load anyway. I think we could instead parallelize certain parts of the load; for example, the hash computation could start earlier, in parallel, while the trie is being built (it is currently done only after the tree is fully constructed). But I am not sure about running the entire load on a different thread; I might be missing something.
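(A minimal sketch of the overlap idea above, using only std threads and a channel: subtrees are hashed as soon as they are constructed instead of after the whole tree is built. All types here are placeholders, not nearcore's memtrie structures.)

```rust
// Minimal sketch of overlapping hashing with construction, using placeholder types.
use std::sync::mpsc;
use std::thread;

// Placeholder for a constructed-but-not-yet-hashed subtree.
type Subtree = Vec<u8>;

// Placeholder hash; the real code computes trie node hashes.
fn hash_subtree(subtree: &Subtree) -> u64 {
    subtree.iter().map(|b| *b as u64).sum()
}

fn main() {
    let (tx, rx) = mpsc::channel::<Subtree>();

    // The hasher thread consumes subtrees as soon as they are ready,
    // instead of waiting for the whole trie to be constructed first.
    let hasher = thread::spawn(move || {
        let mut hashes = Vec::new();
        for subtree in rx {
            hashes.push(hash_subtree(&subtree));
        }
        hashes
    });

    // Construction side: build subtrees and hand them off immediately.
    for chunk in 0..4u8 {
        let subtree = vec![chunk; 8]; // stand-in for "construct next subtree from flat state"
        tx.send(subtree).expect("hasher thread alive");
    }
    drop(tx); // close the channel so the hasher thread can finish

    let hashes = hasher.join().expect("hasher thread panicked");
    println!("computed {} subtree hashes", hashes.len());
}
```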

@staffik
Contributor

staffik commented Apr 6, 2024

I was thinking about catchup. Yes, for startup it might be hard to do. For startup I thought that maybe we could start with the regular trie and load the memtrie in the background, but that is probably not possible, as we need the state not to change during the load.
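(One common way around the "state must not change during the load" concern is to load at a pinned point and then replay the deltas accumulated in the meantime before swapping the memtrie in. A conceptual sketch with placeholder types; this is not nearcore's actual design.)

```rust
// Conceptual sketch with placeholder types: load a snapshot at a pinned height in the
// background, record deltas produced while the load runs, then replay them before use.
use std::collections::BTreeMap;
use std::thread;

type State = BTreeMap<Vec<u8>, Vec<u8>>;
type Delta = (Vec<u8>, Vec<u8>);

// Placeholder: load the state as of a fixed height, so the source data does not
// change underneath the loader.
fn load_snapshot_at(_height: u64) -> State {
    BTreeMap::new()
}

fn main() {
    let pinned_height = 100u64;

    // Background load pinned to a fixed height.
    let loader = thread::spawn(move || load_snapshot_at(pinned_height));

    // Meanwhile the node keeps processing blocks; record the deltas it produces.
    let deltas: Vec<Delta> = vec![(b"key".to_vec(), b"new value".to_vec())];

    // When the load finishes, replay the recorded deltas to catch up, then swap in.
    let mut memtrie = loader.join().expect("loader thread panicked");
    for (key, value) in deltas {
        memtrie.insert(key, value);
    }
    println!("memtrie caught up, {} entries", memtrie.len());
}
```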
