Regen refactor > safer resilient strategy #4005
Comments
Regarding loading a state from the db: iirc this is expensive, around 6+ seconds. It might be good to benchmark this and bring it down.
In terms of time to result, the tradeoff math is roughly:
Also keep in mind that if you advance an old state significantly, the cost of the final hashTreeRoot can be very high, since the whole state is different. However, there's a memory limit on the number and fork-ness of states you can keep in memory. In bad network conditions you must drop states to prevent OOMs, so regen from disk must always be available.
#6008 should resolve this issue.
Closed since n-historical-states is now enabled by default.
Background
Consensus clients need to cache some states to fully participate in the network. States are very heavy, so you can't cache every state you may need. Writing every state you may need to disk is not practical either. So what do you do?
The stateCache and checkpointStateCache handle the first part of the answer: deciding which states to keep in memory. The regen module handles the second: providing the ability to regenerate any state, within some boundary.
Current strategy
This approach works well in good network conditions. Thanks to tree structural sharing, the cost of those 96 states in a linear chain is only a small multiple of the cost of a single state (~1.2x).
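To see why a linear chain of states is cheap to hold, here is a toy persistent binary tree. It is a generic illustration of structural sharing, not Lodestar's actual SSZ tree-backed state. `buildTree`, `setLeaf` and `countDistinctNodes` are hypothetical helpers. Each update copies only the path from the root to the changed leaf (depth + 1 nodes) and reuses every other node from the previous version.

```typescript
// Toy persistent binary tree showing structural sharing between versions.
// Illustrative only: not Lodestar's SSZ tree implementation.
type TreeNode = Leaf | Branch;

interface Leaf {
  readonly kind: "leaf";
  readonly value: number;
}

interface Branch {
  readonly kind: "branch";
  readonly left: TreeNode;
  readonly right: TreeNode;
}

const leaf = (value: number): Leaf => ({ kind: "leaf", value });
const branch = (left: TreeNode, right: TreeNode): Branch => ({ kind: "branch", left, right });

// Build a full tree from a power-of-two number of leaf values.
function buildTree(values: number[]): TreeNode {
  if (values.length === 1) return leaf(values[0]);
  const mid = values.length / 2;
  return branch(buildTree(values.slice(0, mid)), buildTree(values.slice(mid)));
}

// Return a new root with leaf `index` set to `value`. Only the nodes on the
// root-to-leaf path are recreated; all sibling subtrees are reused as-is.
function setLeaf(node: TreeNode, depth: number, index: number, value: number): TreeNode {
  if (depth === 0) return leaf(value);
  if (node.kind !== "branch") throw new Error("tree shallower than depth");
  const goRight = ((index >> (depth - 1)) & 1) === 1;
  return goRight
    ? branch(node.left, setLeaf(node.right, depth - 1, index, value))
    : branch(setLeaf(node.left, depth - 1, index, value), node.right);
}

// Count distinct node objects reachable from any of the given roots.
function countDistinctNodes(roots: TreeNode[]): number {
  const seen = new Set<TreeNode>();
  const stack: TreeNode[] = [...roots];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (seen.has(node)) continue;
    seen.add(node);
    if (node.kind === "branch") stack.push(node.left, node.right);
  }
  return seen.size;
}

// Simulate a linear chain of 96 "states", each changing one leaf of its parent.
const DEPTH = 10;
const genesis = buildTree(new Array<number>(1 << DEPTH).fill(0));
const chain: TreeNode[] = [genesis];
for (let slot = 0; slot < 96; slot++) {
  chain.push(setLeaf(chain[chain.length - 1], DEPTH, slot, slot + 1));
}

console.log(countDistinctNodes([genesis])); // 2047 nodes for one version
console.log(countDistinctNodes(chain)); // 2047 + 96 * 11 = 3103 for all 97 versions
```

With a depth-10 tree (2047 nodes), 96 single-leaf updates add only 96 × 11 = 1056 new nodes. All 97 versions together take 3103 nodes instead of 97 × 2047. Real states touch many more leaves per slot, so this toy ratio should not be compared with the ~1.2x figure quoted for real states.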
However, during attacks, bugs, or highly forked network conditions, our node can quickly run out of memory or become unable to follow the chain.
Relevant issues:
Improvement goals
So, we can do better. Specifically:
Proposed strategies
1. Regen from memory and disk
On every checkpoint, write a state to disk in a "hot state db" bucket. On finalization, move some of those states to a "cold state db" or "archive db" bucket. Then, on regen, use those states depending on the distance to the closest available state in memory, if any. This would remove the need to keep the finalized state in memory.
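One way to picture the "use those states depending on the distance" step is to compare two options. The first is replaying from the closest in-memory state. The second is loading the closest hot-db checkpoint state and replaying from there. This is a hypothetical sketch with several assumptions:

- `planRegen` and `closestAtOrBefore` are illustrative names.
- The `diskLoadCostInSlots` parameter expresses db load time as an equivalent number of replayed slots. In practice it would have to come from a benchmark.
- The inputs are plain slot numbers.
- The sketch ignores forks: every candidate slot is treated as an ancestor of the target on the same chain.

```typescript
// Hypothetical sketch: choose where to start regenerating the state at `targetSlot`.
// Assumes every candidate slot is an ancestor of the target on the same chain (no forks).
interface RegenPlan {
  source: "memory" | "hotDb";
  baseSlot: number;
  slotsToReplay: number;
}

// Closest slot at or before the target, or null if none qualifies.
function closestAtOrBefore(slots: number[], targetSlot: number): number | null {
  let best: number | null = null;
  for (const slot of slots) {
    if (slot <= targetSlot && (best === null || slot > best)) best = slot;
  }
  return best;
}

// `diskLoadCostInSlots` is an assumed calibration: the time to load a state from
// the db, expressed as the equivalent number of slots of block replay.
function planRegen(
  targetSlot: number,
  memorySlots: number[],
  hotDbSlots: number[],
  diskLoadCostInSlots: number,
): RegenPlan | null {
  const candidates: { plan: RegenPlan; cost: number }[] = [];

  const memSlot = closestAtOrBefore(memorySlots, targetSlot);
  if (memSlot !== null) {
    const replay = targetSlot - memSlot;
    candidates.push({ plan: { source: "memory", baseSlot: memSlot, slotsToReplay: replay }, cost: replay });
  }

  const dbSlot = closestAtOrBefore(hotDbSlots, targetSlot);
  if (dbSlot !== null) {
    const replay = targetSlot - dbSlot;
    candidates.push({
      plan: { source: "hotDb", baseSlot: dbSlot, slotsToReplay: replay },
      cost: diskLoadCostInSlots + replay,
    });
  }

  if (candidates.length === 0) return null; // nothing to regen from
  // Cheapest estimated cost wins; on a tie the in-memory state (pushed first) is kept.
  candidates.sort((a, b) => a.cost - b.cost);
  return candidates[0].plan;
}

console.log(planRegen(200, [100], [192], 20)); // source "hotDb", baseSlot 192, slotsToReplay 8
console.log(planRegen(200, [100], [192], 120)); // source "memory", baseSlot 100, slotsToReplay 100
```

Consider a target at slot 200, with a state in memory at slot 100 and a hot-db state at slot 192. If the disk load is costed at 20 slots, loading from disk wins (28 vs 100 slots). If it is costed at 120 slots, replaying from memory wins.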
2. Bound regen depending on consumer
Depending on the caller, restrict the work triggered by regen.
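Here is a minimal sketch of per-caller bounds. It assumes each consumer of regen gets a maximum number of slots it may replay, and requests over that budget are rejected. The caller names, limits, `RegenBudgetExceededError` and `assertRegenWithinBudget` are made up for illustration. They are not Lodestar's actual identifiers or values.

```typescript
// Hypothetical caller names and limits, for illustration only.
type RegenCaller = "processBlock" | "validateGossipAttestation" | "apiRequest";

// Assumed maximum number of slots each caller may make regen replay.
const MAX_REPLAY_SLOTS: Record<RegenCaller, number> = {
  processBlock: 64,
  validateGossipAttestation: 8,
  apiRequest: 256,
};

class RegenBudgetExceededError extends Error {
  readonly caller: RegenCaller;
  readonly slotsToReplay: number;
  readonly limit: number;

  constructor(caller: RegenCaller, slotsToReplay: number, limit: number) {
    super(`regen for ${caller} needs ${slotsToReplay} slots, limit is ${limit}`);
    this.name = "RegenBudgetExceededError";
    this.caller = caller;
    this.slotsToReplay = slotsToReplay;
    this.limit = limit;
  }
}

// Called before doing any replay work: reject requests over the caller's budget.
function assertRegenWithinBudget(caller: RegenCaller, slotsToReplay: number): void {
  const limit = MAX_REPLAY_SLOTS[caller];
  if (slotsToReplay > limit) {
    throw new RegenBudgetExceededError(caller, slotsToReplay, limit);
  }
}

// A gossip attestation needing 20 slots of replay is rejected,
// while block processing at the same distance is allowed.
try {
  assertRegenWithinBudget("validateGossipAttestation", 20);
} catch (e) {
  console.log((e as Error).message); // regen for validateGossipAttestation needs 20 slots, limit is 8
}
assertRegenWithinBudget("processBlock", 20); // within budget, no throw
```

The design intent is that cheap, high-volume paths such as gossip validation fail fast instead of triggering long replays during an attack. Paths that genuinely need the state can be allowed to spend more.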
3. WeakRef state cache
Allow the GC to drop states when memory is low. Cache only 3 states behind the current head. Put this behind a flag to extend modes for the light client.
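Here is a minimal sketch of the WeakRef idea. It assumes states are keyed by a root string. It also reads "3 states behind the current head" as keeping the 3 most recently added states strongly referenced. Older states are held only through `WeakRef` (ES2021, Node 14.6+), so the GC is free to reclaim them. `WeakRefStateCache` is a hypothetical name, not an existing Lodestar class.

```typescript
// Sketch of a WeakRef-backed state cache (requires ES2021 WeakRef, Node >= 14.6).
// The most recent `maxStrong` states are held strongly; older ones are only
// weakly referenced, so the GC may reclaim them when memory is tight.
class WeakRefStateCache<T extends object> {
  // Map iteration order is insertion order, so the first key is the oldest.
  private readonly strong = new Map<string, T>();
  private readonly weak = new Map<string, WeakRef<T>>();
  private readonly maxStrong: number;

  constructor(maxStrong = 3) {
    this.maxStrong = maxStrong;
  }

  add(root: string, state: T): void {
    this.weak.delete(root);
    this.strong.delete(root); // re-insert to mark as most recent
    this.strong.set(root, state);
    // Demote the oldest strongly held states to weak references.
    while (this.strong.size > this.maxStrong) {
      const [oldestRoot, oldestState] = this.strong.entries().next().value as [string, T];
      this.strong.delete(oldestRoot);
      this.weak.set(oldestRoot, new WeakRef(oldestState));
    }
  }

  // Returns undefined if the state was never cached or has been collected;
  // the caller must then fall back to regen.
  get(root: string): T | undefined {
    const state = this.strong.get(root);
    if (state !== undefined) return state;
    const ref = this.weak.get(root);
    if (ref === undefined) return undefined;
    const weakState = ref.deref();
    if (weakState === undefined) this.weak.delete(root); // prune the dead entry
    return weakState;
  }

  get strongCount(): number {
    return this.strong.size;
  }
}

const cache = new WeakRefStateCache<{ slot: number }>(3);
for (let slot = 0; slot < 5; slot++) cache.add(`root${slot}`, { slot });
console.log(cache.strongCount); // 3
console.log(cache.get("root4")?.slot); // 4
console.log(cache.get("root0")); // { slot: 0 } or undefined, depending on GC
```

`deref()` may start returning `undefined` at any point after the last strong reference is gone. Every lookup miss therefore has to be handled by regen, which is why the discussion stresses that regen from disk must always be available. A `FinalizationRegistry` could prune dead map entries eagerly instead of lazily on `get`.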
TBD
Closes #3099