-
-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase stateCache size to fix headState missing in checkpoint sync start #4534
Conversation
Performance Report✔️ no performance regression detected Full benchmark results
|
@@ -6,7 +6,7 @@ import {IMetrics} from "../../metrics/index.js"; | |||
import {MapTracker} from "./mapMetrics.js"; | |||
import {stateInternalCachePopulated} from "./stateContextCheckpointsCache.js"; | |||
|
|||
const MAX_STATES = 3 * 32; | |||
const MAX_STATES = 6 * 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to investigate the impact of this with regards to the heap memory (which lead to gc
percentage metrics). In the past when chain was not finalized for a while I saw the heap memory spiked at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must not merge this increase in cache size. The problem here is that the head state should never be dropped; period. This change will dramatically increase the change of OOM in Lodestar
👍 let me see how we can handle this particular case in a better way |
fixed in #4562 |
In checkpoint sync start, the stateCache of 3 * 32 caches seem to be not sufficient since generally we start from finalized which is almost 3 epochs behind
So before bn can sync blocks and generate more states nearer to head, the getHeadState functions seem to be running out of the states.
This PR does a quick-fix for expanding the stateCache's cache to 6 epochs (to give margin on 3 epochs) so that bns can stay behind 3 epochs + while still trying to find/sync peers.
There could be a better solution, but this is a hotfix to the current issue users might run into.
(see the notifier crossing 96+ skipped slots here since last finalized checkpoint sync comforably without error)
User issue also seems to have resolved.
Closes #4523