Fork choice should run after RPC blob processing #5474

Closed
michaelsproul opened this issue Mar 24, 2024 · 3 comments
Labels
bug, deneb

Comments

@michaelsproul
Member

Description

It seems like we don't run fork choice after importing a block on this code path:

Ok(AvailabilityProcessingStatus::Imported(hash)) => {
    debug!(
        self.log,
        "Block components retrieved";
        "result" => "imported block and blobs",
        "slot" => %slot,
        "block_hash" => %hash,
    );
}

This happens when we need to fetch blobs over RPC, and then complete block import off the back of a blob RPC response.

When there are no blobs, we run fork choice after importing the block here:

self.chain.recompute_head_at_current_slot().await;

Similarly, when gossip blobs complete a block, we run fork choice here:

self.chain.recompute_head_at_current_slot().await;

Therefore I think the only case missing is the one where we get blobs by RPC.
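As a rough sketch of one possible fix (assuming the RPC blob processing handler is async and has access to self.chain, as the gossip path does), the Imported arm above could recompute the head the same way the other two paths do:

Ok(AvailabilityProcessingStatus::Imported(hash)) => {
    debug!(
        self.log,
        "Block components retrieved";
        "result" => "imported block and blobs",
        "slot" => %slot,
        "block_hash" => %hash,
    );
    // Sketch: run fork choice here too, mirroring the no-blob and gossip-blob
    // import paths, so the freshly imported block can become head without
    // waiting for the next scheduled fork choice run.
    self.chain.recompute_head_at_current_slot().await;
}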

Version

Lighthouse v5.1.2

Additional Info

Some logs for slot 8708202, which was processed poorly by one of our mainnet nodes:

Mar 24 23:20:49.049 DEBG Successfully verified gossip block commitments: [0xb271…88cb, 0xa124…b7c1, 0x828d…98d7, 0x99d3…c902, 0x929f…d3df, 0x9075…4243], root: 0x03e76b4edf5a4ad6a54254846b3f184301e61081131ec69229e96d7362563dda, slot: 8708202, graffiti: teku/v24.3.0, service: beacon
Mar 24 23:20:49.373 DEBG Searching for block components block: 0x03e76b4edf5a4ad6a54254846b3f184301e61081131ec69229e96d7362563dda, peer_ids: [PeerId("16Uiu2HAkwNyHdQvmb1WW6wYoxg7i8RPZU2H9bC23ZfnmgAdtKeZP")], service: sync
Mar 24 23:20:49.373 DEBG Sending BlobsByRoot Request lookup_type: Current, peer: 16Uiu2HAkwNyHdQvmb1WW6wYoxg7i8RPZU2H9bC23ZfnmgAdtKeZP, blob_indices: [2, 5], block_root: 0x03e76b4edf5a4ad6a54254846b3f184301e61081131ec69229e96d7362563dda, method: BlobsByRoot, service: sync
Mar 24 23:20:49.663 DEBG Successfully verified gossip blob commitment: 0x9075…4243, index: 5, root: 0x03e7…3dda, slot: 8708202
Mar 24 23:20:49.855 DEBG Peer returned blob for single lookup blob_id: BlobIdentifier { block_root: 0x03e76b4edf5a4ad6a54254846b3f184301e61081131ec69229e96d7362563dda, index: 2 }, peer_id: 16Uiu2HAkwNyHdQvmb1WW6wYoxg7i8RPZU2H9bC23ZfnmgAdtKeZP, service: sync
Mar 24 23:20:50.142 DEBG Block components retrieved block_hash: 0x03e7…3dda, slot: 8708202, result: imported block and blobs

As we can see from these logs, the BN had all block components by 23:20:50.142, just 3.1s into the slot. However, we don't see the head get updated until >8s later:

Mar 24 23:20:58.523 DEBG Fork choice updated head old_head: 0xf4d9e783aed03445d3a64e2e741a1631c03a11a38e1fcf4c081785e123a7ff41, old_head_weight: Some(1948605000000000), new_head: 0x03e76b4edf5a4ad6a54254846b3f184301e61081131ec69229e96d7362563dda, new_head_weight: Some(970590000000000), service: beacon

This triggers a snapshot cache miss for advancing the head, because the "head" that the state advance is looking for is the stale head from the previous slot, which has been (rightly) removed from the cache:

Mar 24 23:20:56.001 WARN Did not advance head state reason: Err(HeadMissingFromSnapshotCache(0xf4d9e783aed03445d3a64e2e741a1631c03a11a38e1fcf4c081785e123a7ff41)), service: state_advance
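As a simplified illustration of that interaction (hypothetical types and names, not Lighthouse's actual API): the state advance task looks up the fork choice head in the snapshot cache shortly before the next slot, and because fork choice hasn't re-run, that head is still the previous slot's block, whose snapshot has already been pruned:

use std::collections::HashMap;

// Hypothetical stand-ins for illustration only.
type BlockRoot = u64;
struct SnapshotCache {
    states: HashMap<BlockRoot, String>, // block root -> cached beacon state
}

fn advance_head_state(cache: &SnapshotCache, fork_choice_head: BlockRoot) -> Result<(), String> {
    // The state advance only knows the head that fork choice last reported. If
    // fork choice never ran after the RPC-blob import, this is the stale head
    // from the previous slot, which has (rightly) been dropped from the cache.
    cache
        .states
        .get(&fork_choice_head)
        .map(|_| ())
        .ok_or_else(|| format!("HeadMissingFromSnapshotCache({fork_choice_head:#x})"))
}

fn main() {
    // The cache holds the newly imported block's state, but the lookup uses
    // the stale head root, so the advance is skipped with a cache-miss error.
    let cache = SnapshotCache {
        states: HashMap::from([(0x03e7, "new block's state".into())]),
    };
    println!("{:?}", advance_head_state(&cache, 0xf4d9));
}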

@s1na

s1na commented Mar 26, 2024

Not sure if it's related, but my node crashed and one of the last logs was Did not advance head state. I'm running v5.1.0:

Mar 24 07:57:01.942 INFO New block received                      root: 0x83a4d1194a6ae466dac3747197a8dc58e5b9d8af31c5558837901b83e7800cf4, slot: 8703583
Mar 24 07:57:05.000 INFO Synced                                  slot: 8703583, block:    …  empty, epoch: 271986, finalized_epoch: 271984, finalized_root: 0x03d0…6ce9, exec_hash: 0xa173…7888 (verified), peers: 92, service: slot_notifier
Mar 24 07:57:08.000 WARN Did not advance head state              reason: Err(HeadMissingFromSnapshotCache(0xbc188d4c79a8e142bfe3bae4436203a3fe42cf99bba90d35e95408b9aff5afe4)), service: state_advance
Mar 24 07:57:15.721 WARN Snapshot cache miss for blob verification, index: 0, block_root: 0xf376…d27f, service: beacon
Mar 24 07:57:15.749 WARN Snapshot cache miss for blob verification, index: 1, block_root: 0xf376…d27f, service: beacon
Mar 24 07:57:15.751 WARN Snapshot cache miss for blob verification, index: 2, block_root: 0xf376…d27f, service: beacon
Mar 24 07:57:15.768 WARN Snapshot cache miss for blob verification, index: 5, block_root: 0xf376…d27f, service: beacon
Mar 24 07:57:15.769 WARN Snapshot cache miss for blob verification, index: 4, block_root: 0xf376…d27f, service: beacon
Mar 24 07:57:15.790 WARN Snapshot cache miss for blob verification, index: 3, block_root: 0xf376…d27f, service: beacon
Mar 24 07:57:17.001 INFO Synced                                  slot: 8703584, block:    …  empty, epoch: 271987, finalized_epoch: 271985, finalized_root: 0x787d…7c05, exec_hash: 0xa173…7888 (verified), peers: 90, service: slot_notifier
./run.sh: line 1: 804065 Killed                  ./lighthouse bn --network mainnet --execution-endpoint http://localhost:8551 --execution-jwt /home/ubuntu/datadir-geth/geth/jwtsecret --checkpoint-sync-url https://mainnet.checkpoint.sigp.io --disable-deposit-contract-sync

@yashutanna

yashutanna commented Mar 27, 2024

+1, saw the same issue as @s1na.

Memory on the node shot up and crashed the process:

Mar 27 05:49:05.001 INFO Synced                                  slot: 8724543, block:    …  empty, epoch: 272641, finalized_epoch: 272639, finalized_root: 0x09d1…dfc2, exec_hash: 0xe1a8…1b3a (verified), peers: 85, service: slot_notifier
Mar 27 05:49:08.001 WARN Did not advance head state              reason: Err(HeadMissingFromSnapshotCache(0x3b2edfbcfa31569c12ea8452d139e425ecc068b601b5d4cc3e6b3fae4ed6cffc)), service: state_advance
Mar 27 05:49:16.286 WARN Snapshot cache miss for blob verification, index: 0, block_root: 0x91ca…de24, service: beacon
Mar 27 05:49:16.319 WARN Snapshot cache miss for blob verification, index: 3, block_root: 0x91ca…de24, service: beacon
Mar 27 05:49:16.334 WARN Snapshot cache miss for blob verification, index: 2, block_root: 0x91ca…de24, service: beacon
Mar 27 05:49:16.387 WARN Snapshot cache miss for blob verification, index: 5, block_root: 0x91ca…de24, service: beacon
Mar 27 05:49:16.566 WARN Snapshot cache miss for blob verification, index: 1, block_root: 0x91ca…de24, service: beacon

@michaelsproul
Member Author

This seems related. We will try to get a release out with the fix soon (likely next week at this rate). If you want early access, you can try running unstable.

The snapshot cache issues will also be further resolved (for good) by tree-states: #3206
