Optimistic post-merge sync #2691
Comments
Do you mean here that the
I concur that items 1 through 5 should not be performed on optimistic BC heads. CERTAINLY items 1 through 3 should not be performed on an unsafe/optimistic head. These are simply dangerous for attesters. An attestation for an incorrect chain could result in the attester being stuck on such a chain (in the event that two chains had 2/3 and conflicting FFG info), and building on incorrect beacon blocks is (a) currently just bad behavior for the network and (b) when we have an execution proof of custody (which we expect sooner rather than later), it could result in slashing in some cases.

As for APIs, I don't think it makes sense to serve an optimistic head. The user would not be able to then go look at the EL contents of such a head and thus would have a broken view of what may be the head. If the EL isn't resolved for some stretch, that is essentially the aggregate EL+CL client still "syncing" that segment, and thus it is natural to treat it as such (even though one of the two layers is resolved).

As for P2P, it's a bit less straightforward. I don't think you should serve optimistic beacon blocks in

For gossip, though, it's a bit less clear. In many SYNCING situations, the EL might be near the head, so you want to still get new CL blocks so you can quickly resolve segments when the EL finishes SYNCING. If you look at the Merge p2p
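For concreteness, a minimal Rust sketch of the kind of gating being argued for in this thread. All names here (`PayloadStatus`, `HeadInfo`, the `allows_*` methods) are hypothetical illustrations, not Lighthouse's actual types:

```rust
// Hypothetical sketch of gating validator duties and API responses on the
// execution-payload status of the current head. Names are illustrative only.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum PayloadStatus {
    Valid,      // EL has verified the payload
    Invalid,    // EL has rejected the payload
    Optimistic, // imported while the EL was syncing; not yet verified
}

struct HeadInfo {
    payload_status: PayloadStatus,
}

impl HeadInfo {
    /// Attesting to (or building on) an optimistic head risks committing to a
    /// chain whose execution side later turns out to be invalid.
    fn allows_validator_duties(&self) -> bool {
        self.payload_status == PayloadStatus::Valid
    }

    /// Serving an optimistic head over the API gives users a block whose
    /// execution contents they cannot yet look up on the EL.
    fn allows_api_head(&self) -> bool {
        self.payload_status == PayloadStatus::Valid
    }

    /// Gossip is less clear-cut: continuing to propagate blocks helps peers
    /// (and ourselves) resolve the optimistic segment once the EL syncs.
    fn allows_gossip(&self) -> bool {
        self.payload_status != PayloadStatus::Invalid
    }
}

fn main() {
    let head = HeadInfo { payload_status: PayloadStatus::Optimistic };
    assert!(!head.allows_validator_duties());
    assert!(!head.allows_api_head());
    assert!(head.allows_gossip());
}
```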
This seems risky. If attesters do not attest to unsafe heads, then how would an unsafe head ever become safe? (I'm sure in some situations it would, because attestations later show up, but not all.)
This only happens if your EL is syncing. This is NOT safe/unsafe w.r.t. making decisions about beacon blocks and the chance of a re-org. This is unsafe because the CL has been validated but not the EL. We conflated "unsafe" in two different Merge convos and conventions. "Optimistic" CL is probably a better term here.
We experimented with a similar concept in Grandine for a different purpose: unlimited parallelization of block signature verification. Kinda similar situation, as the latest chunk of the chain had semi-verified (everything checked except signatures) blocks too. In our case, fork choice built the chain with blocks skipping signature verification in order to advance the state far enough to make it possible to spin up a high number (at least hundreds) of block signature verification tasks. After we implemented it, the whole thing looked so terrible and unsafe that we dropped the idea. Quadratic complexity on top of an already complex, optimized fork choice. Looking forward to your solution, as it will solve unlimited block signature verification parallelization too.
Good point @ralexstokes, thanks. I've added your suggestion :)
Indeed, gossip is a good point. I also tend to think that we should continue to gossip blocks on an optimistic head.
It's important to note that my scheme fails hard (i.e. client shutdown, delete the database) if an invalid block is finalized. The primary reason I would be comfortable implementing such a scheme is because we verify signatures along the way. In order to get a failure in this optimistic execution-payload scheme you need to get 2/3rds (of a random distribution) of active validators signing across invalid blocks. If we were to delay signature verification across an unlimited number of blocks, some batches would contain blocks that finalize blocks earlier in the batch (mainnet usually finalizes every 64 blocks). Since there is no signature verification, it would be trivial for anyone to construct a chain of blocks that looks like it finalizes. So, to do unlimited parallelization of block signatures, you need a client design that makes it possible to revert finality. That is not something I plan to implement here, unfortunately.
This can be solved optimistically by doing a quick check of the proposer signature. So that's not too big a problem, especially if reverting is implemented.
Grandine doesn't couple persistence and finalization. It can run in memory for very long and we dump the state only to avoid a full resync after a restart. However, as I mentioned before, the implementation we did back then felt too hacky. Anyway, as signatures are checked, the only problem is to not get into a situation where 2/3rds finalizes an invalid payload. This means that the unsafe head should be an isolated optimization and should not be exposed elsewhere, otherwise we may learn that users use it in creative ways that lead to 2/3rds finalizing an invalid payload.
I've done some more thinking on this and my latest collection of information lives here: |
I'll close this since we've already implemented optimistic sync (and done the merge 🎉) |
Description
This is a tracking and discussion issue for implementing "optimistic" Beacon Chain (BC) sync once the merge-fork-epoch has passed. It aims to collate the lessons learned and information shared in the following two Lighthouse PRs:
This is a work-in-progress effort to maintain my notes in an organised fashion.
Terminology
Merge Fork: the fork triggered at `MERGE_FORK_EPOCH`. PoW Ethereum can (theoretically) exist indefinitely beyond this point.
Terminal Block (TB): the PoW block at which `get_terminal_pow_block` first returns `Some(pow_block)` and which is included by reference as the parent of an `ExecutionPayload` in the BC. This must happen either at or after the Merge Fork. PoW Ethereum ends here.
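For readers unfamiliar with the terminal-block condition, a rough Rust paraphrase of the check behind `get_terminal_pow_block` follows. The spec's function is Python; the struct and the example numbers here are simplified placeholders, not real client code:

```rust
// Rough paraphrase of the terminal-total-difficulty condition: a PoW block is
// "terminal" when it is the first block to reach the terminal total
// difficulty, i.e. it crosses the threshold but its parent does not.

struct PowBlock {
    total_difficulty: u128,        // simplified; real values need a 256-bit type
    parent_total_difficulty: u128,
}

fn is_terminal_pow_block(block: &PowBlock, terminal_total_difficulty: u128) -> bool {
    block.total_difficulty >= terminal_total_difficulty
        && block.parent_total_difficulty < terminal_total_difficulty
}

fn main() {
    // Example numbers only.
    let block = PowBlock { total_difficulty: 1_000, parent_total_difficulty: 999 };
    assert!(is_terminal_pow_block(&block, 1_000));
}
```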
Optimistic Sync
After TB inclusion on the BC, if we follow the specs exactly then we are simply unable to import beacon blocks without a connection to an EL client that is synced to our head (or later).
Whilst this is nice from a specification point of view, it's not great in practice. EL clients have developed very advanced ways of syncing the Ethereum state across long block-spans. Being spoon-fed block-by-block from a CL client is a major step backwards.
In order for EL clients to be able to use their fancy sync mechanisms, the CL clients need to zoom ahead and obtain all the valid beacon blocks they can and send the execution payloads to the EL clients. Ideally, the CL clients zoom to the head of the BC and are able to start sharing the latest, tip-of-the-chain `execution_payload`s with the EL. This gives the EL a nice big, juicy chain segment to sync.

Since the CL needs to reach the head of the BC before the EL can sync to an equivalent head, the CL must import beacon blocks without verifying the execution payloads. This is, technically, a violation of the BC specification. Some might call it "unsafe", but we call it "optimistic".
In summary, optimistic sync is where a CL syncs the BC without verifying all the execution payload values with an EL.
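A minimal sketch of that import decision, using hypothetical types rather than any client's real API, might look like this:

```rust
// Sketch of optimistic import: if the EL cannot verify a payload yet because
// it is still syncing, import the beacon block anyway and record that its
// execution side is unverified. Names are illustrative only.

#[derive(Debug)]
enum EngineResponse {
    Valid,
    Invalid,
    Syncing, // the EL does not yet know enough to verify this payload
}

#[derive(Debug, PartialEq)]
enum ImportOutcome {
    FullyVerified,
    Rejected,
    Optimistic, // consensus checks passed; payload verification deferred
}

fn import_block(consensus_checks_ok: bool, engine: EngineResponse) -> ImportOutcome {
    // Signatures, proposer checks, etc. are never skipped, even optimistically.
    if !consensus_checks_ok {
        return ImportOutcome::Rejected;
    }
    match engine {
        EngineResponse::Valid => ImportOutcome::FullyVerified,
        EngineResponse::Invalid => ImportOutcome::Rejected,
        // The "optimistic" case: keep syncing the beacon chain so the EL can
        // later be handed a long, contiguous segment of payloads.
        EngineResponse::Syncing => ImportOutcome::Optimistic,
    }
}

fn main() {
    assert_eq!(import_block(true, EngineResponse::Syncing), ImportOutcome::Optimistic);
}
```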
From Optimism to Realism
Syncing a CL client without verifying the execution payload values at all is simply unsafe (at least as far as I'm concerned). So, once we manage to get our EL synced, we should go back and verify all of the execution payloads we imported along the way.
Thankfully, this is not as tedious as it sounds. If one execution payload is valid, then all the ancestors must be valid. So, as long as we've ensured that the execution payloads we've imported all form a chain, if all the chain-heads (chain-tips) are valid, then all of our beacon blocks become fully verified and we're no longer an optimistic client (a realistic client?).
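As an illustration of that ancestor-validation rule, here's a sketch assuming a simple parent-pointer map (hypothetical structures, not any client's real fork-choice implementation):

```rust
// When the EL reports a head payload as VALID, walk the parent pointers and
// upgrade every optimistic ancestor to VALID as well.

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Status {
    Optimistic,
    Valid,
}

struct Node {
    parent: Option<u64>,
    status: Status,
}

fn mark_chain_valid(tree: &mut HashMap<u64, Node>, mut head: u64) {
    loop {
        let parent = match tree.get_mut(&head) {
            Some(node) => {
                node.status = Status::Valid;
                node.parent
            }
            None => return,
        };
        match parent {
            Some(p) => head = p,
            None => return,
        }
    }
}

fn main() {
    let mut tree = HashMap::new();
    tree.insert(1, Node { parent: None, status: Status::Optimistic });
    tree.insert(2, Node { parent: Some(1), status: Status::Optimistic });
    mark_chain_valid(&mut tree, 2);
    assert_eq!(tree[&1].status, Status::Valid);
}
```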
But what if one of those execution payloads is invalid? Well, we just need to invalidate that block and its descendants. That sounds easy, but there are two scenarios to consider:
1. The invalid block (or a descendant of it) has already been finalized.
2. The invalid block has not been finalized.
In the case of (1), we're in serious trouble. As I understand it, there aren't any CL clients prepared to handle a reversion in the finalized chain (Lighthouse won't). So, in this case I think we simply need to shut down, log critical errors and request that the user re-sync on a trusted internet connection.
In the case of (2), things are much simpler. All the CL clients are prepared for re-orgs in the non-finalized chain. They would remove the invalid block (and its descendants) from their fork-choice tree and then run the fork-choice algorithm to find a new head that does not include any invalid execution payloads.
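A sketch of that two-scenario invalidation policy, again with hypothetical types, could look like this:

```rust
// The finalized case is unrecoverable and fails hard; the non-finalized case
// prunes the offending subtree so fork choice can pick a new head.

use std::collections::{HashMap, HashSet};

struct ForkChoice {
    // child -> parent block identifiers (placeholder for a real proto-array)
    parents: HashMap<u64, u64>,
    finalized: HashSet<u64>,
}

enum Invalidation {
    /// Scenario (2): remove these blocks and re-run fork choice.
    PruneDescendants(Vec<u64>),
    /// Scenario (1): an invalid block was finalized; shut down and re-sync.
    FatalFinalizedInvalid,
}

impl ForkChoice {
    fn on_invalid_payload(&self, invalid_block: u64) -> Invalidation {
        if self.finalized.contains(&invalid_block) {
            return Invalidation::FatalFinalizedInvalid;
        }
        // Collect the invalid block plus everything that descends from it
        // (fixed-point iteration; fine for a sketch, not for production).
        let mut to_remove = vec![invalid_block];
        loop {
            let before = to_remove.len();
            for (&child, &parent) in &self.parents {
                if to_remove.contains(&parent) && !to_remove.contains(&child) {
                    to_remove.push(child);
                }
            }
            if to_remove.len() == before {
                break;
            }
        }
        Invalidation::PruneDescendants(to_remove)
    }
}

fn main() {
    let fc = ForkChoice {
        parents: HashMap::from([(2, 1), (3, 2)]),
        finalized: HashSet::from([1]),
    };
    match fc.on_invalid_payload(2) {
        Invalidation::PruneDescendants(blocks) => assert_eq!(blocks.len(), 2),
        Invalidation::FatalFinalizedInvalid => unreachable!(),
    }
}
```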
Dealing with Uncertainty
There are various things a CL client needs to do with the blocks in its database:
When it comes to blocks with a valid payload, it's clear that we're free to do any of those tasks. However, when it comes to invalid blocks, I'd say it's clear that we shouldn't do any of those things.
But what about when we have blocks with an unknown execution payload status? I.e., the blocks we imported optimistically and haven't yet verified? At this point, I think I'm also of the opinion that we should not do any of those things either. Notably, it would be impossible to produce a block atop a block with an "unknown" status, since our EL can't build a new block atop one it doesn't know!
So, if we know that our head has an unknown status, we can't build atop it. But should we try to fork around it and build atop the best verified head (our "safe head")? I'm not sure what the correct behaviour is here, but I think we should not try to build around it, since we would be forking the chain when we know we have an incomplete picture. I really need to think deeply about this and whether it will cause liveness failures.
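A sketch of the conservative option (decline to produce on an unknown head rather than forking around it), using hypothetical names:

```rust
// If the head's payload status is unknown, skip the proposal: the EL cannot
// extend a block it does not know, and forking around the head with an
// incomplete picture is the option rejected above.

#[derive(Clone, Copy, PartialEq)]
enum PayloadStatus {
    Valid,
    Unknown, // imported optimistically, not yet verified by the EL
}

enum ProductionDecision {
    ProduceOn(u64),
    Skip,
}

fn block_production_decision(head: u64, head_status: PayloadStatus) -> ProductionDecision {
    match head_status {
        PayloadStatus::Valid => ProductionDecision::ProduceOn(head),
        PayloadStatus::Unknown => ProductionDecision::Skip,
    }
}

fn main() {
    assert!(matches!(
        block_production_decision(7, PayloadStatus::Unknown),
        ProductionDecision::Skip
    ));
}
```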
Additional Resources
Adding a `most_recent_correct_ancestor` to `engine_processPayload` would make it very easy for us to find all the invalid ancestors of a block in our fork-choice tree.
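To illustrate how such a hint could be used, here's a sketch (hypothetical structures; the field name comes from the suggestion above) that walks back from the invalid block and marks everything as invalid until the reported correct ancestor is reached, which is itself marked valid:

```rust
// Using a `most_recent_correct_ancestor`-style hint to invalidate a whole
// range of ancestors at once. Names and structures are illustrative only.

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Status {
    Optimistic,
    Valid,
    Invalid,
}

struct Node {
    parent: Option<u64>,
    status: Status,
}

/// Walk back from the block whose payload was reported invalid, marking each
/// block INVALID until the reported correct ancestor is reached; that
/// ancestor itself is marked VALID.
fn apply_invalid_response(
    tree: &mut HashMap<u64, Node>,
    invalid_block: u64,
    most_recent_correct_ancestor: u64,
) {
    let mut current = Some(invalid_block);
    while let Some(id) = current {
        if id == most_recent_correct_ancestor {
            if let Some(node) = tree.get_mut(&id) {
                node.status = Status::Valid;
            }
            break;
        }
        match tree.get_mut(&id) {
            Some(node) => {
                node.status = Status::Invalid;
                current = node.parent;
            }
            None => break,
        }
    }
}

fn main() {
    let mut tree = HashMap::new();
    tree.insert(1, Node { parent: None, status: Status::Optimistic });
    tree.insert(2, Node { parent: Some(1), status: Status::Optimistic });
    tree.insert(3, Node { parent: Some(2), status: Status::Optimistic });
    apply_invalid_response(&mut tree, 3, 1);
    assert_eq!(tree[&3].status, Status::Invalid);
    assert_eq!(tree[&2].status, Status::Invalid);
    assert_eq!(tree[&1].status, Status::Valid);
}
```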