batcher: keep blocks, channels and frames in strict order & simplify reorg handling #12390
Conversation
this is strange, I don't think we should expect channels with frames but no blocks...
and simplify channel.TxConfirmed API
Instead of optimizing for a clean shutdown (which we couldn't guarantee anyway), this change optimizes for code simplicity. It also lets us restrict the amount of code which mutates the channelQueue (removePendingChannel was removing channels at arbitrary positions in the queue). The downside is that we may end up needlessly resubmitting some data after the reset. Reorgs are rare, so it makes sense to optimize for correctness rather than DA costs here.
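A minimal sketch of the idea, using hypothetical, simplified types (the real channelManager in op-batcher differs): on a reorg the whole state is reset in one place, rather than surgically removing channels at arbitrary queue positions.

```go
package main

import "fmt"

// Hypothetical stand-ins for the batcher's state.
type block struct{ number uint64 }
type channel struct{ blocks []block }

type channelManager struct {
	blocks       []block   // blocks waiting to be put into channels
	channelQueue []channel // channels in strict creation order
}

// handleReorg resets the whole state instead of removing individual
// channels. Some data may be resubmitted afterwards, but reorgs are rare,
// so correctness is preferred over DA cost.
func (m *channelManager) handleReorg() {
	m.blocks = m.blocks[:0]
	m.channelQueue = m.channelQueue[:0]
	// The caller then waits for in-flight txs to land before resuming.
}

func main() {
	m := &channelManager{channelQueue: []channel{{}, {}}}
	m.handleReorg()
	fmt.Println(len(m.channelQueue)) // 0
}
```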
using new TestMetrics struct
There's one confusion in handleChannelTimeout that we need to sort out. Otherwise LGTM, just minor comments and improvement proposals.
…by requeue or timeout
We were trimming older channels and keeping new ones. We need to trim newer channels and keep old ones. Fixes associated test (see previous commit).
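Roughly, the fix looks like the sketch below (hypothetical names, not the real handleChannelTimeout; whether the timed-out channel itself is kept or requeued is an assumption here — in this sketch it is dropped along with the newer ones).

```go
package main

import "fmt"

type channel struct{ id int }

// dropNewerChannels keeps the channels created before the timed-out one and
// drops the timed-out channel plus everything created after it; the blocks
// from the dropped channels are what get requeued. Older channels are kept
// because their data may already be on chain.
func dropNewerChannels(queue []channel, timedOut int) (kept, dropped []channel) {
	for i, c := range queue {
		if c.id == timedOut {
			return queue[:i], queue[i:]
		}
	}
	return queue, nil
}

func main() {
	queue := []channel{{1}, {2}, {3}}
	kept, dropped := dropNewerChannels(queue, 2)
	fmt.Println(kept, dropped) // [{1}] [{2} {3}]
}
```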
### Reorgs
When an L2 unsafe reorg is detected, the batch submitter will reset its state, and wait for any in flight transactions to be ingested by the verifier nodes before starting work again.
Yes, safe L2 blocks can be reorged by an L1 reorg.
What I mean here by an L2 reorg is when the unsafe blocks being pulled from the sequencer don't descend from the previous L2 blocks in the batcher's memory. Does that make sense?
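For illustration, detecting that non-descent condition might look like the following (hypothetical types and names, not the actual op-batcher code):

```go
package main

import "fmt"

// Minimal stand-in for an L2 block reference.
type blockRef struct {
	hash       string
	parentHash string
	number     uint64
}

// descendsFrom reports whether newBlock directly extends the last block the
// batcher has in memory. If it does not, the unsafe chain has reorged and
// the batcher should reset its state.
func descendsFrom(last, newBlock blockRef) bool {
	return newBlock.number == last.number+1 && newBlock.parentHash == last.hash
}

func main() {
	last := blockRef{hash: "0xabc", number: 100}
	next := blockRef{hash: "0xdef", parentHash: "0x123", number: 101}
	if !descendsFrom(last, next) {
		fmt.Println("L2 unsafe reorg detected: resetting batcher state")
	}
}
```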
Yep, that makes sense! Could we rephrase to just say "ingested by the sequencer's op-node" here? Or are there cases where the batcher is really waiting for multiple nodes?
And just to be sure, this kind of reorg can only happen when an L1 reorg happens, right?
Yes, we can make that edit.
An unsafe reorg can also happen when, e.g., the batcher is down for so long that the sequencing window is breached (see https://docs.optimism.io/stack/rollup/derivation-pipeline).
On reading the code more here, this same logic would also be applied for safe reorgs, right? So I think this should say both unsafe and safe reorgs?
Yes, good point. Could you send a PR for these readme changes?
Also, the logic for handling these reorgs is being refactored in #13060.
Sounds good, I'll wait for that PR to land and then refactor my PR on top of it. Also I'll include the readme changes in my PR to force you to review it :D
Closes #12123
Design doc https://www.notion.so/oplabs/op-batcher-re-architecture-114f153ee162803d943ff4628ab6578f?pvs=4
Changes (at the reviewer's request I could split this PR up):
op-service/queue: use this type in a couple more places in the batcher.

From the issue:
We should check