synchronizer: check l1blocks #3546
I don't know the full scope of this task, but the changes in this PR seem like overkill.
I expected just a single method, running concurrently, that checks the blocks related to the consolidated point on Ethereum.
I have added many comments, but I confess I got exhausted while reviewing because I kept asking myself, "Is this necessary?"
I understand this PR can achieve the goal, but considering the price we will pay to maintain this implementation in the future, I would prefer to drop everything and come up with a more straightforward solution that gets the consolidated point from Ethereum and checks where we stand from there.
Proposal:
Assuming the Synchronizer must be able to recover by itself when a reorg is detected, and that this mechanism is a secondary protection that helps the Synchronizer identify the reorg with a different strategy, I would suggest implementing the following in the simplest way possible:
- at the start of the application, get the current consolidated point block from Ethereum
- get all the blocks we have from the consolidated point onwards and check all of them for a reorg
- if a reorg is detected, flag it and wait for the synchronizer to fix it
- once fixed, store the last block we checked in memory
- keep monitoring the consolidated point on Ethereum until it reaches the last block we checked
- repeat
Integration with the Synchronizer can be done via a simple channel, which can be appended to the current synchronization process.
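The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PR: `Block`, `Deps`, `FirstMismatch` and `Monitor` are hypothetical names, the three function fields stand in for whatever the node really uses to read Ethereum and its own state, and `poll` must be greater than zero.

```go
package main

import (
	"context"
	"time"
)

// Block is a hypothetical, minimal view of an L1 block stored by the node.
type Block struct {
	Number uint64
	Hash   string
}

// FirstMismatch walks the locally stored blocks in order and returns the first
// one whose hash differs from L1 (nil if all match), together with the number
// of the last block that matched. lastOK starts at from.
func FirstMismatch(local []Block, from uint64, l1Hash func(uint64) (string, error)) (*Block, uint64, error) {
	lastOK := from
	for i := range local {
		h, err := l1Hash(local[i].Number)
		if err != nil {
			return nil, lastOK, err
		}
		if h != local[i].Hash {
			return &local[i], lastOK, nil
		}
		lastOK = local[i].Number
	}
	return nil, lastOK, nil
}

// Deps groups the hypothetical lookups the monitor needs.
type Deps struct {
	Finalized  func(ctx context.Context) (uint64, error)
	LocalAfter func(ctx context.Context, from uint64) ([]Block, error)
	L1Hash     func(ctx context.Context, number uint64) (string, error)
}

// Monitor checks everything above the consolidated point, reports a mismatch
// on reorgs, and otherwise waits until the consolidated point reaches the last
// checked block before checking again. While a mismatch persists, it is
// reported again on every poll.
func Monitor(ctx context.Context, d Deps, reorgs chan<- Block, poll time.Duration) error {
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	var lastChecked uint64 // kept in memory only
	for {
		finalized, err := d.Finalized(ctx)
		if err != nil {
			return err
		}
		if finalized >= lastChecked {
			local, err := d.LocalAfter(ctx, finalized)
			if err != nil {
				return err
			}
			bad, last, err := FirstMismatch(local, finalized, func(n uint64) (string, error) {
				return d.L1Hash(ctx, n)
			})
			if err != nil {
				return err
			}
			if bad != nil {
				select {
				case reorgs <- *bad: // flag it; the synchronizer fixes it
				case <-ctx.Done():
					return ctx.Err()
				}
			} else {
				lastChecked = last
			}
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```

The synchronizer side would then only need to read from the `reorgs` channel inside its existing loop, for example in a `select` alongside its regular work.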
Advantages of this approach:
- no changes in the DB
- faster, since it runs only in memory
- takes advantage of the network consolidation point instead of verifying everything
- with fewer checks, it can load all the blocks in a single shot instead of one by one and check them concurrently
- far less code to maintain
- far fewer changes in the real code due to this extra protection
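As an illustration of the "single shot, checked concurrently" point, a check over an already-loaded slice of blocks might look like the sketch below. `Block` and `LowestMismatch` are hypothetical names, and `l1Hash` is an assumed stand-in for the real Ethereum lookup; it must be safe to call from several goroutines.

```go
package main

import "sync"

// Block is a hypothetical, minimal view of an L1 block stored by the node.
type Block struct {
	Number uint64
	Hash   string
}

// LowestMismatch compares every block with L1 using at most `workers`
// concurrent lookups. It returns the mismatching block with the lowest
// number, or nil when every block matches.
func LowestMismatch(blocks []Block, l1Hash func(uint64) (string, error), workers int) (*Block, error) {
	if workers < 1 {
		workers = 1
	}
	type result struct {
		mismatch bool
		err      error
	}
	results := make([]result, len(blocks)) // one slot per goroutine, no sharing
	sem := make(chan struct{}, workers)    // bounds the in-flight lookups
	var wg sync.WaitGroup
	for i := range blocks {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			h, err := l1Hash(blocks[i].Number)
			results[i] = result{mismatch: err == nil && h != blocks[i].Hash, err: err}
		}(i)
	}
	wg.Wait()

	var bad *Block
	for i, r := range results {
		if r.err != nil {
			return nil, r.err
		}
		if r.mismatch && (bad == nil || blocks[i].Number < bad.Number) {
			bad = &blocks[i]
		}
	}
	return bad, nil
}
```

Returning the lowest mismatching block matters because a reorg invalidates everything after its first divergent block, so that is the point the synchronizer would need to reset to.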
Reasoning:
We assume the consolidation point on Ethereum is the point before which a reorg will never happen. From this point, we run our own check to make sure all the blocks not yet consolidated match Ethereum. Once this is guaranteed, the synchronizer takes over and continues its job, synchronizing block by block until all the blocks we have checked are part of the consolidated portion of Ethereum; then we start over. If, for some reason, the synchronizer fails to detect a reorg during regular synchronization, our next check will cover everything from the last checked block to the latest synchronized block and will find it, flagging it so that the reorg process runs in the synchronizer's next reorg check.
Conclusion:
I don't feel comfortable merging this whole PR, and I'm open to discussing it if you consider it worth it. Otherwise, you can check my comments in the PR and proceed with this implementation.
I double-checked the scope of the task, and this is the expected behaviour.
synchronizer/config.go
Outdated
// dontDoReorgCheckBeforeL2Sync if is true then the reorg check is not done before the L2 sync
// this is a private field, can not be configured
dontDoReorgCheckBeforeL2Sync bool
Why is this param needed?
The unit test is failing because there is an extra call to CheckReorg before starting the L2 sync. With this flag we skip that call.
if err != nil {
log.Errorf("error resetting the state to a discrepancy block. Retrying... Err: %v", err)
continue
for {
Is it a good idea to block here forever? Maybe limit it to 10 retries or something like that.
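A bounded retry along those lines could be sketched as below. `RetryN` is a hypothetical helper, not something that exists in the PR; the operation being retried, the attempt limit and the wait between attempts are all left to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RetryN calls op until it succeeds or maxAttempts is reached, sleeping for
// wait between attempts. The last error is wrapped in the returned error.
func RetryN(maxAttempts int, wait time.Duration, op func() error) error {
	if maxAttempts < 1 {
		return errors.New("maxAttempts must be at least 1")
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(wait) // no sleep after the final failed attempt
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}
```

With something like this, the reset call in the loop above would be passed as `op`, and the caller would decide what to do once the limit is hit (for example, log the error and halt) instead of spinning indefinitely.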
* wip
* run on background L1block checker
* fix lint and documentation
* fix conflict
* add unittest
* more unittest
* fix lint
* increase timeout for async unittest
* fix unittest
* rename GetResponse for GetResult and fix uniitest
* add a second gorutines for check the newest blocks
* more unittest
* add unittest and run also preCheck on launch
* by default Precheck from FINALIZED and SAFE
* fix unittest, apply PR comments
* changes suggested by ARR552 in integration method
* fix documentation
* import new network-l1-mock from PR#3553
* import new network-l1-mock from PR#3553
* import new network-l1-mock from PR#3553
* import new network-l1-mock from PR#3553
* fix unittest
* fix PR comments
* fix error
* checkReorgAndExecuteReset can't be call with lastEthBlockSynced=nil
* add parentHash to error
* fix error
* merge 3553 fix unittest
* fix unittest
* fix wrong merge
* adapt parallel reorg detection to flow
* fix unit tests
* fix log
* allow use sync parallel mode
---------
Co-authored-by: Alonso <ARR551@protonmail.com>
* change number migration * add column checked on state.block * if no unchecked blocks return ErrNotFound * migration set to checked all but the block with number below max-1000 * add column checked on state.block (#3543) * add column checked on state.block * if no unchecked blocks return ErrNotFound * migration set to checked all but the block with number below max-1000 * Feature/#3549 reorgs improvement (#3553) * New reorg function * mocks * linter * Synchronizer tests * new elderberry smc docker image * new image * logs * fix json rpc * fix * Test sync from empty block * Regular reorg case tested * linter * remove empty block + fix LatestSyncedBlockEmpty * Improve check reorgs when no block is received during the call * fix RPC error code for eth_estimateGas and eth_call for reverted tx and no return value; fix e2e test; * fix test * Extra unit test * fix reorg until genesis * disable parallel synchronization --------- Co-authored-by: tclemos <thiago@polygon.technology> * migrations * Fix + remove empty blocks * unit test * linter * Fix + remove empty blocks (#3564) * Fix + remove empty blocks * unit test * linter * Fix/#3565 reorg (#3566) * fix + logs * fix loop * Revert "fix + logs" This reverts commit 39ced69. 
* fix L1InfoRoot when an error happens during the process of the L1 information (#3576) * fix * Comments + mock * avoid error from some L1providers when fromBlock is higher than toBlock * Revert some changes * comments * add L2BlockModulus to L1check * doc * fix dbTx = nil * fix unit tests * config * fix sync unit test * linter * fix config param typo * synchronizer: check l1blocks (#3546) * wip * run on background L1block checker * fix lint and documentation * fix conflict * add unittest * more unittest * fix lint * increase timeout for async unittest * fix unittest * rename GetResponse for GetResult and fix uniitest * add a second gorutines for check the newest blocks * more unittest * add unittest and run also preCheck on launch * by default Precheck from FINALIZED and SAFE * fix unittest, apply PR comments * changes suggested by ARR552 in integration method * fix documentation * import new network-l1-mock from PR#3553 * import new network-l1-mock from PR#3553 * import new network-l1-mock from PR#3553 * import new network-l1-mock from PR#3553 * fix unittest * fix PR comments * fix error * checkReorgAndExecuteReset can't be call with lastEthBlockSynced=nil * add parentHash to error * fix error * merge 3553 fix unittest * fix unittest * fix wrong merge * adapt parallel reorg detection to flow * fix unit tests * fix log * allow use sync parallel mode --------- Co-authored-by: Alonso <ARR551@protonmail.com> * linter * comment check --------- Co-authored-by: tclemos <thiago@polygon.technology>
* check GER and index of synced L1InfoRoot matches with sc values (0xPolygonHermez#3551) * apply txIndex fix to StoreTransactions; add migration to fix wrong txIndexes (0xPolygonHermez#3556) * Feature/0xPolygonHermez#3549 reorgs improvement (0xPolygonHermez#3553) * New reorg function * mocks * linter * Synchronizer tests * new elderberry smc docker image * new image * logs * fix json rpc * fix * Test sync from empty block * Regular reorg case tested * linter * remove empty block + fix LatestSyncedBlockEmpty * Improve check reorgs when no block is received during the call * fix RPC error code for eth_estimateGas and eth_call for reverted tx and no return value; fix e2e test; * fix test * Extra unit test * fix reorg until genesis * disable parallel synchronization --------- Co-authored-by: tclemos <thiago@polygon.technology> * Fix adding tx that matches with tx that is being processed (0xPolygonHermez#3559) * fix adding tx that matches (same addr and nonce) tx that is being processing * fix generate mocks * fix updateCurrentNonceBalance * synchronizer: check l1blocks (0xPolygonHermez#3546) * wip * run on background L1block checker * fix lint and documentation * fix conflict * add unittest * more unittest * fix lint * increase timeout for async unittest * fix unittest * rename GetResponse for GetResult and fix uniitest * add a second gorutines for check the newest blocks * more unittest * add unittest and run also preCheck on launch * by default Precheck from FINALIZED and SAFE * fix unittest, apply PR comments * changes suggested by ARR552 in integration method * fix documentation * import new network-l1-mock from PR#3553 * import new network-l1-mock from PR#3553 * import new network-l1-mock from PR#3553 * import new network-l1-mock from PR#3553 * fix unittest * fix PR comments * fix error * checkReorgAndExecuteReset can't be call with lastEthBlockSynced=nil * add parentHash to error * fix error * merge 3553 fix unittest * fix unittest * fix wrong merge * adapt 
parallel reorg detection to flow * fix unit tests * fix log * allow use sync parallel mode --------- Co-authored-by: Alonso <ARR551@protonmail.com> * Fix + remove empty blocks (0xPolygonHermez#3564) * Fix + remove empty blocks * unit test * linter * Fix/0xPolygonHermez#3565 reorg (0xPolygonHermez#3566) * fix + logs * fix loop * Revert "fix + logs" This reverts commit 39ced69. * fix L1InfoRoot when an error happens during the process of the L1 information (0xPolygonHermez#3576) * fix * Comments + mock * avoid error from some L1providers when fromBlock is higher than toBlock * Revert some changes * comments * add L2BlockModulus to L1check * doc * fix dbTx = nil * fix unit tests * added logs to analyze blocking issue when storing L2 block * add debug logs for datastreamer * fix 0xPolygonHermez#3581 synchronizer panic synchronizing from trusted node (0xPolygonHermez#3582) * synchronized: 0xPolygonHermez#3583 stop sync from l2 after no closed batch (0xPolygonHermez#3584) * stop processing trusted Node after first open batch * Update datastream lib to the latest version with additional debug info * update dslib client interface * Update the diff * Fix non-e2e tests * Update the docker image for the mock L1 network * Update the diff * Fix typo in the comment * Use the Geth v1.13.11 Docker image and update the genesis spec * Update the diff --------- Co-authored-by: agnusmor <100322135+agnusmor@users.noreply.github.com> Co-authored-by: Thiago Coimbra Lemos <tclemos@users.noreply.github.com> Co-authored-by: Alonso Rodriguez <ARR552@users.noreply.github.com> Co-authored-by: tclemos <thiago@polygon.technology> Co-authored-by: Joan Esteban <129153821+joanestebanr@users.noreply.github.com> Co-authored-by: Alonso <ARR551@protonmail.com> Co-authored-by: agnusmor <agnusmor@gmail.com> Co-authored-by: dPunisher <dpunish3r@users.noreply.github.com>
Closes #3540 #3561
What does this PR do?
Reviewers
Main reviewers:
Codeowner reviewers: