Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polygon misses "statesync" transaction if heimdall is unreachable #4

Closed
sduchesneau opened this issue Mar 14, 2023 · 4 comments
Closed
Assignees

Comments

@sduchesneau
Copy link

When Heimdall is unreachable for a while, the "SprintEnd" blocks that should contain system transactions are produced, with the correct block hash, but they do not contain the StateSync transactions.

Example log sequence for block 40335152 that should contain system transactions:

########## BAD BLOCK #########
Chain config: {ChainID: 137 ..... bor}

Number: 40335152
Hash: 0xf13b363616ae2ea40b2c9173a0e7393c2b67dd5d000d063542bd547d4e06a378
...
Error: invalid merkle root (remote: 5132c05119a8afa2e7c47c1d96a8c613777c30979a8624a50be5fe0c5ddf8f57 local: 3679c655048318d563c817eee56664e4b2af42d0fc7dcef395e570fb9363299c)
#############################

INFO [03-14|12:14:34.954] Fetching state updates from Heimdall     fromID=2,612,092 to=2023-03-14T12:10:44Z
INFO [03-14|12:14:34.954] Fetching state sync events               queryParams="from-id=2612092&to-time=1678795844&limit=50"
INFO [03-14|12:14:34.956] StateSyncData                            gas=0         number=40,335,152 lastStateID=2,612,091 total records=0  fetch time=2       process time=0

^ these messages get repeated during the period where heimdall is unavailable.

The next "BAD BLOCK" message is preceded by INFO [03-14|12:14:34.991] Skip duplicated bad block number=40,335,152 hash=f13b36..06a378

Then eventually we get:

INFO [03-14|12:18:09.505] Fetching state updates from Heimdall     fromID=2,612,092 to=2023-03-14T12:10:44Z
INFO [03-14|12:18:09.505] Fetching state sync events               queryParams="from-id=2612092&to-time=1678795844&limit=50"
INFO [03-14|12:18:09.515] → committing new state                   eventRecord="id 2612092, contract 0xD9c7C4ED4B66858301D0cb28Cc88bf655Fe34861, data: 0x000000000000000000000000519e9d6846e0bdbb715d4bd8d2261863740bce5d0000000000000000000000007d1afa7b718fb893db30a3abc0cfc608aacfebb000000000000000000000000000000000000000000000000013e62837b17b62000000000000000000000000000000000000000000000000000000000019e1fa02, txHash: 0x913c29409976126a62df552986fc1a0c2baf039e78b091628865daee1b3df7df, logIndex: 156, chainId: 137, time 2023-03-14T12:10:39Z, gasUsed 74888"
INFO [03-14|12:18:09.517] → committing new state                   eventRecord="id 2612093, contract 0xA6FA4fB5f76172d178d61B04b0ecd319C5d1C0aa, data: 0x87a7811f4bfedea3d341ad165680ae306b01aaeacc205d227629cf157dd9f821000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000000a0000000000000000000000000072f16ecddd7ebe1c166c3141b4e75484e65a401000000000000000000000000eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee0000000000000000000000000000000000000000000000000000000000000060000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000e35fa931a00000, txHash: 0x157e21ca2dded88b93fe1da9efd2a1e1ede4204aa4c697b967943fbe9e7b49a6, logIndex: 418, chainId: 137, time 2023-03-14T12:10:39Z, gasUsed 61470"
INFO [03-14|12:18:09.520] → committing new state                   eventRecord="id 2612094, contract 0xA6FA4fB5f76172d178d61B04b0ecd319C5d1C0aa, data: 0x87a7811f4bfedea3d341ad165680ae306b01aaeacc205d227629cf157dd9f821000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000000a0000000000000000000000000f31f2d41b7639ef06c7638a8046f95edb92580910000000000000000000000007fc66500c84a76ad7e9c93437bfc5ac33e2ddae900000000000000000000000000000000000000000000000000000000000000600000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000033cb036c5391c00, txHash: 0x5d0dcdec6968213bf7add2d47369e3d4360ff7c1aea33d105dc19554a11886f1, logIndex: 97, chainId: 137, time 2023-03-14T12:10:39Z, gasUsed 57715"
INFO [03-14|12:18:09.523] → committing new state                   eventRecord="id 2612095, contract 0xA6FA4fB5f76172d178d61B04b0ecd319C5d1C0aa, data: 0x87a7811f4bfedea3d341ad165680ae306b01aaeacc205d227629cf157dd9f821000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000000a0000000000000000000000000dd81387444364a9cc93156da7b686e4171b187e8000000000000000000000000a0b86991c6218b36c1d19d4a2e9eb0ce3606eb48000000000000000000000000000000000000000000000000000000000000006000000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000199109f3e, txHash: 0x2f0939444385456c1f34da2f936973612234f5bf60ec54b0c4f26774fdab1b6a, logIndex: 157, chainId: 137, time 2023-03-14T12:10:39Z, gasUsed 57439"
INFO [03-14|12:18:09.523] StateSyncData                            gas=251,512   number=40,335,152 lastStateID=2,612,095 total records=4  fetch time=8       process time=9
INFO [03-14|12:18:12.307] Deep froze chain segment                 blocks=3   elapsed=4.606ms     number=40,245,154 hash=542859..04a222
@sduchesneau
Copy link
Author

The committing new state' message comes from func (*GenesisContractsClient) CommitState`

I believe the block (40335152) was produced into firehose, then this happened:

https://github.com/streamingfast/go-ethereum/blob/release/polygon-0.x-fh2/core/blockchain.go#L1834 -> fmt.Errorf("invalid merkle root (remote: %x local: %x)"...)

if the block was already "closed" at that moment, then we won't create the same block again in firehose.. right? investigating...

@sduchesneau
Copy link
Author

system transactions come from:

(core/state_processor) Process() (between finalizeblock and endblock) ->
  (bor) Finalize() ->
    (bor) CommitStates() ->
      (bor/GenesisContractsClient) CommitState() -> 
         (bor/statefull) ApplyMessage() -> 
              (firehose/context) StartTransactionRaw()

however, the validation of the block is done after that!
so the block would be ended completely, THEN it would be rejected. THEN it would be "produced again" ...
but since it has the same hash, the incomplete version of the block is kept.

@sduchesneau
Copy link
Author

After catching another similar event with the full FIRE logs activated, I see that the block 40343056 is emitted mulitple times WITHOUT the system transactions while the heimdall is not up-to-date, failing "after" the END_BLOCK statement.

When the heimdall is back in-sync, the block is emitted again, this time with the system transactions (between "FIRE FINALIZE_BLOCK 40343056" and "FIRE END_BLOCK 40343056"). the console reader, will only take the first block into account (mostly because it does not ovewritee the one-block-files on disk), so we end up with the invalid block.

Solution will be to call END_BLOCK only after the block is validated, and trigger a "failure" if it is invalid.

@sduchesneau
Copy link
Author

Fixed

maoueh pushed a commit that referenced this issue Apr 29, 2024
This commit squashes the op-geth fork history into a more maintainable
diff for rebasing upon upstream geth.

reference-optimistic-geth changes (origins of op-geth in early Bedrock
development stage):
- Deposit TX Type
- Enable deposit tx in EVM/tx pool
- Change deposit nonce to not be the max nonce
- Extend PayloadAttributesV1 with a Transactions field
- Force deposits at the start of each L2 block
- Fix height check
- noTxPool flag, reproduce block in verifier mode without tx pool interference
- Fix RPC json marshalling (ref op-geth PR 4)
- Deposit txs block height check in block body validation (ref op-geth PR 5)
- core: do not try to reinject deposit txs into tx-pool (ref-op-geth PR 6)
- deposit source hash field instead of L2 block height and tx index combination
- Include invalid deposits, rewind state, but always persist mint (#10)
- Provide gas to Call/Create in deposit transactions (#12)
- Add docker builds (ref-op-geth PR 16, 17)
- Don't panic on deposit transaction signature values or chain ID (ref-op-geth PR 18)
- core: Add version to DepositTx (ref-op-geth PR 19)
- Enable Geth build/lint/test in CircleCI (ref-op-geth PR 23)
- core: Include guaranteed gas in the gas pool (ref-op-geth PR 21)
- core: handle base fee, l1 availability fee, tx fee (ref-op-geth PR 27)
- fix: deposit tx hash
- fix l1 fee cache, rpc, tracing and tx pool
- core: remove deposit-tx sub-type (a.k.a. deposit version byte)
- eth/catalyst: allow engine user to reorg own chain
- miner: restore ability to reorg deep as block builder
- params: print Optimism consensus type in banner
- core/types: remove unused protected() method, see upstream PR 23376
- core: do not mutate original balance value in tx pool l1 cost adjustment
- core: subtract deposit gas from pool, so other txs do not use the same gas. And fail tx processing if deposits reach gas limit
- core/types: deposits do not tip, avoid basefee subtraction
- Unmeter the L1 Attributes Transaction
- miner: handle force tx errors as critical, clean up diff
- ci: Switch branch
- eth,miner: return STATUS_INVALID when failing to process forced transactions in request (ref-op-geth PR 40)
- verifier: forward tx to sequencer based on flag
- txpool: add flag to disable tx gossip (ref-op-geth PR 42)
- Add op-geth version in addition to geth version (ref-op-geth PR 43)
- ci: CircleCI improvements (ref-op-geth PR 44)
- Rename to op-geth
- Build latest tag on optimism branch

op-geth changes:
- Expose cache config in simulated backend (#2)
- Add EIP-1559 parameters
- eth/catalyst: update payload id computation (#1)
- make eip1559 configurable (#4)
- post-merge network should not log warnings about missing transition information (#5)
- Make the simulator more configurable (#6)
- fix OPB-6 - IsDepositTx check instead of artificial nonce value check (#7)
- Simulated backend - enable proof of stake consensus type and fix performance issue (#8)
- accounts: simulated backend consensus engine option and immediate tx indexing
- consensus/beacon: recognize all blocks as reached TTD with 0 TTD in chain config
- Add --rollup.historicalhttp CLI flag and fix backend iface
- Flags and interfaces for historical RPC requests (#12)
- Redirect historical RPC requests (ethereum#13)
- Use the pre-existing ethereum.NotFound error (ethereum#18)
- Add historical endpoint to TraceBlockByNumber and TraceBlockByHash (ethereum#19)
- Add historical endpoint to TraceTransaction (ethereum#20)
- Add historical endpoint to TraceCall (ethereum#21)
- optimism: fee params from info txi, update l1 cost func GPO params read (ethereum#15)
- add hardcoded addresses for fee payouts (ethereum#23)
- dynamic gas limit via engine API (ethereum#22)

Co-authored-by: Matthew Slipper <me@matthewslipper.com>
Co-authored-by: Joshua Gutow <jgutow@oplabs.co>
Co-authored-by: protolambda <proto@protolambda.com>
Co-authored-by: Mark Tyneway <mark.tyneway@gmail.com>
Co-authored-by: Maurelian <maurelian@protonmail.ch>
sduchesneau pushed a commit that referenced this issue Jun 14, 2024
Include EOF Machinery To Differentiate Stylus Programs from EVM Bytecode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants