Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op-geth receipt-hashing bug #144

Closed
Tracked by #7452
trianglesphere opened this issue Sep 28, 2023 · 0 comments · Fixed by #152
Closed
Tracked by #7452

op-geth receipt-hashing bug #144

trianglesphere opened this issue Sep 28, 2023 · 0 comments · Fixed by #152

Comments

@trianglesphere
Copy link
Contributor

Purpose

This design doc documents a non-severe post-launch op-geth bug, and proposes how to mitigate it.

Summary

The deposit-nonce of the deposit-receipt is not included during receipts-root construction, making the data non-verifiable after state-sync, but otherwise correct during the regular block-by-block sync.

Problem Statement + Context

Pre-Regolith we faced the UX issue of serving the tx-nonce of deposits, without including the nonce in the tx-data, like regular txs. Deposit transactions originate from L1, and thus do not include a nonce.

With Regolith, we introduced the nonce field as an optional extension to a deposit-tx receipt. After execution, the pre-state nonce is registered in the receipt, so it can be served in the RPC at a later time.

The bug is that the deposit-nonce value is encoded/decoded correctly, but not correctly copied over in an alternative struct that is used during hashing (when constructing the receipts-root of the block-header): the deposit receipts root thus does not commit to the nonce value, and hashed as if it’s a regular receipt.

See code here:

data := &receiptRLP{r.statusEncoding(), r.CumulativeGasUsed, r.Bloom, r.Logs}

This bug was identified by re-implementing the same code in op-reth. Credits to Roberto (Base), Ben Clabby and Andreas for finding this bug!

In the regolith design phase, this optional out-of-protocol DB-extension was considered, but then was decided against since we need to commit over the data if we state-sync: without it, we cannot trust the deposit-nonce values a peer serves us. When reproducing the data by computing the blocks (a.k.a. regular sync through the op-node) the deposit-nonce is computed too, and can thus be trusted.

If we fix it with a hardfork, we end up with 3 possible states:

  • Pre-regolith (Base/OP-Goerli only): no deposit nonce data.
  • Post-regolith, pre-bugfix: deposit nonce in DB, broken sync.
  • Post-bugfix: deposit nonce in DB, and syncable.

Effects

Note that the deposit-nonce of the receipt data is not used anywhere in the protocol, it’s exclusively there for serving in the RPC. And non-snap/state-sync full-nodes will have verified the data. The severity of this bug is thus low; no funds are at risk.

The contract-address in the RPC receipt responses is based on the sender and the nonce data; the contract-address may thus also be untrusted, if state-synced from untrusted nonce.

Spec

This bug does contradict the specs that call it a “consensus-enforced RLP field”): https://github.com/ethereum-optimism/optimism/blob/develop/specs/deposits.md#deposit-receipt

The chains cannot be reverted at this point however, and the bug is thus canonical behavior, and the specs will need an amendment.

Alternatives Considered

  • Ignoring it: the issue can be ignored short-term, as state-sync is not used by default.
  • Accepting it: we can adjust sync to force the deposit-nonce to be zero (since it cannot be verified), or trust the value if it’s within a reasonable range (we must not allow arbitrary high nonce values).
  • Correcting it: with a hardfork, starting at a specific block, all future deposits can have a receipt with corrected hashing. This makes the receipt-hashing specific to the block-time. (unless we extend the encoding, see proposed solution)
    • Long-term we can add some type of hash to the code that helps verify all pre-bugfix deposit-nonces, to ensure correctness after sync from an untrusted peer.
  • Replacing it: with a hardfork, we can introduce a new deposit-tx type, which fixes this issue, and can fix other deposit-tx issues:
    • We could remove the deposit system-tx boolean
    • We could introduce a form of account-override, a deposit that forcefully deploys code to a specific address: this would allow future hardforks to insert predeploys at specific addresses without being limited to the predeploy proxy range. E.g. multicall3.
    • Drawback: additional tx type, more geth diff.

Proposed Solution

With low-severity, it is not worth doing an emergency hardfork, and we can start with ignoring it.

To be secure against any potential sync issues (unexpectedly large deposit receipt nonce values) we should adapt the sync to handle this, as proposed in the “accepting” case.

With the next state-transition changing hardfork we can either correct it, or replace the deposit-tx type with a new better type.

Correcting it may be preferable, since the deposit-tx issues are minor, and replacing it adds additional diff.

To make correction easy, at the time of the hardfork we can adapt the encoding:

  • the regular encoding is: type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs])
  • The deposit regular encoding is: type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs, DepositNonce]) (i.e. we added an optional RLP field)
  • The post-bugfix encoding can be: type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs, DepositNonce, DepositReceiptVersion]) (i.e. we add another optional RLP field, to distinguish old and new, with DepositReceiptVersion = 1)

By extending the encoding this way (or some approach like it), we can differentiate the old and new receipts in the DB, which informs whether or not to include the deposit-nonce in hashing, which avoids the need for hydrated-receipts or block-number/time information during hashing.

Risks & Uncertainties

If we proceed with the DepositReceiptVersion fix then we should add it to the RPC response, so users can also identify the different way of hashing, or otherwise they may produce the wrong receipts-root.

This is a non-severe protocol bug, but should be communicated with partners, in case they are using snap-sync with untrusted nodes, to understand the risk of wrong deposit-nonces.

Additionally, we need to verify if the unverified deposit-nonce is a potential snap-sync DoS vector: we do not want a million-digit nonce value to pass through without verification.

We also need to update op-reth and op-erigon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant