You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This design doc documents a non-severe post-launch op-geth bug, and proposes how to mitigate it.
Summary
The deposit-nonce of the deposit-receipt is not included during receipts-root construction, making the data non-verifiable after state-sync, but otherwise correct during the regular block-by-block sync.
Problem Statement + Context
Pre-Regolith we faced the UX issue of serving the tx-nonce of deposits, without including the nonce in the tx-data, like regular txs. Deposit transactions originate from L1, and thus do not include a nonce.
With Regolith, we introduced the nonce field as an optional extension to a deposit-tx receipt. After execution, the pre-state nonce is registered in the receipt, so it can be served in the RPC at a later time.
The bug is that the deposit-nonce value is encoded/decoded correctly, but not correctly copied over in an alternative struct that is used during hashing (when constructing the receipts-root of the block-header): the deposit receipts root thus does not commit to the nonce value, and hashed as if it’s a regular receipt.
This bug was identified by re-implementing the same code in op-reth. Credits to Roberto (Base), Ben Clabby and Andreas for finding this bug!
In the regolith design phase, this optional out-of-protocol DB-extension was considered, but then was decided against since we need to commit over the data if we state-sync: without it, we cannot trust the deposit-nonce values a peer serves us. When reproducing the data by computing the blocks (a.k.a. regular sync through the op-node) the deposit-nonce is computed too, and can thus be trusted.
If we fix it with a hardfork, we end up with 3 possible states:
Pre-regolith (Base/OP-Goerli only): no deposit nonce data.
Post-regolith, pre-bugfix: deposit nonce in DB, broken sync.
Post-bugfix: deposit nonce in DB, and syncable.
Effects
Note that the deposit-nonce of the receipt data is not used anywhere in the protocol, it’s exclusively there for serving in the RPC. And non-snap/state-sync full-nodes will have verified the data. The severity of this bug is thus low; no funds are at risk.
The contract-address in the RPC receipt responses is based on the sender and the nonce data; the contract-address may thus also be untrusted, if state-synced from untrusted nonce.
The chains cannot be reverted at this point however, and the bug is thus canonical behavior, and the specs will need an amendment.
Alternatives Considered
Ignoring it: the issue can be ignored short-term, as state-sync is not used by default.
Accepting it: we can adjust sync to force the deposit-nonce to be zero (since it cannot be verified), or trust the value if it’s within a reasonable range (we must not allow arbitrary high nonce values).
Correcting it: with a hardfork, starting at a specific block, all future deposits can have a receipt with corrected hashing. This makes the receipt-hashing specific to the block-time. (unless we extend the encoding, see proposed solution)
Long-term we can add some type of hash to the code that helps verify all pre-bugfix deposit-nonces, to ensure correctness after sync from an untrusted peer.
Replacing it: with a hardfork, we can introduce a new deposit-tx type, which fixes this issue, and can fix other deposit-tx issues:
We could remove the deposit system-tx boolean
We could introduce a form of account-override, a deposit that forcefully deploys code to a specific address: this would allow future hardforks to insert predeploys at specific addresses without being limited to the predeploy proxy range. E.g. multicall3.
Drawback: additional tx type, more geth diff.
Proposed Solution
With low-severity, it is not worth doing an emergency hardfork, and we can start with ignoring it.
To be secure against any potential sync issues (unexpectedly large deposit receipt nonce values) we should adapt the sync to handle this, as proposed in the “accepting” case.
With the next state-transition changing hardfork we can either correct it, or replace the deposit-tx type with a new better type.
Correcting it may be preferable, since the deposit-tx issues are minor, and replacing it adds additional diff.
To make correction easy, at the time of the hardfork we can adapt the encoding:
the regular encoding is: type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs])
The deposit regular encoding is: type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs, DepositNonce]) (i.e. we added an optional RLP field)
The post-bugfix encoding can be: type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs, DepositNonce, DepositReceiptVersion]) (i.e. we add another optional RLP field, to distinguish old and new, with DepositReceiptVersion = 1)
By extending the encoding this way (or some approach like it), we can differentiate the old and new receipts in the DB, which informs whether or not to include the deposit-nonce in hashing, which avoids the need for hydrated-receipts or block-number/time information during hashing.
Risks & Uncertainties
If we proceed with the DepositReceiptVersion fix then we should add it to the RPC response, so users can also identify the different way of hashing, or otherwise they may produce the wrong receipts-root.
This is a non-severe protocol bug, but should be communicated with partners, in case they are using snap-sync with untrusted nodes, to understand the risk of wrong deposit-nonces.
Additionally, we need to verify if the unverified deposit-nonce is a potential snap-sync DoS vector: we do not want a million-digit nonce value to pass through without verification.
We also need to update op-reth and op-erigon.
The text was updated successfully, but these errors were encountered:
Purpose
This design doc documents a non-severe post-launch op-geth bug, and proposes how to mitigate it.
Summary
The deposit-nonce of the deposit-receipt is not included during receipts-root construction, making the data non-verifiable after state-sync, but otherwise correct during the regular block-by-block sync.
Problem Statement + Context
Pre-Regolith we faced the UX issue of serving the tx-nonce of deposits, without including the nonce in the tx-data, like regular txs. Deposit transactions originate from L1, and thus do not include a nonce.
With Regolith, we introduced the nonce field as an optional extension to a deposit-tx receipt. After execution, the pre-state nonce is registered in the receipt, so it can be served in the RPC at a later time.
The bug is that the deposit-nonce value is encoded/decoded correctly, but not correctly copied over in an alternative struct that is used during hashing (when constructing the receipts-root of the block-header): the deposit receipts root thus does not commit to the nonce value, and hashed as if it’s a regular receipt.
See code here:
op-geth/core/types/receipt.go
Line 457 in ee5b962
This bug was identified by re-implementing the same code in op-reth. Credits to Roberto (Base), Ben Clabby and Andreas for finding this bug!
In the regolith design phase, this optional out-of-protocol DB-extension was considered, but then was decided against since we need to commit over the data if we state-sync: without it, we cannot trust the deposit-nonce values a peer serves us. When reproducing the data by computing the blocks (a.k.a. regular sync through the op-node) the deposit-nonce is computed too, and can thus be trusted.
If we fix it with a hardfork, we end up with 3 possible states:
Effects
Note that the deposit-nonce of the receipt data is not used anywhere in the protocol, it’s exclusively there for serving in the RPC. And non-snap/state-sync full-nodes will have verified the data. The severity of this bug is thus low; no funds are at risk.
The contract-address in the RPC receipt responses is based on the sender and the nonce data; the contract-address may thus also be untrusted, if state-synced from untrusted nonce.
Spec
This bug does contradict the specs that call it a “consensus-enforced RLP field”): https://github.com/ethereum-optimism/optimism/blob/develop/specs/deposits.md#deposit-receipt
The chains cannot be reverted at this point however, and the bug is thus canonical behavior, and the specs will need an amendment.
Alternatives Considered
Proposed Solution
With low-severity, it is not worth doing an emergency hardfork, and we can start with ignoring it.
To be secure against any potential sync issues (unexpectedly large deposit receipt nonce values) we should adapt the sync to handle this, as proposed in the “accepting” case.
With the next state-transition changing hardfork we can either correct it, or replace the deposit-tx type with a new better type.
Correcting it may be preferable, since the deposit-tx issues are minor, and replacing it adds additional diff.
To make correction easy, at the time of the hardfork we can adapt the encoding:
type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs])
type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs, DepositNonce])
(i.e. we added an optional RLP field)type_byte ++ RLP([PostStateOrStatus, CumulativeGasUsed, Bloom, Logs, DepositNonce, DepositReceiptVersion])
(i.e. we add another optional RLP field, to distinguish old and new, withDepositReceiptVersion = 1
)By extending the encoding this way (or some approach like it), we can differentiate the old and new receipts in the DB, which informs whether or not to include the deposit-nonce in hashing, which avoids the need for hydrated-receipts or block-number/time information during hashing.
Risks & Uncertainties
If we proceed with the
DepositReceiptVersion
fix then we should add it to the RPC response, so users can also identify the different way of hashing, or otherwise they may produce the wrong receipts-root.This is a non-severe protocol bug, but should be communicated with partners, in case they are using snap-sync with untrusted nodes, to understand the risk of wrong deposit-nonces.
Additionally, we need to verify if the unverified deposit-nonce is a potential snap-sync DoS vector: we do not want a million-digit nonce value to pass through without verification.
We also need to update op-reth and op-erigon.
The text was updated successfully, but these errors were encountered: