ERC: State Snapshot JSON format #227
Thanks for pushing this forward, @cdetrio. Since a state snapshot can have many different uses, I think the best strategy is a format that is as extensible as possible while still permitting small, simple snapshots. I would recommend making … An example where optional fields would shine is an RPC test suite (;)), where some RPC features might involve uncles while others refer only to the current state. Defining the blockchain state for such tests should ideally be as concise as possible, for readability and to reduce side effects.
For background, pyethereum's JSON state snapshot feature was documented here, and Parity's was mentioned in the v1.5.0 release notes.
I want to suggest another aspect: memory-friendliness. Snapshot export in pyethapp constructs the whole JSON document in memory, which consumes more than several GB of memory when there are millions of accounts in the state. This problem could easily be avoided if the snapshot format supported efficient incremental updates.
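One reading of "memory friendly" is that an exporter should be able to write accounts one at a time instead of building the whole document first. A minimal sketch, assuming a hypothetical layout with a single top-level `alloc` map from address to account object (not necessarily either client's actual format). When `out` is a real file, only the account currently being serialized is held in memory. `export_snapshot_streaming` and `fake_accounts` are illustrative names, not pyethapp APIs.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, TextIO, Tuple


def export_snapshot_streaming(accounts: Iterable[Tuple[str, Dict[str, Any]]], out: TextIO) -> int:
    """Write {"alloc": {address: account, ...}} incrementally; return the number of accounts."""
    out.write('{"alloc": {')
    count = 0
    for address, account in accounts:
        if count:
            out.write(", ")
        # Only the current account is serialized in memory at any one time.
        out.write(json.dumps(address))
        out.write(": ")
        out.write(json.dumps(account, sort_keys=True))
        count += 1
    out.write("}}")
    return count


def fake_accounts(n: int) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Hypothetical stand-in for a lazy walk over the state trie."""
    for i in range(n):
        yield "0x%040x" % i, {"balance": str(i), "nonce": "0"}


# Usage: an in-memory buffer stands in for the output file.
buf = io.StringIO()
export_snapshot_streaming(fake_accounts(3), buf)
```

Because `accounts` is any iterable, the caller can feed it a generator that walks the state database lazily, which is what keeps peak memory flat.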
While JSON snapshots are useful for analyzing the state and perhaps hand-editing tests, JSON is definitely not a compact enough representation for something as large as Ethereum's state. For use cases where small file size and avoiding data redundancy matter, neither of the JSON formats described here is really suitable; something like Parity's RLP state snapshots should be used instead. Although Parity supports JSON export, the state snapshot format for Parity Warp Sync (also implemented in EthereumJ) is fully documented at https://github.com/paritytech/parity/wiki/Warp-Sync-Snapshot-Format, never re-includes duplicate code, and is designed to allow out-of-order restoration of chunks. It uses RLP, which is already a core part of most Ethereum clients and is the de facto standard for serialization within this space.
With the JSON state export (it is called state export), sure, but RLP snapshots can include header, transaction, uncle, and receipt data for a configurable number of recent blocks (albeit also in compressed form). Reading and restoring from this data should be no more difficult than doing so with JSON, given appropriate libraries. Doing analysis on this data shouldn't be much worse, either.
That said, it is important to note that both formats proposed here store the trie key pre-images for accounts and contract storage. This is definitely more human-readable, but it imposes a significant storage overhead on the node, which must keep track of those pre-images in order to create snapshots later.
Doubly so here: having snapshots cover intermediate states requires nodes to actually store intermediate states, and there is already little reason to do so. With EIP-98 implemented (receipts would no longer commit to intermediate state roots), intermediate state snapshots would be much harder to verify than they are now.
How about a combination of the two? Metadata in a JSON file and account data in a separate RLP file.
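A sketch of the split layout suggested above, assuming a hypothetical scheme in which metadata goes into a JSON file and each account is appended to a binary file as one RLP list `[address, nonce, balance, code]` (storage is left out for brevity). The encoder follows the standard RLP rules: a single byte below 0x80 is its own encoding, short strings and lists get a 0x80/0xC0 prefix plus their length, and payloads of 56 bytes or more get a length-of-length prefix. The record layout and the function names are assumptions for illustration, not Parity's warp-sync chunk format.

```python
import io
import json
from typing import BinaryIO, Iterable, List, TextIO, Tuple, Union

RLPItem = Union[bytes, List["RLPItem"]]


def _encode_length(length: int, offset: int) -> bytes:
    """Prefix for a payload of `length` bytes; offset is 0x80 for strings, 0xC0 for lists."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item: RLPItem) -> bytes:
    """Encode a byte string or an arbitrarily nested list of byte strings."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single low byte is its own encoding
        return _encode_length(len(item), 0x80) + item
    payload = b"".join(rlp_encode(child) for child in item)
    return _encode_length(len(payload), 0xC0) + payload


def int_to_bytes(value: int) -> bytes:
    """Big-endian with no leading zeros; 0 becomes the empty string."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def write_split_snapshot(
    metadata: dict,
    accounts: Iterable[Tuple[bytes, int, int, bytes]],
    meta_out: TextIO,
    rlp_out: BinaryIO,
) -> None:
    """Hypothetical layout: metadata as JSON, accounts as concatenated RLP records."""
    json.dump(metadata, meta_out, indent=2, sort_keys=True)
    for address, nonce, balance, code in accounts:
        rlp_out.write(rlp_encode([address, int_to_bytes(nonce), int_to_bytes(balance), code]))


# Usage: in-memory buffers stand in for the two files.
meta_buf, rlp_buf = io.StringIO(), io.BytesIO()
write_split_snapshot({"block_number": 100000}, [(b"\x01" * 20, 0, 1024, b"")], meta_buf, rlp_buf)
```

Keeping the metadata in JSON leaves the small, hand-editable part readable, while the bulk account data gets RLP's compactness and can be streamed record by record.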
There has been no activity on this issue for two months. It will be closed in a week if no further activity occurs. If you would like to move this EIP forward, please respond to any outstanding feedback or add a comment indicating that you have addressed all required feedback and are ready for a review.
This issue was closed due to inactivity. If you are still pursuing it, feel free to reopen it and respond to any feedback or request a review in a comment.
This is a pre-draft ERC (Ethereum Request for Comments) to gather feedback on standardizing the format for JSON state snapshots. Currently, two clients (pyethereum and Parity) support exporting JSON state snapshots in two different formats. For reference, an example of each is included below.
In reconciling the two formats into a single standard, there are major and minor aspects to consider. The major ones:
1. Should a state snapshot include any header data of previous blocks?
   Pyethereum includes some header data from previous blocks (`prev_headers`), as well as some header data of recent uncles (`recent_uncles`). Parity includes no header data.
2. Should a state snapshot include any block header data or metadata?
   Pyethereum includes the following data: `block_number`, `block_coinbase`, `refunds`, `timestamp`, `gas_used`, `txindex`, `block_difficulty`, `bloom`, `gas_limit`. Parity includes none.
3. Should state snapshots cover intermediate states?
   Pyethereum uses `txindex` to indicate that the snapshot is of an intermediate state.
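One way to keep both styles loadable, in the spirit of the optional-fields suggestion in the thread, is to treat the pyethereum-only sections as optional, so that a Parity-style snapshot without header data simply yields empty metadata. The field names come from the list above; the `alloc` accounts key, `split_snapshot`, and the sample documents are assumptions for illustration.

```python
from typing import Any, Dict, Tuple

# Fields pyethereum includes and Parity omits (names taken from the list above).
METADATA_FIELDS = (
    "block_number", "block_coinbase", "refunds", "timestamp", "gas_used",
    "txindex", "block_difficulty", "bloom", "gas_limit",
)
HISTORY_FIELDS = ("prev_headers", "recent_uncles")

Section = Dict[str, Any]


def split_snapshot(doc: Section, accounts_key: str = "alloc") -> Tuple[Section, Section, Section]:
    """Return (accounts, metadata, history); any section the exporter omitted comes back empty."""
    accounts = doc.get(accounts_key, {})
    metadata = {k: doc[k] for k in METADATA_FIELDS if k in doc}
    history = {k: doc[k] for k in HISTORY_FIELDS if k in doc}
    return accounts, metadata, history


# Hypothetical sample documents: a pyethereum-like one keeps its metadata, a Parity-like one has none.
pyeth_like = {"alloc": {"0xaa": {"balance": "1"}}, "block_number": "100000", "prev_headers": []}
parity_like = {"alloc": {"0xaa": {"balance": "1"}}}
```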
On the third point, I'll suggest that intermediate snapshots be JSON Patch extensions of the state snapshot.
Minor aspects to consider include encodings (hex or decimal), hash fields (`code_hash` and `storage_root`, which can be derived from `code` by hashing and from `storage` by rebuilding its trie, respectively), and null/empty fields.

Here is an excerpt of a Pyethereum state snapshot, at block #100000 (full snapshot):
Here is an excerpt of a Parity state snapshot:
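The minor aspects (hex versus decimal encodings, derivable hash fields, and null/empty fields) could be settled with a canonicalization pass applied before comparing snapshots from different clients. A sketch under a hypothetical convention: `balance` and `nonce` are accepted in either encoding and emitted as `0x`-prefixed hex, empty values are dropped, and `code_hash`/`storage_root` are optionally dropped because they can be recomputed from `code` and `storage`. The convention and function names are illustrative only.

```python
from typing import Any, Dict, Union

DERIVABLE_FIELDS = ("code_hash", "storage_root")
QUANTITY_FIELDS = ("balance", "nonce")
EMPTY_VALUES = (None, "", "0x")


def parse_quantity(value: Union[str, int]) -> int:
    """Accept an int, a "0x"-prefixed hex string, or a decimal string."""
    if isinstance(value, int):
        return value
    text = value.strip()
    if text[:2].lower() == "0x":
        return int(text[2:], 16)
    return int(text, 10)


def normalize_account(account: Dict[str, Any], drop_derivable: bool = True) -> Dict[str, Any]:
    """Canonicalize one account entry under the hypothetical convention described above."""
    out: Dict[str, Any] = {}
    for key, value in account.items():
        if value in EMPTY_VALUES or value == {}:
            continue  # treat null and empty fields as absent
        if drop_derivable and key in DERIVABLE_FIELDS:
            continue  # recomputable from code / storage
        out[key] = hex(parse_quantity(value)) if key in QUANTITY_FIELDS else value
    return out
```

Dropping derivable fields keeps snapshots smaller and avoids two clients disagreeing only because one of them omitted a redundant hash; a verifier that wants them can recompute and compare.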