Erigon2 prototype

ledgerwatch edited this page Jan 16, 2022 · 13 revisions

How to run

After checking out the code from the desired branch, run the following commands:

make state
./build/bin/state erigon2 --datadir <your_datadir>

The directory referenced by the --datadir option must contain block headers, block bodies, and recovered senders, downloaded and computed by a recent version of Erigon. Not all stages are required, only the first four: Headers, BlockHashes, Bodies, Senders.

The prototype replays blocks and their transactions in order, starting from genesis, then block 1, block 2, and so on. Every 1000 blocks it prints its progress, like so:

INFO[01-16|20:12:15.536] Processed                                blocks=133000
INFO[01-16|20:12:15.848] Processed                                blocks=134000
INFO[01-16|20:12:16.162] Processed                                blocks=135000
INFO[01-16|20:12:16.914] Processed                                blocks=136000
INFO[01-16|20:12:17.233] Processed                                blocks=137000
INFO[01-16|20:12:17.561] Processed                                blocks=138000
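
The progress lines above can be used to estimate replay speed. The following is a minimal sketch, assuming the log format is exactly as in the sample output; `parse_progress` and `blocks_per_second` are hypothetical helper names and are not part of the prototype. The log lines carry no year, so a placeholder year is added purely to make the timestamps parseable.

```python
import re
from datetime import datetime

# Matches progress lines such as:
# INFO[01-16|20:12:15.536] Processed          blocks=133000
PROGRESS_RE = re.compile(
    r"^INFO\[(\d\d-\d\d\|\d\d:\d\d:\d\d\.\d{3})\]\s+Processed\s+blocks=(\d+)"
)

PLACEHOLDER_YEAR = "2000"  # assumption: the log has no year, any fixed year works


def parse_progress(line):
    """Return (timestamp, block_count) for a progress line, or None otherwise."""
    m = PROGRESS_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(
        PLACEHOLDER_YEAR + "-" + m.group(1), "%Y-%m-%d|%H:%M:%S.%f"
    )
    return ts, int(m.group(2))


def blocks_per_second(first_line, last_line):
    """Average replay speed between two progress lines from the same run."""
    t0, b0 = parse_progress(first_line)
    t1, b1 = parse_progress(last_line)
    return (b1 - b0) / (t1 - t0).total_seconds()
```

Applied to the first and last lines of the sample output, this covers 5000 blocks in 2.025 seconds.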

The prototype can be interrupted by pressing Ctrl-C in the console, or by sending the SIGTERM (-15) signal to the process on Unix. When interrupted, it prints information similar to the following:

INFO[01-16|20:28:21.169] Processed                                blocks=956000
^CINFO[01-16|20:28:21.926] Got interrupt, shutting down...
INFO[01-16|20:28:21.926] Got interrupt, shutting down...
INFO[01-16|20:28:21.926] interrupted, please wait for cleanup, next time start with --block 956723

This information makes it possible to resume the prototype from the point where it was interrupted, instead of starting from the beginning, like so (in our example):

./build/bin/state erigon2 --datadir <your_datadir> --block 956723

The replaying will resume from where it was interrupted, like so:

./build/bin/state erigon2 --datadir ~/mainnet --block 956723
INFO[01-16|20:28:59.088] Processed                                blocks=957000
INFO[01-16|20:28:59.833] Processed                                blocks=958000
INFO[01-16|20:29:05.028] Processed                                blocks=959000
INFO[01-16|20:29:05.729] Processed                                blocks=960000
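
When the prototype is run from a wrapper script, the resume point can be pulled out of the captured log automatically. This is a minimal sketch, assuming the interrupt message has exactly the wording shown in the example above; `resume_block` and `resume_command` are hypothetical helpers, and only the `--datadir` and `--block` flags already shown on this page are used.

```python
import re

# Wording taken from the interrupt message shown in the example output.
RESUME_RE = re.compile(r"next time start with --block (\d+)")


def resume_block(log_text):
    """Return the block number from the last interrupt message, or None."""
    matches = RESUME_RE.findall(log_text)
    return int(matches[-1]) if matches else None


def resume_command(datadir, log_text):
    """Build the argument list for the next run of the prototype."""
    cmd = ["./build/bin/state", "erigon2", "--datadir", datadir]
    block = resume_block(log_text)
    if block is not None:
        # Only add --block when a previous run reported a resume point.
        cmd += ["--block", str(block)]
    return cmd
```

The returned list can be passed to `subprocess.run`. If the log contains no interrupt message, the command starts from genesis as in the first example.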

Which files it creates and where

The prototype creates and modifies files in two directories:

<your_datadir>/aggregator
<your_datadir>/statedb

Files in the aggregator directory are of the following three types:

  1. Change files (extension .chg). These files are created for 4 groups of content: accounts, storage, code, and commitment. The group can be recognised by the first part of the file name. Within each group of content, there are 3 possible "sequences": keys, before values, and after values. By default, only keys and after values are written. Each file corresponds to an interval of up to 4096 blocks; the starting and ending blocks (exclusive) are part of the file name. Change files can be thought of as "Write Ahead Log" (WAL) files that contain the recent history of changes in the state. The combination of keys and after values is used to aggregate the changes into the data files described next. After aggregation, change files are removed. When enabled, before values are planned to be used for unwinding the state, as well as for creating the change history, though neither of these two features is implemented yet.
  2. Data files (extension .dat). These files, like change files, are created for 4 groups of content: accounts, storage, code, and commitment. They also correspond to an interval of blocks, but unlike change files, the interval can be larger than 4096 blocks. Currently, it can be 8192 blocks, 16384 blocks, and so on. Initially, data files for 4096-block intervals are created by aggregating the content of the corresponding change files. Then, individual data files can be merged with one another to form larger and larger data files. The plan is for data files to be seeded via Content Delivery Networks.
  3. Index files (extension .idx). Every index file corresponds to a data file and has almost the same file name, differing only in the extension. Index files contain the serialised representation of a minimal perfect hash table, an offset table (with each key or value in the data file having an offset in that table), and optionally a compressed mapping for accessing the data file as an array of items. Index files are not going to be shared via Content Delivery Networks, but instead created locally. To prevent potential DoS vulnerabilities, the plan is for each Erigon2 node to build the perfect hash table part from a randomly generated seed, so that no specific seed can be exploited.
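
The block intervals described above can be illustrated with a little arithmetic. This is a sketch under stated assumptions, not the prototype's actual code: it assumes intervals are aligned to multiples of 4096 counted from block 0, that they are half-open (start included, end excluded), and that each merge joins two adjacent intervals of equal size. The function names are hypothetical, and the file naming scheme is deliberately not reproduced here.

```python
STEP = 4096  # blocks per change file interval, as stated in the list above


def change_interval(block):
    """Half-open interval [start, end) of the change file covering `block`.

    Assumption: intervals are aligned to multiples of STEP from block 0.
    """
    start = (block // STEP) * STEP
    return start, start + STEP


def merged_interval(block, level):
    """Interval of the data file covering `block` after `level` merges.

    level 0 -> 4096 blocks, level 1 -> 8192, level 2 -> 16384, and so on.
    Assumption: each merge doubles the interval and keeps it aligned.
    """
    size = STEP << level
    start = (block // size) * size
    return start, start + size
```

For block 956723 from the earlier example, `change_interval` gives (954368, 958464), and `merged_interval` at level 1 gives (950272, 958464).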

Files in the statedb directory contain the MDBX database that is used for storing recent state (the most recent 90k + 4096 blocks). The recent state is pruned on each aggregation, which keeps the State DB size quite low.
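
As a rough illustration of the size of that window, the sketch below estimates the oldest block whose state would still be in the State DB. It assumes that "90k" means 90,000 blocks and that the window is counted back from the latest processed block. Both are assumptions for illustration, and `oldest_retained_block` is a hypothetical name.

```python
RECENT_BLOCKS = 90_000   # assumption: "90k" read as 90,000 blocks
AGGREGATION_STEP = 4096  # aggregation interval mentioned above


def oldest_retained_block(latest_block):
    """Approximate lower bound of the recent state window, never below genesis."""
    return max(0, latest_block - (RECENT_BLOCKS + AGGREGATION_STEP))
```

With the latest block at 956723, the window would reach back to roughly block 862627.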