Encoding/decoding spec (rfc) #1
Comments
Looks good overall! Just a few thoughts:
@lightclient a few comments on data / file size now that I've done some measurements.
Overall I would mildly lean toward variable block counts, as it leads to fewer and uniformly-sized files, but I'm OK going with either approach if there is a preference for uniform.
@henridf it might be worth posting in the eth r&d discord for some opinions. Maybe even asking on ACD. It's one of those simple decisions that could have a relatively big impact, so we may as well get as many opinions on it as possible. I still don't think I have a strong opinion on it.
Do you know what format geth stores them in on disk? Last fall I did a
Again, an important decision, but I don't think I lean strongly in either direction. I sympathize most with minimizing the size so that the constraints we impose on memory aren't too large. But memory gets cheaper by the day and blocks bigger by the year, so it may be preferable to be more future-proof and err on the larger side. Would also be a good discord / ACD question.
I agree this is an important decision deserving broader exposure. I'd like to make a little progress on the accumulator and verification to see if that informs anything. (I'll be starting on that next, now that encoding is mostly implemented at https://github.com/henridf/eip44s-proto/tree/wip-v2.)
Yep, the vast majority are in the freezer, which uses snappy compression for these tables (https://github.com/ethereum/go-ethereum/blob/997f1c4f0abcd78f645e6e7ced6db4b42ad59c9d/core/rawdb/schema.go#L133). From compressing a few history files with gzip, I saw a 2.5x-ish compression gain, while geth inspect shows 335 GB as of recently. So the numbers line up pretty well.
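(For anyone who wants to reproduce that measurement, this is roughly all it takes; the file name below is hypothetical:)

```python
import gzip
import os
import shutil

src = "blocks-000123.ssz"  # an SSZ-encoded history/archive file
with open(src, "rb") as f_in, gzip.open(src + ".gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

ratio = os.path.getsize(src) / os.path.getsize(src + ".gz")
print(f"gzip compression ratio: {ratio:.2f}x")  # ~2.5x was observed on a few history files (see above)
```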
Makes sense! Thanks
Another argument (from @karalabe, relayed by @lightclient) for variable block counts and fixed file sizes: it avoids situations where a chain with many, many empty blocks leads to needless additional files.
An update / correction on this: with the revamped layout, it would be feasible to write a custom ssz streaming decoder for the archive files. Of course this is some one-off work wrt e.g. using the fastssz-generated marshalling, but once the format is settled it would be worthwhile. Most importantly, we probably don't need to worry so much about erring on the side of small files.
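(For context, a rough sketch of the streaming idea, assuming the file's top level is an SSZ list of variable-size entries; this is an illustration, not the decoder referred to above:)

```python
import struct


def iter_ssz_list(f, total_length: int):
    """Yield the elements of an SSZ List of variable-size items one at a time.

    `f` is a binary file object positioned at the start of the encoded list and
    `total_length` is its byte length, so the whole file never has to be held
    in memory at once.
    """
    if total_length == 0:
        return
    # Variable-size SSZ lists start with a table of 4-byte little-endian offsets.
    first_offset = struct.unpack("<I", f.read(4))[0]
    count = first_offset // 4
    offsets = [first_offset]
    for _ in range(count - 1):
        offsets.append(struct.unpack("<I", f.read(4))[0])
    offsets.append(total_length)
    for i in range(count):
        yield f.read(offsets[i + 1] - offsets[i])
```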
Outline for "EIP-4444 for pow blocks" (https://notes.ethereum.org/@ralexstokes/BJWd8saB9)
In summary, the proposal is to derive from the Beacon Chain's ExecutionPayload encoding to store blocks, and to extend it with uncle headers and a new ReceiptsPayload encoding (defined below) to store receipts. A double-batched accumulator is used to provide one root per block archive file.

Block archive file
A block archive file consists of a header and a body, which are individually encoded and concatenated to form the file.
Note: initially, both the header fields and the block were in the same container. Splitting them out has two advantages:
File encoding and decoding
The encoding tool will take a stream of exported blocks/receipts (RLP-encoded), encode each into the SSZ representations described below, and periodically "flush" a batch into a new ssz-encoded archive file. A simple approach would be to flush every N blocks, but it may be more practical to flush based on the size of the accumulated data, in order to keep the number of files small. In particular, if we set a constant number of blocks per file, we would have to choose that constant based on recent blocks, which are "large" (in order to reach a target file size). That would result in lots of smaller files in the earlier part of the chain history, where blocks are much smaller. I will experiment with this and propose something along with concrete numbers.
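As an illustration of the size-based approach, here is a minimal sketch; the helper names and the target size are hypothetical, not part of the proposal:

```python
TARGET_FILE_SIZE = 1 * 1024**3  # illustrative target: ~1 GiB of encoded data per file


def write_archives(entries, encode_ssz, flush):
    """Batch SSZ-encoded entries into archive files of roughly uniform size.

    `entries` yields blocks/receipts already converted to the SSZ containers
    described below; `encode_ssz` serializes one entry; `flush` writes a
    finished batch out as a new archive file. All three are assumed helpers.
    """
    batch, batch_size = [], 0
    for entry in entries:
        encoded = encode_ssz(entry)
        batch.append(encoded)
        batch_size += len(encoded)
        if batch_size >= TARGET_FILE_SIZE:
            flush(batch)  # size-based, so early (small) blocks still fill whole files
            batch, batch_size = [], 0
    if batch:
        flush(batch)  # final, possibly smaller, file
```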
Double-batched accumulator

The encoding tool is also responsible for computing and outputting the hash tree roots along the way. There is one root per archive file, which can be used as a proof for an entire file, but also to build granular proofs reaching all the way down to individual block values.
It does this via a double-batched accumulator similar to the beacon chain's BeaconState.historical_roots. The one difference is that each successive historical root does not necessarily accumulate the same (constant) number of block roots.
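For concreteness, a minimal sketch of that bookkeeping; `hash_tree_root` is assumed to come from an SSZ library (as in the consensus specs) and is not defined here:

```python
class DoubleBatchedAccumulator:
    """Collect block roots per archive file and fold each batch into one root."""

    def __init__(self):
        self.historical_roots = []  # one root per archive file written so far
        self.pending_roots = []     # block roots for the file currently being built

    def add_block_root(self, root: bytes) -> None:
        self.pending_roots.append(root)

    def flush_file(self) -> bytes:
        # Unlike BeaconState.historical_roots, the batch size here is not constant.
        file_root = hash_tree_root(self.pending_roots)  # assumed SSZ merkleization
        self.historical_roots.append(file_root)
        self.pending_roots = []
        return file_root
```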
Block

(This section is derived from the beacon chain's ExecutionPayload, though the resulting structs are different. Notably, uncles and receipts are present here.)

Custom types
Transaction         ByteList[MAX_BYTES_PER_TRANSACTION]
ExecutionAddress    Bytes20
Constants
MAX_BYTES_PER_TRANSACTION       uint64(2**30) (= 1,073,741,824)
MAX_TRANSACTIONS_PER_PAYLOAD    uint64(2**20) (= 1,048,576)
MAX_UNCLES_PER_PAYLOAD          10 (??)
BYTES_PER_LOGS_BLOOM            uint64(2**8) (= 256)
MAX_EXTRA_DATA_BYTES            2**5 (= 32)

Receipts payload
Constants
MAX_TOPICS_PER_LOG      uint8(2**2) (= 4)
MAX_LOGS_PER_PAYLOAD    uint64(2**20) (= 1,048,576)
BYTES_PER_LOGS_BLOOM    uint64(2**8) (= 256)
MAX_LOG_DATA_BYTES      uint64(2**22) (= 4,194,304)