docs: add an overview of the creation and querying of snapshots #5270
thanks for this,
after looking at the graphs I have a better understanding now.
[`Snapshotter`](../../crates/snapshot/src/snapshotter.rs#L20): A `reth` background service that **copies** data from the database to new snapshot files when the block height reaches a certain threshold (e.g., `500_000th`). Upon completion, it dispatches a notification about the higher snapshotted block to `HighestSnapshotTracker` channel. **It DOES NOT remove data from the database.**

[`HighestSnapshotTracker`](../../crates/snapshot/src/snapshotter.rs#L22): A channel utilized by `Snapshotter` to announce the newest snapshot block to all components with a listener: `Pruner` (to know which additional tables can be pruned) and `DatabaseProvider` (to know which data can be queried from the snapshots).
I see, so the `HighestSnapshotTracker` channel acts both as a notification about the changed snapshot tracker (for the pruner) and as a way to track the highest snapshot (for the database provider)
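A minimal sketch of that dual role, for illustration only. The names `HighestSnapshot`, `can_prune`, and `source_for` are hypothetical, and a shared `RwLock` from the standard library stands in for the real channel type; the prune rule shown (anything at or below the highest snapshotted block) is an assumption, not something the README states.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the `HighestSnapshotTracker` channel:
/// one writer (the snapshotter) and any number of cloned listeners.
#[derive(Clone, Default)]
struct HighestSnapshot {
    inner: Arc<RwLock<Option<u64>>>,
}

impl HighestSnapshot {
    /// Called by the snapshotter once a snapshot has been written.
    fn announce(&self, block: u64) {
        let mut guard = self.inner.write().unwrap();
        let current: Option<u64> = *guard;
        // Only ever move forward: ignore announcements that are not higher.
        if current.map_or(true, |c| block > c) {
            *guard = Some(block);
        }
    }

    /// Latest announced block, or `None` if nothing was snapshotted yet.
    fn latest(&self) -> Option<u64> {
        *self.inner.read().unwrap()
    }
}

/// Pruner side (assumed rule): data is only prunable from the database
/// once a snapshot covering that block has been announced.
fn can_prune(tracker: &HighestSnapshot, block: u64) -> bool {
    tracker.latest().map_or(false, |highest| block <= highest)
}

/// Where a provider would look for data at `block`.
#[derive(Debug, PartialEq)]
enum Source {
    Snapshot,
    Database,
}

/// Provider side: blocks at or below the highest snapshot come from
/// snapshot files, everything else from the database.
fn source_for(tracker: &HighestSnapshot, block: u64) -> Source {
    match tracker.latest() {
        Some(highest) if block <= highest => Source::Snapshot,
        _ => Source::Database,
    }
}
```

Both listeners read the same value, which is why a single channel can serve the pruner and the provider.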
I("BLOCK_HEIGHT % 500_000 == 0")--triggers-->SP(Snapshotter)
SP --> |triggers| SH["create_snapshot(block_range, SnapshotSegment::Headers)"]
SP --> |triggers| ST["create_snapshot(block_range, SnapshotSegment::Transactions)"]
SP --> |triggers| SR["create_snapshot(block_range, SnapshotSegment::Receipts)"]
this should emphasize that this is a list of "segments", perhaps with an additional `...` or a label
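To make the "list of segments" reading concrete, here is a rough sketch of the trigger shown in the diagram. The `SnapshotSegment` variants and the `% 500_000` condition come from the diagram; `SNAPSHOT_INTERVAL`, `SEGMENTS`, and `segments_to_snapshot` are hypothetical names, and the list may well contain more segments than the three shown.

```rust
/// Interval taken from the diagram label `BLOCK_HEIGHT % 500_000 == 0`.
const SNAPSHOT_INTERVAL: u64 = 500_000;

/// The three segments named in the diagram.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SnapshotSegment {
    Headers,
    Transactions,
    Receipts,
}

/// Hypothetical list of segments; each one gets its own `create_snapshot` call.
const SEGMENTS: [SnapshotSegment; 3] = [
    SnapshotSegment::Headers,
    SnapshotSegment::Transactions,
    SnapshotSegment::Receipts,
];

/// Returns the segments to snapshot at this height, or nothing when the
/// height is not a multiple of the interval.
fn segments_to_snapshot(block_height: u64) -> Vec<SnapshotSegment> {
    if block_height % SNAPSHOT_INTERVAL == 0 {
        SEGMENTS.to_vec()
    } else {
        Vec::new()
    }
}
```

Note that this mirrors the diagram condition literally, so height `0` also satisfies it.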
lgtm, pending more reviews
LGTM, only nits, really like these diagrams!
crates/snapshot/README.md
Outdated
HN --> |true| NJC(NippyJar::Compression)
NJC --store--> NJ
HN --> |false| NJ
nit: it should probably point from `NJC` to `HN` in case it's `true`, so it'll be a `while HasNext` block. And on `false`, we can specify that we finished.
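The suggested `while HasNext` shape could be read roughly as the loop below. This is a sketch under assumptions: `write_jar` and `toy_compress` are hypothetical names, and the run-length encoding is only a placeholder for whatever `NippyJar::Compression` actually does.

```rust
/// Placeholder compression (run-length encoding as `[count, byte]` pairs).
/// It stands in for the real compression step and is not NippyJar's algorithm.
fn toy_compress(row: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = row.iter().peekable();
    while let Some(&byte) = iter.next() {
        let mut run: u8 = 1;
        // Extend the run while the next byte is identical.
        while run < u8::MAX && iter.peek() == Some(&&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

/// Hypothetical writer loop: compress and store each row until the
/// iterator is exhausted, then return the finished "jar".
fn write_jar<I: Iterator<Item = Vec<u8>>>(mut rows: I) -> Vec<Vec<u8>> {
    let mut jar = Vec::new();
    // `HasNext == true`: compress the row, store it, and check again.
    while let Some(row) = rows.next() {
        jar.push(toy_compress(&row));
    }
    // `HasNext == false`: nothing left to store, so the file is finished.
    jar
}
```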
PF--shares-->SP1("Arc(SnapshotProvider)")
SP1--shares-->PD(DatabaseProvider)
PF--creates-->PD
PD--check `HighestSnapshotTracker`-->PD
nit: I'd rather have a separate block for tracker, so it's visible that it's a separate entity defined in the previous diagram
it becomes ugly in the diagram, i dont know how to force it to the side lol
crates/snapshot/README.md
Outdated
[`NippyJarCursor`](../../crates/storage/nippy-jar/src/cursor.rs#L12) Accessor of data in a `NippyJar` file. It enables queries either by row number (e.g., block number 1) or by a predefined key not part of the file (e.g., transaction hashes). If a file has multiple columns (e.g., `Tx | TxSender | Signature`), and one wishes to access only one of the column values, this can be accomplished by bitmasks. (e.g., for `TxSender`, the mask would be `0b010`).
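As a hedged illustration of the bitmask idea, the sketch below picks columns out of a row by mask. `select_columns` is a hypothetical helper, not the cursor's API, and it assumes the least significant bit maps to the first column (the `0b010` example for the middle column is consistent with that, but does not prove it).

```rust
/// Returns the values of the columns whose bit is set in `mask`.
/// Assumption: bit 0 (least significant) selects the first column.
fn select_columns<T: Clone>(row: &[T], mask: usize) -> Vec<T> {
    row.iter()
        .enumerate()
        // Keep column `i` only if bit `i` of the mask is set.
        .filter(|(i, _)| ((mask >> *i) & 1) == 1)
        .map(|(_, value)| value.clone())
        .collect()
}
```

With a row laid out as `Tx | TxSender | Signature`, the mask `0b010` yields only the `TxSender` value, and `0b101` yields `Tx` and `Signature`.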
[`NippyJar`](../../crates/storage/nippy-jar/src/lib.rs#57) A create-only file format. No data can be appended after creation. It supports multiple columns, compression (e.g., Zstd (with and without dictionaries), lz4, uncompressed) and inclusion filters (e.g., cuckoo filter: `is hash X part of this dataset`). Snapshots are organized by block ranges. (e.g., `TransactionSnapshot_500_000.jar` contains a transaction per row for all transactions until block `500_000`). For more check the struct documentation. |
so it's clear that we're talking about a range, not 0..=N
- [`NippyJar`](../../crates/storage/nippy-jar/src/lib.rs#57) A create-only file format. No data can be appended after creation. It supports multiple columns, compression (e.g., Zstd (with and without dictionaries), lz4, uncompressed) and inclusion filters (e.g., cuckoo filter: `is hash X part of this dataset`). Snapshots are organized by block ranges. (e.g., `TransactionSnapshot_500_000.jar` contains a transaction per row for all transactions until block `500_000`). For more check the struct documentation.
+ [`NippyJar`](../../crates/storage/nippy-jar/src/lib.rs#57) A create-only file format. No data can be appended after creation. It supports multiple columns, compression (e.g., Zstd (with and without dictionaries), lz4, uncompressed) and inclusion filters (e.g., cuckoo filter: `is hash X part of this dataset`). Snapshots are organized by block ranges. (e.g., `TransactionSnapshot_500_000.jar` contains a transaction per row for all transactions from block `0` to block `500_000`). For more check the struct documentation.
oh btw does it actually contain `0..=500_000` or `0..500_000`? `0` is genesis, but we snapshot it too iiuc?
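For reference, the two candidate ranges differ by exactly one block. The sketch below only illustrates Rust's range semantics; it does not settle which range the snapshot files actually use, and `snapshot_covers` and `block_count` are hypothetical helpers.

```rust
use std::ops::RangeBounds;

/// Whether a block falls inside a snapshot's block range.
fn snapshot_covers(range: &impl RangeBounds<u64>, block: u64) -> bool {
    range.contains(&block)
}

/// Number of blocks in a range, counted by iterating over it.
fn block_count(range: impl Iterator<Item = u64>) -> usize {
    range.count()
}
```

`0..=500_000` includes block `500_000` and holds 500_001 blocks counting genesis, while `0..500_000` stops at block `499_999` and holds 500_000.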
Provides an up-to-date overview of how the components interact when creating a snapshot or querying several of them.
The glossary gives a brief summary of each component/struct, with a link to the (documented) source code for a more detailed view.