docs: add an overview of the creation and querying of snapshots #5270

Merged
merged 3 commits into from
Nov 2, 2023

Conversation

joshieDo
Collaborator

@joshieDo joshieDo commented Nov 1, 2023

Provides an up-to-date overview of how the components interact when creating a snapshot or querying a set of them.

The glossary provides a brief summary of each component/struct alongside a link to the source code (which is documented), for a more detailed view.

@joshieDo joshieDo added the `C-docs` (An addition or correction to our documentation) and `A-static-files` (Related to static files) labels Nov 1, 2023
@joshieDo joshieDo requested a review from mattsse November 1, 2023 20:18
Collaborator

@mattsse mattsse left a comment

thanks for this,

after looking at the graphs I have a better understanding now.

Comment on lines +71 to +73
[`Snapshotter`](../../crates/snapshot/src/snapshotter.rs#L20): A `reth` background service that **copies** data from the database to new snapshot files when the block height reaches a certain threshold (e.g., the `500_000`th block). Upon completion, it dispatches a notification about the highest snapshotted block to the `HighestSnapshotTracker` channel. **It DOES NOT remove data from the database.**

[`HighestSnapshotTracker`](../../crates/snapshot/src/snapshotter.rs#L22): A channel utilized by `Snapshotter` to announce the newest snapshot block to all components with a listener: `Pruner` (to know which additional tables can be pruned) and `DatabaseProvider` (to know which data can be queried from the snapshots).

I see, so the `HighestSnapshotTracker` channel acts both as a notification that the highest snapshot has changed (for the pruner) and as the way to track the highest snapshot (for the database provider)
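
The dual role described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the struct below is a hypothetical stand-in built on `std::sync`, not the real `reth` channel, and the methods `announce`, `highest`, and `is_snapshotted` are invented for this example. It also assumes the highest snapshotted block itself counts as snapshotted.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the `HighestSnapshotTracker` channel:
/// a shared slot holding the highest snapshotted block, if any.
#[derive(Clone, Default)]
pub struct HighestSnapshotTracker {
    highest: Arc<Mutex<Option<u64>>>,
}

impl HighestSnapshotTracker {
    /// Called by the snapshotter once a snapshot is complete.
    /// Returns `true` if the highest snapshotted block advanced.
    pub fn announce(&self, block: u64) -> bool {
        let mut highest = self.highest.lock().unwrap();
        let advanced = match *highest {
            Some(current) => block > current,
            None => true,
        };
        if advanced {
            *highest = Some(block);
        }
        advanced
    }

    /// Read by listeners (pruner, database provider) holding a clone.
    pub fn highest(&self) -> Option<u64> {
        *self.highest.lock().unwrap()
    }
}

/// Assumption for this sketch: a block is covered by snapshots when it is
/// at or below the highest announced block.
pub fn is_snapshotted(tracker: &HighestSnapshotTracker, block: u64) -> bool {
    match tracker.highest() {
        Some(highest) => block <= highest,
        None => false,
    }
}
```

Both listeners would hold a clone of the same tracker, so a single `announce` call is visible to the pruner and to the provider alike.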

Comment on lines +16 to +19
I("BLOCK_HEIGHT % 500_000 == 0")--triggers-->SP(Snapshotter)
SP --> |triggers| SH["create_snapshot(block_range, SnapshotSegment::Headers)"]
SP --> |triggers| ST["create_snapshot(block_range, SnapshotSegment::Transactions)"]
SP --> |triggers| SR["create_snapshot(block_range, SnapshotSegment::Receipts)"]

this should emphasize that this is a list of "segments", perhaps with an additional ... or a label
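
One way to read the diagram lines above as "a list of segments" is the sketch below. It is a hedged illustration, not the `Snapshotter` implementation: `SNAPSHOT_INTERVAL`, `ALL_SEGMENTS`, and `plan_snapshots` are hypothetical names, and the block range is passed through as an opaque value because the diagram does not say how it is computed.

```rust
/// Segment names taken from the diagram; the enum itself is a local sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SnapshotSegment {
    Headers,
    Transactions,
    Receipts,
}

/// Threshold from the diagram condition `BLOCK_HEIGHT % 500_000 == 0`.
pub const SNAPSHOT_INTERVAL: u64 = 500_000;

/// The list of segments the snapshotter iterates over (hypothetical constant).
pub const ALL_SEGMENTS: [SnapshotSegment; 3] = [
    SnapshotSegment::Headers,
    SnapshotSegment::Transactions,
    SnapshotSegment::Receipts,
];

/// Returns one `(block_range, segment)` job per segment when the height
/// satisfies the diagram's condition, and no jobs otherwise.
/// Note: the condition is applied literally, so height 0 also satisfies it.
pub fn plan_snapshots<R: Clone>(block_height: u64, block_range: R) -> Vec<(R, SnapshotSegment)> {
    if block_height % SNAPSHOT_INTERVAL != 0 {
        return Vec::new();
    }
    ALL_SEGMENTS
        .iter()
        .map(|segment| (block_range.clone(), *segment))
        .collect()
}
```

Expressing the three `create_snapshot` arrows as a loop over one list makes it clearer that adding a segment would mean extending the list, not adding a new trigger.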

Collaborator

@mattsse mattsse left a comment

lgtm, pending more reviews

@shekhirin shekhirin self-requested a review November 2, 2023 15:47
Collaborator

@shekhirin shekhirin left a comment

LGTM, only nits, really like these diagrams!

Comment on lines 27 to 29
HN --> |true| NJC(NippyJar::Compression)
NJC --store--> NJ
HN --> |false| NJ

nit: it should probably point from NJC back to HN when it's true, so that it reads as a `while HasNext` block. And on false, we can specify that we're finished.
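
The loop shape suggested here could look roughly like the sketch below: rows are compressed and stored one at a time while there is a next row, and the function returns once there is none. Everything in it is an assumption made for illustration: `write_rows` and `toy_rle` are hypothetical, the "jar" is just an in-memory `Vec`, and the run-length encoder merely stands in for whatever `NippyJar::Compression` does.

```rust
/// Toy run-length encoder used only as a placeholder compressor.
/// Output is a sequence of `(run_length, byte)` pairs.
pub fn toy_rle(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = data.iter().copied().peekable();
    while let Some(byte) = iter.next() {
        let mut run: u8 = 1;
        // Extend the run while the next byte matches and the counter fits.
        while run < u8::MAX && iter.peek() == Some(&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

/// Compresses and stores each row in turn, mirroring `while HasNext`.
pub fn write_rows<I, F>(rows: I, compress: F) -> Vec<Vec<u8>>
where
    I: IntoIterator<Item = Vec<u8>>,
    F: Fn(&[u8]) -> Vec<u8>,
{
    let mut jar = Vec::new();
    let mut rows = rows.into_iter();
    // HasNext == true: compress the row, store it, then check again.
    while let Some(row) = rows.next() {
        jar.push(compress(&row));
    }
    // HasNext == false: finished.
    jar
}
```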

PF--shares-->SP1("Arc(SnapshotProvider)")
SP1--shares-->PD(DatabaseProvider)
PF--creates-->PD
PD--check `HighestSnapshotTracker`-->PD

nit: I'd rather have a separate block for tracker, so it's visible that it's a separate entity defined in the previous diagram

Collaborator Author

it becomes ugly in the diagram, i dont know how to force it to the side lol
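
For readers who find the diagram hard to follow, the sharing relationship can also be sketched in code. This is a rough illustration only: it assumes `PF` stands for a provider factory, the structs below are hypothetical simplifications (the real types carry far more state), and the tracker check is reduced to a plain `Option<u64>` copied into each provider.

```rust
use std::sync::Arc;

/// Placeholder for the snapshot provider shared between providers.
#[derive(Debug, Default)]
pub struct SnapshotProvider;

/// Where a query for a given block would be answered from (sketch only).
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Snapshot,
    Database,
}

pub struct DatabaseProvider {
    snapshot_provider: Arc<SnapshotProvider>,
    highest_snapshot: Option<u64>,
}

impl DatabaseProvider {
    /// The "check `HighestSnapshotTracker`" step, assuming the highest
    /// snapshotted block is itself served from snapshots.
    pub fn source_for(&self, block: u64) -> Source {
        match self.highest_snapshot {
            Some(highest) if block <= highest => Source::Snapshot,
            _ => Source::Database,
        }
    }

    pub fn snapshot_provider(&self) -> &Arc<SnapshotProvider> {
        &self.snapshot_provider
    }
}

pub struct ProviderFactory {
    pub snapshot_provider: Arc<SnapshotProvider>,
    pub highest_snapshot: Option<u64>,
}

impl ProviderFactory {
    /// Creates a provider that shares the factory's `Arc<SnapshotProvider>`.
    pub fn provider(&self) -> DatabaseProvider {
        DatabaseProvider {
            snapshot_provider: Arc::clone(&self.snapshot_provider),
            highest_snapshot: self.highest_snapshot,
        }
    }
}
```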


[`NippyJarCursor`](../../crates/storage/nippy-jar/src/cursor.rs#L12) Accessor of data in a `NippyJar` file. It enables queries either by row number (e.g., block number 1) or by a predefined key not part of the file (e.g., transaction hashes). If a file has multiple columns (e.g., `Tx | TxSender | Signature`), and one wishes to access only one of the column values, this can be accomplished by bitmasks. (e.g., for `TxSender`, the mask would be `0b010`).
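
The column bitmask mentioned above can be illustrated with a small sketch. It is not the `NippyJarCursor` API: `select_columns` is a hypothetical helper, and it assumes that bit `i` (counting from the least significant bit) selects column `i`. For the three-column example, `TxSender` sits in the middle, so its mask is `0b010` under either bit ordering.

```rust
/// Returns the values of the columns whose bit is set in `mask`.
/// Assumption for this sketch: bit `i` (least significant first) maps to
/// column `i` of the row.
pub fn select_columns<'a>(row: &[&'a str], mask: usize) -> Vec<&'a str> {
    row.iter()
        .enumerate()
        .filter(|&(i, _)| (mask & (1usize << i)) != 0)
        .map(|(_, value)| *value)
        .collect()
}
```

With a row laid out as `Tx | TxSender | Signature`, passing `0b010` would return only the `TxSender` value, while `0b111` would return the whole row.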

[`NippyJar`](../../crates/storage/nippy-jar/src/lib.rs#57) A create-only file format. No data can be appended after creation. It supports multiple columns, compression (e.g., Zstd (with and without dictionaries), lz4, uncompressed) and inclusion filters (e.g., cuckoo filter: `is hash X part of this dataset`). Snapshots are organized by block ranges. (e.g., `TransactionSnapshot_500_000.jar` contains a transaction per row for all transactions until block `500_000`). For more check the struct documentation.

so it's clear that we're talking about a range, not 0..=N

Suggested change
- [`NippyJar`](../../crates/storage/nippy-jar/src/lib.rs#57) A create-only file format. No data can be appended after creation. It supports multiple columns, compression (e.g., Zstd (with and without dictionaries), lz4, uncompressed) and inclusion filters (e.g., cuckoo filter: `is hash X part of this dataset`). Snapshots are organized by block ranges. (e.g., `TransactionSnapshot_500_000.jar` contains a transaction per row for all transactions until block `500_000`). For more check the struct documentation.
+ [`NippyJar`](../../crates/storage/nippy-jar/src/lib.rs#57) A create-only file format. No data can be appended after creation. It supports multiple columns, compression (e.g., Zstd (with and without dictionaries), lz4, uncompressed) and inclusion filters (e.g., cuckoo filter: `is hash X part of this dataset`). Snapshots are organized by block ranges. (e.g., `TransactionSnapshot_500_000.jar` contains a transaction per row for all transactions from block `0` to block `500_000`). For more check the struct documentation.


oh btw does it actually contain 0..=500_000 or 0..500_000? 0 is genesis, but we snapshot it too iiuc?
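
To make the difference between the two readings concrete, here is a small sketch of the two range forms. It does not settle which one the snapshot files actually use; the helper names are hypothetical and exist only to show the off-by-one between `0..=500_000` and `0..500_000`.

```rust
use std::ops::{Range, RangeInclusive};

/// Number of blocks covered by an inclusive range such as `0..=500_000`.
pub fn blocks_in_inclusive(range: &RangeInclusive<u64>) -> u64 {
    if range.is_empty() {
        0
    } else {
        range.end() - range.start() + 1
    }
}

/// Number of blocks covered by a half-open range such as `0..500_000`.
pub fn blocks_in_exclusive(range: &Range<u64>) -> u64 {
    range.end.saturating_sub(range.start)
}
```

Under the inclusive reading the file would hold 500_001 blocks (genesis through block `500_000`); under the half-open reading it would hold 500_000 blocks and stop at block `499_999`. Both readings include genesis.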

@joshieDo joshieDo added this pull request to the merge queue Nov 2, 2023
Merged via the queue into main with commit 9a56e4b Nov 2, 2023
22 checks passed
@joshieDo joshieDo deleted the joshie/doc-snapshot branch November 2, 2023 17:34