-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: CAR-backed Blockstore #3085
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, left some thoughts and nits.
} | ||
|
||
#[cfg(test)] | ||
mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of coverage, we could add cases for empty and corrupted CAR files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it. I changed forest-cli snapshot validate
to use this code, and here are the numbers when validating a 36GiB snapshot:
CarDB: total: 44s, indexing: 29s, walking: 15s
ParityDB: total: 171s, loading: 147s, walking: 24s
Results:
- No wasted disk space. This blockstore doesn't require any disk space (as long as the CAR files are uncompressed).
- Significantly faster load times: 29s vs 147s.
- Significantly faster access times: 15s vs 24s to visit every node in the graph.
Future improvements:
- The
CarBackedBlockstore
takes a buffered reader. This might not be the best option, especially when we know exactly how much to read after seeking. - The CIDs are not validated. The
CarReader
validates that the CIDs are correct. We should do the same (but in parallel).
This blockstore is a huge improvement for forest-cli snapshot validate
. Let's use it immediately.
How feasible is it to support multiple |
The easiest way to handle that would be to concatenate the two CAR files. # This adds the manifest key-value pairs to the snapshot.car file
cat manifest.car >> snapshot.car |
From a code read, I'm not sure our current code handles, that, neither would this PR, but this could be a future enhancement. |
a045c87
to
07b5d43
Compare
/// [`Blockstore`] API calls may panic if this is not upheld. | ||
/// - `reader`'s buffer should have enough room for the [`CarHeader`] and any [`Cid`]s. | ||
#[tracing::instrument(level = "debug", skip_all)] | ||
pub fn new(mut reader: ReaderT) -> cid::Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would wrapping reader
in a BufReader
improve performance during the indexing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested a few buffer sizes, on a 36G snapshot and a 2.6G snapshot
$ du -h ...
36G /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car
2.6G /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car
Large snapshot
Note there's a low confidence here - n=1
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
./target/release/examples/benchmark --mode buffer8k /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car |
226.545 | 226.545 | 226.545 | 1.05 |
./target/release/examples/benchmark --mode buffer1k /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car |
237.063 | 237.063 | 237.063 | 1.09 |
./target/release/examples/benchmark --mode buffer100 /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car |
216.535 | 216.535 | 216.535 | 1.00 |
./target/release/examples/benchmark --mode unbuffered /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car |
234.794 | 234.794 | 234.794 | 1.08 |
Small snapshot
Probably not worth a lot, because the OS will probably keep a hot cache of the file on disk
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
./target/release/examples/benchmark --mode buffer8k /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car |
3.229 ± 0.248 | 2.946 | 3.660 | 1.32 ± 0.13 |
./target/release/examples/benchmark --mode buffer1k /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car |
2.443 ± 0.148 | 2.276 | 2.668 | 1.00 |
./target/release/examples/benchmark --mode buffer100 /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car |
2.562 ± 0.305 | 2.149 | 3.043 | 1.05 ± 0.14 |
./target/release/examples/benchmark --mode unbuffered /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car |
5.788 ± 0.192 | 5.573 | 6.105 | 2.37 ± 0.16 |
Note that std::io::Seek::seek
-ing on a std::io::BufReader
always discards the buffer.
So maybe a small buffer is good for our use-case here? I'd say it's not clear, and I don't want to spend too much time benchmarking. Gonna have a 100 byte buffer and call it at that, please do double check my benchmarks if you've got time: here's the code 7a20c4c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
buffer8k |
52.861 | 52.156 | 54.317 | 1.02 |
buffer1k |
52.028 | 51.455 | 53.880 | 1.00 |
buffer100 |
53.525 | 52.877 | 55.187 | 1.03 |
unbuffered |
85.451 | 84.166 | 87.108 | 1.64 |
David did some benchmarking (n=10) on filecoin_full_calibnet_2023-04-07_450000.car
- buffering is absolutely worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unabridged results over 10 runs:
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
./target/release/examples/benchmark --mode buffer8k ../lotus/filecoin_full_calibnet_2023-04-07_450000.car |
52.861 ± 0.587 | 52.156 | 54.317 | 1.02 ± 0.02 |
./target/release/examples/benchmark --mode buffer1k ../lotus/filecoin_full_calibnet_2023-04-07_450000.car |
52.028 ± 0.687 | 51.455 | 53.880 | 1.00 |
./target/release/examples/benchmark --mode buffer100 ../lotus/filecoin_full_calibnet_2023-04-07_450000.car |
53.525 ± 0.638 | 52.877 | 55.187 | 1.03 ± 0.02 |
./target/release/examples/benchmark --mode unbuffered ../lotus/filecoin_full_calibnet_2023-04-07_450000.car |
85.451 ± 1.011 | 84.166 | 87.108 | 1.64 ± 0.03 |
…ks our performance. A simpler blocking impl is the way to go
… put-keyed to only cache new CIDs
This reverts commit 7a20c4c.
c993d5e
to
d924693
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely.
Summary of changes
Work for #3074.
I've not wrapped a
Blockstore
, opting to keep awrite_cache
instead for simplicity.Is this appropriate for the intended usecase?
If not, I will probably create an inner
ParityDb
, backed by an anonymous file/directory, as that will be more appropriate for this usecase (users shouldn't have to configure a database, or pick a directory, it's conceptually private to theCarBackedBlockstore
.Other information and links
Change checklist