feat: CAR-backed Blockstore #3085

aatifsyed · 2023-06-27T13:40:20Z

Summary of changes

Work for #3074.

I've not wrapped a Blockstore, opting to keep a write_cache instead for simplicity.
Is this appropriate for the intended use case?

If not, I will probably create an inner ParityDb backed by an anonymous file/directory, as that would be more appropriate for this use case: users shouldn't have to configure a database or pick a directory, since the store is conceptually private to the CarBackedBlockstore.
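For readers following along, the write_cache approach described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `CarBackedSketch`, `Key` (a byte-vector stand-in for a real `Cid`), and the pre-built index are all assumptions made for the example, and the indexing pass that would produce the offsets is not shown.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::Mutex;

/// Hypothetical stand-in for a CID; real code would use `cid::Cid`.
pub type Key = Vec<u8>;

/// Where one block's data lives in the backing file. The offsets are assumed
/// to come from an earlier indexing pass, which is not shown here.
#[derive(Clone, Copy, Debug)]
pub struct BlockDataLocation {
    pub offset: u64,
    pub length: u32,
}

struct Inner<R> {
    reader: R,
    /// Blocks written after construction live only in memory.
    write_cache: HashMap<Key, Vec<u8>>,
}

/// Read-only file plus an in-memory overlay for writes (illustrative only).
pub struct CarBackedSketch<R> {
    index: HashMap<Key, BlockDataLocation>,
    inner: Mutex<Inner<R>>,
}

impl<R: Read + Seek> CarBackedSketch<R> {
    pub fn new(reader: R, index: HashMap<Key, BlockDataLocation>) -> Self {
        let inner = Inner { reader, write_cache: HashMap::new() };
        Self { index, inner: Mutex::new(inner) }
    }

    /// Looks in the file index first, then falls back to the write cache.
    pub fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
        let mut inner = self.inner.lock().unwrap();
        if let Some(location) = self.index.get(key) {
            inner.reader.seek(SeekFrom::Start(location.offset))?;
            let mut data = vec![0u8; location.length as usize];
            inner.reader.read_exact(&mut data)?;
            return Ok(Some(data));
        }
        Ok(inner.write_cache.get(key).cloned())
    }

    /// Only caches keys that the file does not already contain.
    pub fn put(&self, key: Key, data: Vec<u8>) {
        if !self.index.contains_key(&key) {
            self.inner.lock().unwrap().write_cache.insert(key, data);
        }
    }
}
```

The trade-off in this sketch is that every new block stays in memory for the lifetime of the store, which is what an inner on-disk database would avoid.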

Change checklist

I have performed a self-review of my own code,
I have made corresponding changes to the documentation,
I have added tests that prove my fix is effective or that my feature works (if possible),
I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.
- N/A

ruseinov

Looks good, left some thoughts and nits.

LesnyRumcajs · 2023-06-28T07:57:50Z

src/car_backed_blockstore.rs

+}
+
+#[cfg(test)]
+mod tests {


For the sake of coverage, we could add cases for empty and corrupted CAR files.
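As a rough idea of what such cases could look like, here is a self-contained sketch that does not use the PR's types. It assumes only the CARv1 framing, where each section is prefixed by an unsigned LEB128 varint length; `read_varint` and `read_frame` are hypothetical helpers written for this example, not functions from the PR.

```rust
use std::io::{self, Read};

/// Reads an unsigned LEB128 varint. Returns `Ok(None)` on a clean end of
/// input, i.e. when no bytes are left before the varint starts.
fn read_varint<R: Read>(reader: &mut R) -> io::Result<Option<u64>> {
    let mut value = 0u64;
    for i in 0..10u32 {
        let mut byte = [0u8; 1];
        if reader.read(&mut byte)? == 0 {
            return if i == 0 {
                Ok(None)
            } else {
                Err(io::ErrorKind::UnexpectedEof.into())
            };
        }
        value |= u64::from(byte[0] & 0x7f) << (7 * i);
        if byte[0] & 0x80 == 0 {
            return Ok(Some(value));
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint is too long"))
}

/// Reads one length-prefixed frame, or `Ok(None)` if the input is exhausted.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let length = match read_varint(reader)? {
        Some(length) => length,
        None => return Ok(None),
    };
    // `take` avoids allocating `length` bytes up front for a corrupt prefix.
    let mut body = Vec::new();
    reader.by_ref().take(length).read_to_end(&mut body)?;
    if body.len() as u64 != length {
        return Err(io::ErrorKind::UnexpectedEof.into());
    }
    Ok(Some(body))
}

#[cfg(test)]
mod frame_tests {
    use super::*;

    #[test]
    fn empty_input_yields_no_frame() {
        let mut input: &[u8] = &[];
        assert!(read_frame(&mut input).unwrap().is_none());
    }

    #[test]
    fn truncated_frame_is_an_error() {
        // The prefix promises 5 bytes but only 1 follows.
        let mut input: &[u8] = &[5, b'a'];
        let err = read_frame(&mut input).unwrap_err();
        assert_eq!(err.kind(), io::ErrorKind::UnexpectedEof);
    }
}
```

Equivalent tests against the real blockstore would feed the same kinds of input (zero bytes, a truncated section) to its constructor and assert on the returned error.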

lemmih

Love it. I changed forest-cli snapshot validate to use this code, and here are the numbers when validating a 36GiB snapshot:

    CarDB:    total: 44s,  indexing: 29s, walking: 15s
    ParityDB: total: 171s, loading: 147s, walking: 24s

Results:

No wasted disk space. This blockstore doesn't require any disk space (as long as the CAR files are uncompressed).
Significantly faster load times: 29s vs 147s.
Significantly faster access times: 15s vs 24s to visit every node in the graph.

Future improvements:

The CarBackedBlockstore takes a buffered reader. This might not be the best option, especially when we know exactly how much to read after seeking.
The CIDs are not validated. The CarReader validates that the CIDs are correct. We should do the same (but in parallel).
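To make the second point concrete, parallel validation could look something like the sketch below. It is only an illustration under stated assumptions: `find_mismatches` is a hypothetical helper, blocks are modelled as `(expected_digest, data)` pairs rather than real CIDs, and the hash function is passed in by the caller because the actual multihash code is not part of this example.

```rust
use std::thread;

/// Returns the indices of blocks whose stored digest does not match the
/// digest recomputed from their data. `digest` stands in for whatever hash
/// the CID specifies; `workers` is the number of threads to spread over.
pub fn find_mismatches<F>(
    blocks: &[(Vec<u8>, Vec<u8>)],
    digest: F,
    workers: usize,
) -> Vec<usize>
where
    F: Fn(&[u8]) -> Vec<u8> + Sync,
{
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_size = ((blocks.len() + workers - 1) / workers).max(1);
    let digest = &digest;

    let mut bad: Vec<usize> = thread::scope(|scope| {
        let handles: Vec<_> = blocks
            .chunks(chunk_size)
            .enumerate()
            .map(|(chunk_no, chunk)| {
                // Each scoped thread checks one contiguous chunk.
                scope.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .filter(|(_, (expected, data))| digest(data.as_slice()) != *expected)
                        .map(|(i, _)| chunk_no * chunk_size + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect()
    });
    bad.sort_unstable();
    bad
}
```

Scoped threads keep the sketch dependency-free; a thread pool would be the more likely choice for millions of small blocks.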

This blockstore is a huge improvement for forest-cli snapshot validate. Let's use it immediately.

hanabi1224 · 2023-06-28T11:18:36Z

How feasible is it to support multiple car files? I can think of a scenario that uses a subset car file of a snapshot and the manifest bundle car file (2 car files in total) to unit test state migration(s)

lemmih · 2023-06-28T11:31:28Z

> How feasible is it to support multiple car files? I can think of a scenario that uses a subset car file of a snapshot and the manifest bundle car file (2 car files in total) to unit test state migration(s)

The easiest way to handle that would be to concatenate the two CAR files.

    # This adds the manifest key-value pairs to the snapshot.car file
    cat manifest.car >> snapshot.car

aatifsyed · 2023-06-28T12:51:00Z

> The easiest way to handle that would be to concatenate the two CAR files.

From a code read, I'm not sure our current code handles that, and neither would this PR, but this could be a future enhancement.

lemmih · 2023-06-29T06:07:40Z

src/car_backed_blockstore.rs

+    ///   [`Blockstore`] API calls may panic if this is not upheld.
+    /// - `reader`'s buffer should have enough room for the [`CarHeader`] and any [`Cid`]s.
+    #[tracing::instrument(level = "debug", skip_all)]
+    pub fn new(mut reader: ReaderT) -> cid::Result<Self> {


Would wrapping reader in a BufReader improve performance during the indexing?

I've tested a few buffer sizes, on a 36G snapshot and a 2.6G snapshot

    $ du -h ...
    36G   /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car
    2.6G  /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car

Large snapshot

Note: confidence is low here (n=1).

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `./target/release/examples/benchmark --mode buffer8k /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car` | 226.545 | 226.545 | 226.545 | 1.05 |
| `./target/release/examples/benchmark --mode buffer1k /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car` | 237.063 | 237.063 | 237.063 | 1.09 |
| `./target/release/examples/benchmark --mode buffer100 /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car` | 216.535 | 216.535 | 216.535 | 1.00 |
| `./target/release/examples/benchmark --mode unbuffered /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car` | 234.794 | 234.794 | 234.794 | 1.08 |

Small snapshot

Probably not worth a lot, because the OS will probably keep a hot cache of the file on disk

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `./target/release/examples/benchmark --mode buffer8k /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car` | 3.229 ± 0.248 | 2.946 | 3.660 | 1.32 ± 0.13 |
| `./target/release/examples/benchmark --mode buffer1k /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car` | 2.443 ± 0.148 | 2.276 | 2.668 | 1.00 |
| `./target/release/examples/benchmark --mode buffer100 /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car` | 2.562 ± 0.305 | 2.149 | 3.043 | 1.05 ± 0.14 |
| `./target/release/examples/benchmark --mode unbuffered /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car` | 5.788 ± 0.192 | 5.573 | 6.105 | 2.37 ± 0.16 |

Note that std::io::Seek::seek-ing on a std::io::BufReader always discards the buffer.
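The effect can be observed with a small experiment. The sketch below is not the benchmark from this PR; `CountingReader` and `underlying_reads` are hypothetical helpers that count how often the `BufReader` has to go back to the underlying reader after a short forward seek, comparing `Seek::seek` with `BufReader::seek_relative` (which keeps the buffer when the target is still inside it).

```rust
use std::io::{self, BufReader, Cursor, Read, Seek, SeekFrom};

/// Wraps a reader and counts how many `read` calls reach it.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

impl<R: Seek> Seek for CountingReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.inner.seek(pos)
    }
}

/// Reads a byte, skips 4 bytes forward, reads another byte, and returns how
/// many reads hit the underlying reader.
fn underlying_reads(use_seek_relative: bool) -> io::Result<usize> {
    let source = CountingReader { inner: Cursor::new(vec![0u8; 64]), reads: 0 };
    let mut reader = BufReader::with_capacity(32, source);
    let mut byte = [0u8; 1];

    reader.read_exact(&mut byte)?; // fills the 32-byte buffer
    if use_seek_relative {
        reader.seek_relative(4)?; // target is inside the buffer, so it is kept
    } else {
        reader.seek(SeekFrom::Current(4))?; // discards the buffer
    }
    reader.read_exact(&mut byte)?;
    Ok(reader.get_ref().reads)
}
```

With these sizes, `underlying_reads(false)` should report two underlying reads and `underlying_reads(true)` one, which is consistent with a small buffer wasting little when most of it is thrown away on the next seek.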

So maybe a small buffer is good for our use case here? I'd say it's not clear, and I don't want to spend too much time benchmarking. I'm going to use a 100-byte buffer and leave it at that. Please do double-check my benchmarks if you've got time; here's the code: 7a20c4c

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `buffer8k` | 52.861 | 52.156 | 54.317 | 1.02 |
| `buffer1k` | 52.028 | 51.455 | 53.880 | 1.00 |
| `buffer100` | 53.525 | 52.877 | 55.187 | 1.03 |
| `unbuffered` | 85.451 | 84.166 | 87.108 | 1.64 |

David did some benchmarking (n=10) on filecoin_full_calibnet_2023-04-07_450000.car - buffering is absolutely worth it

Unabridged results over 10 runs:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `./target/release/examples/benchmark --mode buffer8k ../lotus/filecoin_full_calibnet_2023-04-07_450000.car` | 52.861 ± 0.587 | 52.156 | 54.317 | 1.02 ± 0.02 |
| `./target/release/examples/benchmark --mode buffer1k ../lotus/filecoin_full_calibnet_2023-04-07_450000.car` | 52.028 ± 0.687 | 51.455 | 53.880 | 1.00 |
| `./target/release/examples/benchmark --mode buffer100 ../lotus/filecoin_full_calibnet_2023-04-07_450000.car` | 53.525 ± 0.638 | 52.877 | 55.187 | 1.03 ± 0.02 |
| `./target/release/examples/benchmark --mode unbuffered ../lotus/filecoin_full_calibnet_2023-04-07_450000.car` | 85.451 ± 1.011 | 84.166 | 87.108 | 1.64 ± 0.03 |

lemmih

Lovely.

aatifsyed requested review from elmattic, lemmih, LesnyRumcajs, jdjaustin, hanabi1224, creativcoder, sudo-shashank and ruseinov as code owners June 27, 2023 13:40

ruseinov approved these changes Jun 27, 2023

LesnyRumcajs reviewed Jun 28, 2023

lemmih requested changes Jun 28, 2023

aatifsyed commented Jun 28, 2023

aatifsyed marked this pull request as draft June 28, 2023 10:53

ruseinov reviewed Jun 28, 2023

aatifsyed self-assigned this Jun 28, 2023

aatifsyed commented Jun 28, 2023

aatifsyed force-pushed the aatifsyed/read-cars branch from a045c87 to 07b5d43 Compare June 28, 2023 22:36

lemmih reviewed Jun 29, 2023

aatifsyed added 9 commits June 29, 2023 15:32

feat: count sections works

d8b0ae8

feat: codec impl

5f3139b

note: using codec _requires_ reading the frame into memory, which tanks our performance. A simpler blocking impl is the way to go

0bf9589

feat: initial implementation in example

1d529e4

feat: demo

0a604d1

feat: car backed blockstore

6e4bf0b

run: rm -rf examples

22b1303

feat: CarBackedBlockstore

ae41766

fix: unused dependencies

7b30bcb

aatifsyed added 18 commits June 29, 2023 15:33

feat: ready for review

e4a00a8

fix: spelling

745b9cb

fix: copyright

2aa8725

fix: change debug! to trace!, add some trivial documentation, and fix put-keyed to only cache new CIDs

f270d36

feat: use Read, not BufRead

773e6aa

fix: take impl Read, not &mut impl Read

974cb33

feat: more diagrams, BlockDataLocation refers to just the data

5d0322f

chore: spellcheck

80e5f09

feat: validate snapshot command uses car backed storage

68f5dd9

docs: add a changelog entry

7b23e5d

fix: remove #[allow(unused)]

b4fbf80

chore: prettier

91e6a32

fix: tests no longer use --force flag

5dfc265

fix: Result::transpose

25040be

fix: integration tests and better help

7f8fad7

nomerge: benchmark buffer sizes

ef49b09

Revert "nomerge: benchmark buffer sizes"

f8dab7e

This reverts commit 7a20c4c.

fix: use a small buffer when indexing

d924693

aatifsyed force-pushed the aatifsyed/read-cars branch from c993d5e to d924693 Compare June 29, 2023 14:36

aatifsyed marked this pull request as ready for review June 29, 2023 14:36

aatifsyed requested a review from lemmih June 29, 2023 14:58

aatifsyed enabled auto-merge (squash) June 29, 2023 15:24

lemmih approved these changes Jun 29, 2023

aatifsyed merged commit 293cb20 into main Jun 29, 2023

aatifsyed deleted the aatifsyed/read-cars branch June 29, 2023 15:36

jdjaustin mentioned this pull request Jun 29, 2023

Restore --tipset flag to snapshot export command #3027

Merged

4 tasks

aatifsyed mentioned this pull request Jul 3, 2023

Discussion: compression as part of the CARv2 standard ipld/ipld#288

Open

This was referenced Jul 12, 2023

feat: use zstd compressed cars as blockstores #3149

Merged

Forest CAR roadmap #3222

Closed
