Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: CAR-backed Blockstore #3085

Merged
merged 27 commits into from
Jun 29, 2023
Merged

feat: CAR-backed Blockstore #3085

merged 27 commits into from
Jun 29, 2023

Conversation

aatifsyed
Copy link
Contributor

@aatifsyed aatifsyed commented Jun 27, 2023

Summary of changes

Work for #3074.

I've not wrapped a Blockstore, opting to keep a write_cache instead for simplicity.
Is this appropriate for the intended usecase?

If not, I will probably create an inner ParityDb, backed by an anonymous file/directory, as that will be more appropriate for this usecase (users shouldn't have to configure a database, or pick a directory, it's conceptually private to the CarBackedBlockstore.

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.
    • N/A

Copy link
Contributor

@ruseinov ruseinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left some thoughts and nits.

src/car_backed_blockstore.rs Show resolved Hide resolved
src/car_backed_blockstore.rs Show resolved Hide resolved
src/car_backed_blockstore.rs Outdated Show resolved Hide resolved
src/car_backed_blockstore.rs Outdated Show resolved Hide resolved
src/car_backed_blockstore.rs Outdated Show resolved Hide resolved
src/car_backed_blockstore.rs Outdated Show resolved Hide resolved
}

#[cfg(test)]
mod tests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of coverage, we could add cases for empty and corrupted CAR files.

Copy link
Contributor

@lemmih lemmih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it. I changed forest-cli snapshot validate to use this code, and here are the numbers when validating a 36GiB snapshot:

    CarDB:    total: 44s,  indexing: 29s, walking: 15s
    ParityDB: total: 171s, loading: 147s, walking: 24s

Results:

  • No wasted disk space. This blockstore doesn't require any disk space (as long as the CAR files are uncompressed).
  • Significantly faster load times: 29s vs 147s.
  • Significantly faster access times: 15s vs 24s to visit every node in the graph.

Future improvements:

  • The CarBackedBlockstore takes a buffered reader. This might not be the best option, especially when we know exactly how much to read after seeking.
  • The CIDs are not validated. The CarReader validates that the CIDs are correct. We should do the same (but in parallel).

This blockstore is a huge improvement for forest-cli snapshot validate. Let's use it immediately.

@aatifsyed aatifsyed marked this pull request as draft June 28, 2023 10:53
@hanabi1224
Copy link
Contributor

How feasible is it to support multiple car files? I can think of a scenario that uses a subset car file of a snapshot and the manifest bundle car file (2 car files in total) to unit test state migration(s)

@lemmih
Copy link
Contributor

lemmih commented Jun 28, 2023

How feasible is it to support multiple car files? I can think of a scenario that uses a subset car file of a snapshot and the manifest bundle car file (2 car files in total) to unit test state migration(s)

The easiest way to handle that would be to concatenate the two CAR files.

# This adds the manifest key-value pairs to the snapshot.car file
cat manifest.car >> snapshot.car

@aatifsyed
Copy link
Contributor Author

The easiest way to handle that would be to concatenate the two CAR files.

From a code read, I'm not sure our current code handles, that, neither would this PR, but this could be a future enhancement.

@aatifsyed aatifsyed self-assigned this Jun 28, 2023
@aatifsyed aatifsyed force-pushed the aatifsyed/read-cars branch from a045c87 to 07b5d43 Compare June 28, 2023 22:36
/// [`Blockstore`] API calls may panic if this is not upheld.
/// - `reader`'s buffer should have enough room for the [`CarHeader`] and any [`Cid`]s.
#[tracing::instrument(level = "debug", skip_all)]
pub fn new(mut reader: ReaderT) -> cid::Result<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would wrapping reader in a BufReader improve performance during the indexing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested a few buffer sizes, on a 36G snapshot and a 2.6G snapshot

$ du -h ...
36G     /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car
2.6G    /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car

Large snapshot

Note there's a low confidence here - n=1

Command Mean [s] Min [s] Max [s] Relative
./target/release/examples/benchmark --mode buffer8k /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car 226.545 226.545 226.545 1.05
./target/release/examples/benchmark --mode buffer1k /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car 237.063 237.063 237.063 1.09
./target/release/examples/benchmark --mode buffer100 /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car 216.535 216.535 216.535 1.00
./target/release/examples/benchmark --mode unbuffered /home/aatif/chainsafe/snapshots/filecoin_full_calibnet_2023-04-07_450000.car 234.794 234.794 234.794 1.08

Small snapshot

Probably not worth a lot, because the OS will probably keep a hot cache of the file on disk

Command Mean [s] Min [s] Max [s] Relative
./target/release/examples/benchmark --mode buffer8k /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car 3.229 ± 0.248 2.946 3.660 1.32 ± 0.13
./target/release/examples/benchmark --mode buffer1k /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car 2.443 ± 0.148 2.276 2.668 1.00
./target/release/examples/benchmark --mode buffer100 /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car 2.562 ± 0.305 2.149 3.043 1.05 ± 0.14
./target/release/examples/benchmark --mode unbuffered /home/aatif/chainsafe/snapshots/forest_snapshot_calibnet_2023-06-29_height_690463.car 5.788 ± 0.192 5.573 6.105 2.37 ± 0.16

Note that std::io::Seek::seek-ing on a std::io::BufReader always discards the buffer.

So maybe a small buffer is good for our use-case here? I'd say it's not clear, and I don't want to spend too much time benchmarking. Gonna have a 100 byte buffer and call it at that, please do double check my benchmarks if you've got time: here's the code 7a20c4c

Copy link
Contributor Author

@aatifsyed aatifsyed Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command Mean [s] Min [s] Max [s] Relative
buffer8k 52.861 52.156 54.317 1.02
buffer1k 52.028 51.455 53.880 1.00
buffer100 53.525 52.877 55.187 1.03
unbuffered 85.451 84.166 87.108 1.64

David did some benchmarking (n=10) on filecoin_full_calibnet_2023-04-07_450000.car - buffering is absolutely worth it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unabridged results over 10 runs:

Command Mean [s] Min [s] Max [s] Relative
./target/release/examples/benchmark --mode buffer8k ../lotus/filecoin_full_calibnet_2023-04-07_450000.car 52.861 ± 0.587 52.156 54.317 1.02 ± 0.02
./target/release/examples/benchmark --mode buffer1k ../lotus/filecoin_full_calibnet_2023-04-07_450000.car 52.028 ± 0.687 51.455 53.880 1.00
./target/release/examples/benchmark --mode buffer100 ../lotus/filecoin_full_calibnet_2023-04-07_450000.car 53.525 ± 0.638 52.877 55.187 1.03 ± 0.02
./target/release/examples/benchmark --mode unbuffered ../lotus/filecoin_full_calibnet_2023-04-07_450000.car 85.451 ± 1.011 84.166 87.108 1.64 ± 0.03

@aatifsyed aatifsyed force-pushed the aatifsyed/read-cars branch from c993d5e to d924693 Compare June 29, 2023 14:36
@aatifsyed aatifsyed marked this pull request as ready for review June 29, 2023 14:36
@aatifsyed aatifsyed requested a review from lemmih June 29, 2023 14:58
@aatifsyed aatifsyed enabled auto-merge (squash) June 29, 2023 15:24
Copy link
Contributor

@lemmih lemmih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants