feat: basic validation of CAR files #3591

Merged
merged 16 commits into from
Oct 17, 2023


@lemmih (Contributor) commented Oct 13, 2023

Summary of changes

Changes introduced in this pull request:

  • Add the `forest-tool car validate` command.

This new command verifies the invariants expected to be true for all CAR files (not just Filecoin CAR files).
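Here is some context on which invariants apply to every CAR file. A CARv1 file is a header (a varint length prefix followed by a DAG-CBOR map), followed by zero or more sections, each a varint length prefix followed by a CID and the block bytes. The Rust sketch below is a hypothetical, std-only illustration of the most basic of those invariants: the input is non-empty, and every length prefix lands exactly inside the input. It is not Forest's implementation. It does not decode the header or the CIDs, and for a zstd-compressed `.car.zst` file it would have to run on the decompressed bytes. The function names are made up for this example.

```rust
/// Decode an unsigned LEB128 varint, the length-prefix encoding used by CARv1.
/// Returns the decoded value and how many bytes it occupied.
fn read_uvarint(buf: &[u8]) -> Result<(u64, usize), String> {
    let mut value: u64 = 0;
    // A u64 needs at most 10 varint bytes; stop there to reject runaway input.
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    Err("truncated or overlong varint".to_string())
}

/// Hypothetical structural check: the input must be non-empty and consist of
/// length-prefixed frames (the header, then block sections) that exactly fill it.
/// Returns the number of frames found. Header and CID contents are not decoded.
fn check_car_framing(bytes: &[u8]) -> Result<usize, String> {
    if bytes.is_empty() {
        return Err("empty file".to_string());
    }
    let mut offset = 0;
    let mut frames = 0;
    while offset < bytes.len() {
        let (len, used) = read_uvarint(&bytes[offset..])?;
        offset += used;
        // This sketch treats zero-length frames as invalid.
        if len == 0 {
            return Err(format!("zero-length frame ending at byte {offset}"));
        }
        let len = usize::try_from(len).map_err(|_| "frame length too large".to_string())?;
        let end = offset
            .checked_add(len)
            .filter(|&end| end <= bytes.len())
            .ok_or_else(|| format!("frame starting at byte {offset} runs past end of input"))?;
        offset = end;
        frames += 1;
    }
    Ok(frames)
}
```

Under this sketch, the inputs used by `validate_empty_car` and `validate_junk_car` are both rejected. The first is empty. In the second, every byte of `de ad be ef` has its continuation bit set, so the first varint never terminates.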

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

@lemmih lemmih marked this pull request as ready for review October 13, 2023 12:46
@lemmih lemmih requested a review from a team as a code owner October 13, 2023 12:46
@lemmih lemmih requested review from hanabi1224 and LesnyRumcajs and removed request for a team October 13, 2023 12:46
@lemmih lemmih marked this pull request as draft October 16, 2023 06:38
@lemmih lemmih marked this pull request as ready for review October 16, 2023 14:22
Comment on lines 121 to 234
```rust
    // Excerpt from the test module: the `mod tests {` header and its `use`
    // imports precede this range and are not shown.
    #[tokio::test]
    async fn validate_junk_car() {
        let mut temp_path = NamedTempFile::new_in(".").unwrap();
        temp_path.write_all(&[0xde, 0xad, 0xbe, 0xef]).unwrap();
        assert!(validate(&temp_path.into_temp_path(), false, false)
            .await
            .is_err());
    }

    #[tokio::test]
    async fn validate_empty_car() {
        let temp_path = NamedTempFile::new_in(".").unwrap();
        assert!(validate(&temp_path.into_temp_path(), false, false)
            .await
            .is_err());
    }

    #[tokio::test]
    async fn validate_mainnet_genesis() {
        let mut temp_path = NamedTempFile::new_in(".").unwrap();
        temp_path.write_all(mainnet::DEFAULT_GENESIS).unwrap();
        assert!(validate(&temp_path.into_temp_path(), false, true)
            .await
            .is_ok());
    }

    #[tokio::test]
    async fn validate_calibnet_genesis() {
        let mut temp_path = NamedTempFile::new_in(".").unwrap();
        temp_path.write_all(calibnet::DEFAULT_GENESIS).unwrap();
        assert!(validate(&temp_path.into_temp_path(), false, true)
            .await
            .is_ok());
    }

    async fn create_raw_car_file(car_blocks: Vec<CarBlock>, ignored_cids: Vec<Cid>) -> TempPath {
        let temp_path = NamedTempFile::new_in(".").unwrap().into_temp_path();
        let mut writer = tokio::fs::File::create(&temp_path).await.unwrap();

        // Drop `ignored_cids` from each frame's CID list, so the index written
        // below does not reference them (used to build a deliberately broken index).
        let frames = forest::Encoder::compress_stream_default(iter(car_blocks).map(Ok)).map_ok(
            |(cids, bytes)| {
                (
                    cids.into_iter()
                        .filter(|cid| !ignored_cids.contains(cid))
                        .collect(),
                    bytes,
                )
            },
        );

        // Write zstd frames and include a skippable index
        forest::Encoder::write(&mut writer, vec![], frames)
            .await
            .unwrap();

        // Flush to ensure everything has been successfully written
        writer.flush().await.unwrap();
        writer.shutdown().await.unwrap();
        temp_path
    }

    // Sanity check to verify that we can create valid forest.car.zst files
    #[tokio::test]
    async fn validate_valid_file() {
        let data = "this data _does_ match the CID".as_bytes().to_vec();
        let temp_path = create_raw_car_file(
            vec![CarBlock {
                cid: Cid::new_v1(0, Code::Blake2b256.digest(&data)),
                data,
            }],
            vec![],
        )
        .await;

        assert!(validate(&temp_path, false, false).await.is_ok());
    }

    #[tokio::test]
    async fn validate_invalid_blocks() {
        let temp_path = create_raw_car_file(
            vec![CarBlock {
                cid: Cid::new_v1(0, Code::Identity.digest(&[10])),
                data: "this data doesn't match the CID".as_bytes().to_vec(),
            }],
            vec![],
        )
        .await;

        assert!(validate(&temp_path, false, false).await.is_err());
    }

    // If a CarBlock exists that isn't referenced in the index, this is an error.
    #[tokio::test]
    async fn validate_invalid_index() {
        let data = "this data _does_ match the CID".as_bytes().to_vec();
        let cid = Cid::new_v1(0, Code::Blake2b256.digest(&data));
        let temp_path = create_raw_car_file(vec![CarBlock { cid, data }], vec![cid]).await;

        assert!(validate(&temp_path, false, false).await.is_err());
    }
} // end of `mod tests`
```
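`validate_invalid_blocks` depends on what makes a block valid: hashing the block data with the function named in the CID's multihash has to reproduce the CID's digest. For the identity multihash (code `0x00`), the "digest" is the input itself. So `Code::Identity.digest(&[10])` serializes as the bytes `00 01 0a` (code, length, digest), and only the data `[10]` matches it. The string in the test does not. The following std-only Rust sketch, with hypothetical function names, checks data against a raw multihash. It supports only the identity code, since real hashes such as Blake2b-256 would need a hashing crate.

```rust
/// Decode an unsigned LEB128 varint; returns (value, bytes consumed).
fn read_uvarint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Multihash code of the identity "hash", whose digest is the input itself.
const IDENTITY: u64 = 0x00;

/// Hypothetical check: does `data` hash to the digest inside `multihash`
/// (laid out as varint code, varint digest length, digest bytes)?
/// Only the identity code is handled in this std-only sketch.
fn block_matches_multihash(multihash: &[u8], data: &[u8]) -> Result<bool, String> {
    let (code, n) = read_uvarint(multihash).ok_or("malformed multihash code")?;
    let (len, m) = read_uvarint(&multihash[n..]).ok_or("malformed digest length")?;
    let digest = &multihash[n + m..];
    if digest.len() as u64 != len {
        return Err("digest length does not match its prefix".to_string());
    }
    match code {
        IDENTITY => Ok(digest == data),
        other => Err(format!("hash code {other:#x} is not supported by this sketch")),
    }
}
```

A structurally malformed multihash is reported as an error. A well-formed one returns `Ok(false)` when the data does not match, which keeps "corrupt CID" and "corrupt block" distinguishable.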
Member
Short, concise tests. That's just code porn.
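The comment in `create_raw_car_file` says the Forest encoder writes its index as a zstd skippable frame. Per the Zstandard format (RFC 8878), a skippable frame consists of a 4-byte little-endian magic number in `0x184D2A50..=0x184D2A5F`, a 4-byte little-endian payload size, and the payload. Standard decoders skip it, so an index can sit next to the compressed CAR data without disturbing decompression. This hypothetical Rust sketch shows only that container layout. It does not reproduce how Forest encodes the index payload or where in the file it places the frame.

```rust
/// First of the 16 magic numbers (0x184D2A50..=0x184D2A5F) that the
/// Zstandard format reserves for skippable frames.
const SKIPPABLE_MAGIC_BASE: u32 = 0x184D_2A50;

/// Wrap `payload` in a skippable frame: little-endian magic number,
/// little-endian u32 payload size, then the payload bytes.
fn write_skippable_frame(magic_nibble: u8, payload: &[u8]) -> Result<Vec<u8>, String> {
    if magic_nibble > 0x0F {
        return Err("magic nibble must be in 0..=15".to_string());
    }
    let size = u32::try_from(payload.len())
        .map_err(|_| "payload exceeds u32::MAX bytes".to_string())?;
    let mut frame = Vec::with_capacity(8 + payload.len());
    frame.extend_from_slice(&(SKIPPABLE_MAGIC_BASE + u32::from(magic_nibble)).to_le_bytes());
    frame.extend_from_slice(&size.to_le_bytes());
    frame.extend_from_slice(payload);
    Ok(frame)
}

/// If `buf` begins with a complete skippable frame, return its payload and the
/// frame's total length, so a reader can jump past it; otherwise return `None`.
fn read_skippable_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    let magic = u32::from_le_bytes(buf.get(0..4)?.try_into().ok()?);
    if magic & 0xFFFF_FFF0 != SKIPPABLE_MAGIC_BASE {
        return None; // an ordinary zstd frame (or something else entirely)
    }
    let size = u32::from_le_bytes(buf.get(4..8)?.try_into().ok()?) as usize;
    let payload = buf.get(8..8 + size)?;
    Some((payload, 8 + size))
}
```

Masking off the low nibble accepts all 16 reserved magic values, so a reader does not need to know which one a writer chose.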

src/tool/subcommands/car_cmd.rs (outdated, resolved)
@LesnyRumcajs (Member) left a comment:

We could also cover the case where we ignore the block validity and the Forest index. All in all, a user could do this. (not sure why, though)
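The reviewer's suggestion concerns switching checks off. The following hypothetical, std-only Rust sketch shows the index invariant that `validate_invalid_index` exercises (every block CID must appear in the index) together with an opt-out flag. CIDs are modelled as plain strings. `ignore_forest_index` is an assumed name for what the second boolean argument to `validate` appears to control in the tests; the genesis tests, which use plain CAR files, pass `true` there.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of the index invariant: every CID found in the data
/// frames must also appear in the index. CIDs are modelled as plain strings.
/// `ignore_forest_index` is an assumed name for the opt-out flag.
fn check_index_coverage(
    block_cids: &[&str],
    indexed_cids: &HashSet<&str>,
    ignore_forest_index: bool,
) -> Result<(), String> {
    if ignore_forest_index {
        // The caller asked to skip this check, e.g. for plain CAR files without an index.
        return Ok(());
    }
    for cid in block_cids {
        if !indexed_cids.contains(cid) {
            return Err(format!("block {cid} is not referenced by the index"));
        }
    }
    Ok(())
}
```

With the flag set, the unindexed `cid-b` is tolerated. This is the kind of behaviour an extra "ignore everything" test case would pin down.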

@lemmih (Contributor, Author) commented Oct 16, 2023

> We could also cover the case where we ignore the block validity and the Forest index. All in all, a user could do this. (not sure why, though)

Done.

@lemmih lemmih enabled auto-merge October 16, 2023 15:26
@lemmih lemmih requested a review from LesnyRumcajs October 17, 2023 08:02
@lemmih lemmih added this pull request to the merge queue Oct 17, 2023
Merged via the queue into main with commit bedc816 Oct 17, 2023
@lemmih lemmih deleted the lemmih/forest-tool-car-check branch October 17, 2023 08:49

3 participants