Add Blake3 support for Verify Store #575
I hope you can review this. cc: @allada, @MarcusSorealheis, @aaronmondal
Reviewable status: 0 of 1 LGTMs obtained
a discussion (no related file):
Sadly I don't think we should do it this way. The original request knows what hash function to use and here we just assume it's the default.
I'd rather do it the right way.
I'm not sure what the best way to do this is. At first glance, adding the hash enum type to `DigestInfo` seems like the right way, but we frequently cast `DigestInfo` into the proto's `Digest`, which loses this information.
Another possible option is to change the store API so it passes the digest function along with the requests. This will require a lot of plumbing, though, and should be done in multiple PRs if this is the right way.
As for right now I'm not sure what the best option is.
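To make the concern above concrete, here is a minimal sketch of how converting to a proto-shaped digest drops the hash function. The `HashFn` enum and the `hash_fn` field are hypothetical (the comment only floats the idea of adding one), and both structs are simplified stand-ins, not the real NativeLink or generated proto types.

```rust
// Hypothetical, simplified stand-ins; the real types differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashFn {
    Sha256,
    Blake3,
}

/// Internal digest that (hypothetically) remembers which function produced it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DigestInfo {
    pub packed_hash: [u8; 32],
    pub size_bytes: i64,
    pub hash_fn: HashFn, // assumed extra field, not in the original struct
}

/// Wire-level shape: only a hex hash and a size, no hash function.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Digest {
    pub hash: String,
    pub size_bytes: i64,
}

impl From<&DigestInfo> for Digest {
    fn from(info: &DigestInfo) -> Self {
        // `info.hash_fn` has nowhere to go: the target has no field for it.
        let hash: String = info
            .packed_hash
            .iter()
            .map(|b| format!("{b:02x}"))
            .collect();
        Digest {
            hash,
            size_bytes: info.size_bytes,
        }
    }
}
```

Two `DigestInfo` values that differ only in `hash_fn` convert to identical `Digest` values, so anything downstream of the cast cannot tell which function to verify with.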
Reviewable status: 0 of 1 LGTMs obtained
a discussion (no related file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
Sadly I don't think we should do it this way. The original request knows what hash function to use and here we just assume it's the default.
I'd rather do it the right way.
I'm not sure what the best way to do this is. At first glance, adding the hash enum type to `DigestInfo` seems like the right way, but we frequently cast `DigestInfo` into the proto's `Digest`, which loses this information.
Another possible option is to change the store API so it passes the digest function along with the requests. This will require a lot of plumbing, though, and should be done in multiple PRs if this is the right way.
As for right now I'm not sure what the best option is.
I thought about this a bit more and did some rough tests to see how difficult a refactor in this area would be, and I believe that adding an option to this store's config for which hash function to use is the right way for now.
So if we change the config to support a hash function I think we can proceed.
Reviewable status: 0 of 1 LGTMs obtained
a discussion (no related file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
I thought about this a bit more and did some rough tests to see how difficult a refactor in this area would be, and I believe that adding an option to this store's config for which hash function to use is the right way for now.
So if we change the config to support a hash function I think we can proceed.
Thanks for your feedback, @allada. I'll rework it that way.
Force-pushed from 65cefeb to e669c0a.
Reviewed 4 of 4 files at r2.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Remote / large-ubuntu-22.04
nativelink-config/src/stores.rs
line 17 at r2 (raw file):
use serde::{Deserialize, Serialize};
use crate::cas_server::ConfigDigestHashFunction;
Let's move this into this crate. Importing it this way creates a circular reference, which is bad practice.
nativelink-config/src/stores.rs
line 338 at r2 (raw file):
/// This should be set to false for AC, but true for CAS stores.
#[serde(default)]
pub verify_hash: bool,
nit: Let's rename this to `hash_verification_function` and make it an `Option<ConfigDigestHashFunction>`. Also add a comment above saying that `None` means it won't verify the hash.
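A rough sketch of the shape being suggested, assuming a trimmed-down config struct. `VerifyStoreConfig`, the `blake3` variant name, and the `describe` helper are illustrative assumptions, and the serde derives of the real config are left out so the snippet compiles with the standard library alone.

```rust
// Hypothetical, std-only mirror of the config types under discussion.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigDigestHashFunction {
    sha256,
    blake3, // assumed variant name
}

#[derive(Debug, Clone, Default)]
pub struct VerifyStoreConfig {
    /// Hash function used to verify uploaded content.
    /// `None` means the hash is not verified.
    pub hash_verification_function: Option<ConfigDigestHashFunction>,
}

/// Illustrative helper: describes what a store built from `cfg` would check.
pub fn describe(cfg: &VerifyStoreConfig) -> String {
    match cfg.hash_verification_function {
        Some(func) => format!("verify hash with {func:?}"),
        None => "hash verification disabled".to_string(),
    }
}
```

One appeal of the `Option` over a separate `bool` plus a function field is that the "verification on but no function chosen" state cannot be expressed.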
nativelink-store/src/verify_store.rs
line 60 at r2 (raw file):
mut rx: DropCloserReadHalf,
size_info: UploadSizeInfo,
mut maybe_hasher: Option<([u8; 32], DigestHasher)>,
nit: Let's split this up. Let's pass
nativelink-store/src/verify_store.rs
line 83 at r2 (raw file):
}
if let Some((original_hash, hasher)) = maybe_hasher.as_mut() {
let hash_result: [u8; 32] = hasher.finalize_digest(i64::try_from(sum_size)?).packed_hash;
nit: We can't trust this size, but we also don't need it. So instead, let's pass -1 here and add a comment that this digest is only validating the hash.
I'll defer to Blaise on this one, as I think his comments point in the right direction and you have done some important work to get started. At a high level:
Tests, as Blaise said.
Config, as Blaise said. You already have a lot in place for that direction.
The only thing I would think about on the code is making the implementation in `verify_store.rs` less coupled to one hashing algo or another. It should be straightforward, but I'm happy to help (or ask someone else) if needed.
Nice work and thank you for the contribution.
nativelink-config/src/stores.rs
Outdated
/// Digest hash function to use for hashing contents in the verify store
///
/// Default: ConfigDigestHashFunction::sha256
Maybe specify what another option might be here in the comments, e.g. Blake3.
nativelink-config/src/cas_server.rs
Outdated
@@ -544,7 +544,7 @@ pub enum WorkerConfig {
 }

 #[allow(non_camel_case_types)]
-#[derive(Deserialize, Debug, Clone, Copy)]
+#[derive(Serialize, Deserialize, Debug, Clone, Copy)]
 pub enum ConfigDigestHashFunction {
Oh, just below the changed line, we have the enum we need to build upon to support sha256 or Blake3.
nativelink-store/src/verify_store.rs
Outdated
hasher = Some((
    digest.packed_hash,
    DigestHasher::from(DigestHasherFunc::from(
        self.digest_hash_function.unwrap_or(ConfigDigestHashFunction::sha256),
Why is this here? I'm confused.
nativelink-store/src/verify_store.rs
Outdated
if let Some((original_hash, hasher)) = maybe_hasher {
    let hash_result: [u8; 32] = hasher.finalize().into();
    if original_hash != hash_result {
if let Some((original_hash, hasher)) = maybe_hasher.as_mut() {
I think the `if let` conditional is fine, but computing the hash should live elsewhere. That way, you can still have the comparison on current line 84 (`if *original_hash != hash_result`) here, while the choice of hash type is made outside of the block. It should work for either hashing algorithm.
Right now, `hash_result` depends on Blake3, and that would be bad.
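One way to read this suggestion, sketched under loud assumptions: the hasher type owns the choice of algorithm, and the verification code only ever sees a `[u8; 32]`. The two "algorithms" below are toy byte mixers, not SHA-256 or BLAKE3, chosen so the example needs no external crates; `HashFunc`, this `DigestHasher`, and `verify` are hypothetical names for illustration.

```rust
/// Toy algorithms standing in for real ones (NOT cryptographic).
#[derive(Debug, Clone, Copy)]
pub enum HashFunc {
    ToyXor,
    ToyRotAdd,
}

/// The hasher hides which algorithm is in use behind one interface.
pub struct DigestHasher {
    func: HashFunc,
    state: [u8; 32],
    len: usize,
}

impl DigestHasher {
    pub fn new(func: HashFunc) -> Self {
        Self { func, state: [0; 32], len: 0 }
    }

    pub fn update(&mut self, data: &[u8]) {
        for &byte in data {
            let slot = &mut self.state[self.len % 32];
            *slot = match self.func {
                HashFunc::ToyXor => *slot ^ byte,
                HashFunc::ToyRotAdd => slot.wrapping_add(byte).rotate_left(1),
            };
            self.len += 1;
        }
    }

    pub fn finalize(&self) -> [u8; 32] {
        self.state
    }
}

/// Verification never names an algorithm; it only compares 32-byte results.
pub fn verify(
    expected: &[u8; 32],
    mut maybe_hasher: Option<DigestHasher>,
    chunks: &[&[u8]],
) -> Result<(), String> {
    for chunk in chunks {
        if let Some(hasher) = maybe_hasher.as_mut() {
            hasher.update(chunk);
        }
    }
    if let Some(hasher) = maybe_hasher {
        let hash_result = hasher.finalize();
        if *expected != hash_result {
            return Err("hash mismatch".to_string());
        }
    }
    Ok(())
}
```

Passing `None` skips hashing entirely, and swapping `HashFunc::ToyXor` for `HashFunc::ToyRotAdd` requires no change inside `verify`.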
Force-pushed from 121af6d to c6cd616.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable
a discussion (no related file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
Also needs tests.
Done.
nativelink-config/src/cas_server.rs
line 548 at r2 (raw file):
Previously, MarcusSorealheis (Marcus Eagan) wrote…
Oh, just below the changed line, we have the enum we need to build upon to support sha256 or Blake3.
Done.
nativelink-config/src/stores.rs
line 17 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
Let's move this into this crate. Importing it this way creates a circular reference, which is bad practice.
Done.
nativelink-config/src/stores.rs
line 342 at r2 (raw file):
Previously, MarcusSorealheis (Marcus Eagan) wrote…
Maybe specify what another option might be here in the comments, e.g. Blake3.
Done.
nativelink-store/src/verify_store.rs
line 82 at r2 (raw file):
Previously, MarcusSorealheis (Marcus Eagan) wrote…
I think the `if let` conditional is fine, but computing the hash should live elsewhere. That way, you can still have the comparison on current line 84 (`if *original_hash != hash_result`) here, while the choice of hash type is made outside of the block. It should work for either hashing algorithm.
Right now, `hash_result` depends on Blake3, and that would be bad.
Done.
nativelink-store/src/verify_store.rs
line 146 at r2 (raw file):
Previously, MarcusSorealheis (Marcus Eagan) wrote…
Why is this here? I'm confused.
Done.
Reviewed 8 of 11 files at r3, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks
nativelink-config/src/stores.rs
line 342 at r2 (raw file):
Previously, steed924 (Steed) wrote…
Done.
I don't think we should encourage `blake3`. Sha256 is the default for most implementations. Using blake3 requires the client to be set up for blake3 (which, to my knowledge, none do by default; they all use sha256 by default).
I'd rather just say "Like sha256" and not even mention blake3, since it requires a lot to go right.
nativelink-store/src/verify_store.rs
line 141 at r3 (raw file):
} let mut hasher = None;
nit: use (you can inline this too after it's done):
let hasher = self.hash_verification_function.map(|v| DigestHasher::from(DigestHasherFunc::from(v)));
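For readers less used to `Option::map`, a small sketch of the before and after. `ConfigHashFn` and `Hasher` are hypothetical stand-ins for the config enum and `DigestHasher`; only the control flow is the point.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigHashFn {
    Sha256,
    Blake3,
}

/// Stand-in for a real hasher; it only records which algorithm was chosen.
#[derive(Debug, PartialEq, Eq)]
pub struct Hasher {
    pub algo: &'static str,
}

impl From<ConfigHashFn> for Hasher {
    fn from(func: ConfigHashFn) -> Self {
        match func {
            ConfigHashFn::Sha256 => Hasher { algo: "sha256" },
            ConfigHashFn::Blake3 => Hasher { algo: "blake3" },
        }
    }
}

/// Before: a mutable binding that is filled in conditionally.
pub fn build_verbose(cfg: Option<ConfigHashFn>) -> Option<Hasher> {
    let mut hasher = None;
    if let Some(func) = cfg {
        hasher = Some(Hasher::from(func));
    }
    hasher
}

/// After: the same result as a single expression.
pub fn build_with_map(cfg: Option<ConfigHashFn>) -> Option<Hasher> {
    cfg.map(Hasher::from)
}
```

Both functions return `None` when no function is configured, so the `map` form removes the `mut` without changing behavior.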
nativelink-store/src/verify_store.rs
line 186 at r3 (raw file):
"If the verification store is verifying the size of the data", ); if let Some(hash_verification_function) = self.hash_verification_function {
nit: Just use:
c.publish("hash_verification_function", format!("{:?}", self.hash_verification_function));
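The reason the `if let` can go away here is that `Option<T>` already implements `Debug` whenever `T` does, so one `format!` call covers both the set and unset cases. A minimal sketch, with a local copy of the enum (the `blake3` variant name is an assumption) and a hypothetical `render` helper in place of the metrics call:

```rust
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
pub enum ConfigDigestHashFunction {
    sha256,
    blake3, // assumed variant name
}

/// Renders the optional setting as a string, with no branching on `Some`/`None`.
pub fn render(setting: Option<ConfigDigestHashFunction>) -> String {
    format!("{:?}", setting)
}
```

With this, `render(Some(ConfigDigestHashFunction::sha256))` yields `Some(sha256)` and `render(None)` yields `None`.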
Force-pushed from c6cd616 to 9f28d2f.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable
nativelink-config/src/stores.rs
line 342 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
I don't think we should encourage `blake3`. Sha256 is the default for most implementations. Using blake3 requires the client to be set up for blake3 (which, to my knowledge, none do by default; they all use sha256 by default).
I'd rather just say "Like sha256" and not even mention blake3, since it requires a lot to go right.
Done.
Reviewed 2 of 11 files at r3, 2 of 2 files at r4, all commit messages.
Dismissed @MarcusSorealheis from 4 discussions.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Remote / large-ubuntu-22.04, pre-commit-checks
Reviewed all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: pre-commit-checks
Reviewed all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: pre-commit-checks
Reviewable status: 3 of 1 LGTMs obtained, and pending CI: pre-commit-checks
@steed924, can you rebase so we can get this landed?
Reviewable status: 3 of 1 LGTMs obtained, and pending CI: pre-commit-checks
Force-pushed from 9f28d2f to 13cc9cc.
Just done.
FYI @steed924: the failing CI job is the pre-commit hooks. I think it's trailing whitespace in the README.
Reviewed all commit messages.
Reviewable status: 4 of 1 LGTMs obtained, and pending CI: pre-commit-checks
Force-pushed from 13cc9cc to 610adbb.
Thanks, @aaronmondal. Just fixed that issue. ;)
Description
I've added Blake3 hash support to the Verify Store.
I used the `digest_hasher` utility module to keep things consistent. For simplicity, I used the default hash function, which is set from the global config.
I've run integration tests on my machine to verify that the blake3 function actually works.
But since the global hash function is a `OnceLock` and can't be set twice, I couldn't add specific unit tests for blake3 and sha256 separately. Maybe it would be better if we added a hash function config variable to the Verify Store config?
Or is it just adding extra complexity?
I'd like to make this clear.
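The testing limitation described above comes from how `std::sync::OnceLock` behaves: the first `set` wins and every later `set` is rejected. A minimal sketch, where `DEFAULT_HASH_FN`, `HashFn`, and the two helpers are hypothetical names standing in for the project's global setting:

```rust
use std::sync::OnceLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashFn {
    Sha256,
    Blake3,
}

// Hypothetical process-wide default, standing in for the global config value.
static DEFAULT_HASH_FN: OnceLock<HashFn> = OnceLock::new();

/// Returns `Err` with the rejected value if a default was already set.
pub fn set_default(func: HashFn) -> Result<(), HashFn> {
    DEFAULT_HASH_FN.set(func)
}

/// Returns the default, or `None` if nothing has been set yet.
pub fn get_default() -> Option<HashFn> {
    DEFAULT_HASH_FN.get().copied()
}
```

Because tests in one binary share the process, a test that sets `Sha256` first makes a later `set_default(HashFn::Blake3)` return `Err`, which is why a per-store config value is easier to unit test than a global.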
Fixes #444
Type of change
Checklist
`bazel test //...` passes locally
`git amend` see some docs