JLAP support (#197)
* adding initial module for JLAP support; still a WIP

* making a few updates to the JLAP module

* updating doc string

* initial commit to add a blake2b hash to the state file

* removing accidentally committed jlap stuff

* removing unnecessary dependency

* updating imports

* fixing formatting issues

* updating this branch to instead switch the blake2 hash implementation to blake2b

* fixing formatting issues

* attempting to fix an issue with the windows test runner

* saving progress so far

* range request works better now

* adding the variant checks for JLAP

* adding a new JLAPManager object to hold a lot of the data and logic needed for fetching, updating and patching; still incomplete

* making a few updates to JLAPManager struct; adding blake2_hash field

* caching and updating the jlap file is more-or-less working

* I think this actually works! Still need to write tests though 🙄; coming soon...

* adding the start of some testing

* finished adding first test ✅

* Added more tests and example

The tests work when just running the jlap module but fail when running
the entire test suite.

* Fixing test related stuff

Because I used the `tokio::fs` module in my JLAP code I was seeing
unpredictable behavior when mixing in the `std::fs` module with my tests.
Switching the tests to use `tokio::fs` instead appears to have resolved
this issue.

* Adding hash verification; test refactor

Here, I'm adding a hash verification as the last step in a successful
jlap patch operation. The `patch_repo_data` function now returns this
updated hash. I'm guessing it will be useful for updating the
`*.state.json` file.
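A minimal, std-only sketch of that final verification step, assuming the expected digest arrives as a hex string (as the `latest` field of the JLAP footer does) and that the caller has already hashed the patched `repodata.json`. `decode_hex` and `verify_patched` are hypothetical helpers, not rattler functions; the real code hashes with BLAKE2b through `rattler_digest` and, per a later commit, decodes hex via serde.

```rust
/// Decode a hex string into bytes (hypothetical std-only helper).
/// Rejects odd lengths and any character that is not a hex digit.
pub fn decode_hex(s: &str) -> Result<Vec<u8>, String> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("invalid hex string: {s:?}"));
    }
    // Every byte is an ASCII hex digit, so slicing two bytes at a time is safe.
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Last step of a (hypothetical) patch operation: compare the digest of the
/// patched repodata with the expected hex digest and, on success, return the
/// digest so the caller can persist it.
pub fn verify_patched(computed: &[u8], expected_hex: &str) -> Result<Vec<u8>, String> {
    let expected = decode_hex(expected_hex)?;
    if expected.as_slice() == computed {
        Ok(computed.to_vec())
    } else {
        Err(format!("hash mismatch after patching (expected {expected_hex})"))
    }
}
```

Returning the digest on success mirrors the note above that `patch_repo_data` hands back the updated hash for use in `*.state.json`.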

This commit also refactors the tests a little bit to make it easier to
reason about (trying to hide some of the setup code in their own
functions).

* couple of small changes

* Updates to return the hash object instead of string

This commit also makes changes to the rattler_digest code. I think it's
better to store the hash type there rather than in the
rattler_repodata::fetch::cache module that is private.

I also updated the way that the hashes are generated to take advantage
of the rattler_digest library too.

* Re-working the way JLAP works

This commit is a large refactor based on a conversation I had with a
colleague. Instead of caching the JLAP file itself, we now just store
information about the request in the .state.json file. This helps
simplify the code quite a bit.
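To illustrate what storing "information about the request" can look like, here is a std-only sketch that keeps only an IV and a byte offset and turns the offset into an HTTP `Range` header for the next incremental fetch. The type `JlapStateSketch` and the function `next_range_header` are hypothetical, and treating "no stored state" as "fetch the whole file" is an assumption of this sketch rather than something taken from rattler.

```rust
/// Hypothetical, std-only mirror of the JLAP state kept in `.state.json`.
#[derive(Debug, Clone, PartialEq)]
pub struct JlapStateSketch {
    /// Running checksum after the last line we processed.
    pub iv: Vec<u8>,
    /// Byte offset in the remote `.jlap` file to resume from.
    pub position: u64,
}

/// Build the value of an HTTP `Range` header asking only for the bytes at or
/// after the stored position. Returns `None` when there is no stored state,
/// meaning the whole file should be requested (assumption of this sketch).
pub fn next_range_header(state: Option<&JlapStateSketch>) -> Option<String> {
    state.map(|s| format!("bytes={}-", s.position))
}
```

Only the IV and offset need to survive between runs; the JLAP bytes themselves are never cached.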

* clippy issues; updating docs

* more updates based on review comments

* Refactoring to make code easier to read, plus more!

This commit does some refactoring to hopefully increase code readability.
It introduces a new `JLAP` struct which holds information related to the
JLAP response.

* Adds working checksum validation 🙌

This commit finishes the validate_checksum method. It also updates the
error messages to be a little less redundant (removing the `Error`
suffix).

* changes based on comments from review

* Updates the serializers for hash values

This commit updates the serializers used for hash values; they now rely
on the `serde_as` macro. I had to change some values in the
`Cargo.toml` file to get this to work.

* more tweaks and fixes

* addressing more comments from review

* more suggestions from review

* updating comment

* moving hex decoding to serde

* we always need to return the latest iv value we get by running validate_checksum
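Assuming the JLAP chaining scheme (each line hashed with keyed BLAKE2b-256, using the previous checksum as the key and starting from the IV), the value left after the last line is exactly the IV to persist for the next request. The sketch below shows only that folding structure. It swaps keyed BLAKE2b for std's `DefaultHasher` (64-bit, non-cryptographic) so it runs without extra crates; `keyed_hash_stand_in`, `running_checksum` and `validate_chain_sketch` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for keyed BLAKE2b-256: hashes `key` then `line` with std's
/// DefaultHasher. NOT cryptographic and NOT what JLAP uses; it only shows
/// that each step depends on the previous result.
fn keyed_hash_stand_in(key: u64, line: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    line.hash(&mut hasher);
    hasher.finish()
}

/// Fold every line into a running checksum, starting from the IV.
fn running_checksum(iv: u64, lines: &[&str]) -> u64 {
    lines.iter().fold(iv, |acc, line| keyed_hash_stand_in(acc, line))
}

/// Compare the final running checksum with the expected one and, on
/// success, return it as the new IV to store.
fn validate_chain_sketch(iv: u64, lines: &[&str], expected: u64) -> Result<u64, String> {
    let latest = running_checksum(iv, lines);
    if latest == expected {
        Ok(latest)
    } else {
        Err(format!("checksum mismatch: expected {expected:x}, got {latest:x}"))
    }
}
```

Discarding the returned value and keeping the old IV would make the next incremental validation start from the wrong key, which is why the latest value always has to be returned.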

* use Blake2b in `validate_cached_state`

* updating fetch_repo_data so it returns earlier when it successfully
fetches JLAP data
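The early-return shape described here can be sketched without any networking: try the incremental path, return immediately on success, and only fall through to the full download otherwise. `Source` and `fetch_with_fallback` are hypothetical, and the closures stand in for the real (async, HTTP-backed) fetch logic.

```rust
/// Where the repodata came from (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
pub enum Source {
    Jlap,
    FullDownload,
}

/// Try the JLAP path first and return as soon as it succeeds; otherwise
/// fall back to downloading the full `repodata.json`.
pub fn fetch_with_fallback<J, F>(try_jlap: J, full_download: F) -> Result<Source, String>
where
    J: FnOnce() -> Result<(), String>,
    F: FnOnce() -> Result<(), String>,
{
    if try_jlap().is_ok() {
        // Early return: the cached repodata was patched in place.
        return Ok(Source::Jlap);
    }
    // The JLAP error is deliberately ignored here; the full download decides the outcome.
    full_download()?;
    Ok(Source::FullDownload)
}
```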

* Getting closer to a working JLAP

What still should be done:

- We should save the headers of the JLAP request to respect the cache
  timeouts
- This doesn't play well with the progress bar yet

* updating docs and adding a new error for when the parsing of the checksum fails

* Lots of refactor from manual testing

This commit includes a lot of refactoring I did based on manual testing
with real repodata (still needs to be included as unit tests 😬). The
case I was not handling well was empty JLAP responses, which happen
when there is no data to update. I was also saving the wrong values for
the new initialization vector and had to address this too.
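A std-only sketch of why an empty response need not be an error, assuming a complete `.jlap` body laid out as an IV line, zero or more patch lines, a footer line, and a trailing checksum line (consistent with the `JLAPState` and `JLAPFooter` doc comments in the diff). `JlapParts` and `split_jlap` are hypothetical; when nothing has changed, the patch slice is simply empty.

```rust
/// Hypothetical, std-only view of a complete `.jlap` body.
#[derive(Debug, PartialEq)]
pub struct JlapParts<'a> {
    pub iv_hex: &'a str,
    pub patches: Vec<&'a str>,
    pub footer: &'a str,
    pub checksum_hex: &'a str,
}

/// Split a JLAP body into its parts. Needs at least the IV, the footer and
/// the trailing checksum; any number of patch lines (including zero) may sit
/// between the IV and the footer.
pub fn split_jlap(body: &str) -> Result<JlapParts<'_>, String> {
    let lines: Vec<&str> = body.lines().collect();
    let n = lines.len();
    if n < 3 {
        return Err(format!("expected at least 3 lines, got {n}"));
    }
    Ok(JlapParts {
        iv_hex: lines[0],
        // Empty when the server has no new patches for us.
        patches: lines[1..n - 2].to_vec(),
        footer: lines[n - 2],
        checksum_hex: lines[n - 1],
    })
}
```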

* fixing documentation error

* adding another test case to handle jlap responses with no new patches

* adding another test case to make sure that the range not satisfiable error handling logic works as expected
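One way to structure that error handling is to branch on the HTTP status of the range request. The status meanings are standard HTTP (206 Partial Content, 200 when a server ignores `Range`, 416 Range Not Satisfiable when the offset lies past the end of the resource); the action chosen for each, and the names `JlapAction` and `classify_jlap_status`, are hypothetical and not taken from rattler.

```rust
/// What to do with a JLAP response (hypothetical policy for illustration).
#[derive(Debug, PartialEq)]
pub enum JlapAction {
    /// 206: only the new bytes after our stored position were returned.
    ApplyPatches,
    /// 200: the server ignored `Range` and sent the whole file.
    ReadWholeFile,
    /// 416: the stored position is past the end of the remote file
    /// (e.g. it was rewritten), so drop the JLAP state and start over.
    ResetState,
    /// Anything else: give up on JLAP and download the full repodata.
    Fallback,
}

/// Map an HTTP status code onto the action above.
pub fn classify_jlap_status(status: u16) -> JlapAction {
    match status {
        206 => JlapAction::ApplyPatches,
        200 => JlapAction::ReadWholeFile,
        416 => JlapAction::ResetState,
        _ => JlapAction::Fallback,
    }
}
```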

* refactoring tests to make them easier to read

* Refactor test cases and adds ordering of repo data

This commit contains two things:

1. I added a way for the serializer to order the repo data. This is
   necessary to make the blake2 hashes line up correctly. It does incur
   a performance penalty.
2. I refactored the tests to make use of `rstest` and test cases. I did
   this because there was a lot of redundant code in the test module

---------

Co-authored-by: Wolf Vollprecht <w.vollprecht@gmail.com>
travishathaway and wolfv authored Jun 12, 2023
1 parent b5f117b commit ce910d4
Showing 9 changed files with 1,180 additions and 26 deletions.
1 change: 1 addition & 0 deletions crates/rattler_digest/Cargo.toml
@@ -17,6 +17,7 @@ hex = "0.4.3"
serde = { version = "1.0.163", features = ["derive"], optional = true }
sha2 = "0.10.6"
md-5 = "0.10.5"
blake2 = "0.10.6"
serde_with = "3.0.0"

[features]
8 changes: 8 additions & 0 deletions crates/rattler_digest/src/lib.rs
@@ -46,6 +46,8 @@ pub mod serde;

pub use digest;

use blake2::digest::consts::U32;
use blake2::{Blake2b, Blake2bMac};
use digest::{Digest, Output};
use std::io::Read;
use std::{fs::File, io::Write, path::Path};
@@ -59,6 +61,12 @@ pub type Sha256Hash = sha2::digest::Output<Sha256>;
/// A type alias for the output of an MD5 hash.
pub type Md5Hash = md5::digest::Output<Md5>;

/// A type alias for a blake2b hasher with a 256-bit (32-byte) output.
pub type Blake2b256 = Blake2b<U32>;

/// A type alias for a keyed blake2b (MAC) with a 256-bit (32-byte) output.
pub type Blake2bMac256 = Blake2bMac<U32>;

/// Compute a hash of the file at the specified location.
pub fn compute_file_digest<D: Digest + Default + Write>(
path: impl AsRef<Path>,
8 changes: 5 additions & 3 deletions crates/rattler_repodata_gateway/Cargo.toml
@@ -30,14 +30,16 @@ serde = { version = "1.0.163", features = ["derive"] }
serde_json = { version = "1.0.96" }
pin-project-lite = "0.2.9"
md-5 = "0.10.5"
rattler_digest = { version = "0.2.0", path = "../rattler_digest", features = ["tokio"] }
rattler_digest = { version = "0.2.0", path = "../rattler_digest", features = ["tokio", "serde"] }
rattler_conda_types = { version = "0.2.0", path = "../rattler_conda_types", optional = true }
fxhash = { version = "0.2.1", optional = true }
memmap2 = { version = "0.6.2", optional = true }
ouroboros = { version = "0.15.6", optional = true }
serde_with = { version = "3.0.0", optional = true }
serde_with = "3.0.0"
superslice = { version = "1.0.0", optional = true }
itertools = { version = "0.10.5", optional = true }
json-patch = "1.0.0"
hex = { version = "0.4.3", features = ["serde"] }

[target.'cfg(unix)'.dependencies]
libc = "0.2"
@@ -59,4 +61,4 @@ rstest = "0.17.0"
default = ['native-tls']
native-tls = ['reqwest/native-tls']
rustls-tls = ['reqwest/rustls-tls']
sparse = ["rattler_conda_types", "memmap2", "ouroboros", "serde_with", "superslice", "itertools", "serde_json/raw_value"]
sparse = ["rattler_conda_types", "memmap2", "ouroboros", "superslice", "itertools", "serde_json/raw_value"]
39 changes: 34 additions & 5 deletions crates/rattler_repodata_gateway/src/fetch/cache/mod.rs
@@ -1,15 +1,12 @@
mod cache_headers;

use blake2::digest::consts::U32;
use blake2::Blake2b;
pub use cache_headers::CacheHeaders;
use rattler_digest::{serde::SerializableHash, Blake2b256};
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use serde_with::serde_as;
use std::{fs::File, io::Read, path::Path, str::FromStr, time::SystemTime};
use url::Url;

/// Custom blake2b type
pub type Blake2b256 = Blake2b<U32>;

/// Representation of the `.state.json` file alongside a `repodata.json` file.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RepoDataState {
@@ -51,6 +48,9 @@ pub struct RepoDataState {

/// Whether or not JLAP is available for the subdirectory
pub has_jlap: Option<Expiring<bool>>,

/// State information related to JLAP
pub jlap: Option<JLAPState>,
}

impl RepoDataState {
@@ -80,6 +80,35 @@ impl FromStr for RepoDataState {
}
}

/// Used inside of the `RepoDataState` to store information related to our JLAP state
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JLAPState {
/// Initialization Vector (IV) for the JLAP file; this is found on the first line of the
/// JLAP file.
#[serde(rename = "iv", with = "hex")]
pub initialization_vector: Vec<u8>,

/// Current position to use for the bytes offset in the range request for JLAP
#[serde(rename = "pos")]
pub position: u64,

/// Footer contains metadata about the JLAP file such as which url it is for
pub footer: JLAPFooter,
}

/// Represents the metadata for a JLAP file, which is typically found at the very end
#[serde_as]
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct JLAPFooter {
/// This is not actually a full URL, just the last part of it (i.e. the filename
/// `repodata.json`). That's why we store it as a [`String`]
pub url: String,

/// blake2b hash of the latest `repodata.json` file
#[serde_as(as = "SerializableHash::<rattler_digest::Blake2b256>")]
pub latest: blake2::digest::Output<Blake2b256>,
}

/// Represents a value and when the value was last checked.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Expiring<T> {
@@ -13,4 +13,5 @@ has_zst:
last_checked: "2023-02-13T14:08:50Z"
has_bz2: ~
has_jlap: ~
jlap: ~

@@ -16,4 +16,5 @@ has_bz2:
value: true
last_checked: "2023-05-18T13:59:07.112638Z"
has_jlap: ~
jlap: ~
