-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fallback to fully reading the package stream when downloading before attempting decompression #797
Conversation
.to_string() | ||
.contains(DATA_DESCRIPTOR_ERROR_MESSAGE)) => | ||
{ | ||
tracing::debug!("Failed to stream decompress conda package from '{}' due to the presence of zip data descriptors. Falling back to non streaming decompression;", url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit split between using debug
or warn
here. I'm tempted to use WARN since this is a non-advisable state to be in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the approach is valid using a fallback method.
Im wondering though if we can detect this without having to resort to catching and interpreting string errors? Can we read the header ourself beforehand?
Reading to memory might be risking OOM issues, perhaps use a SpooledTempFile.
We definitely need an example of such a ZIP file.
The zipfile contains multiple headers, one per file inside, so I think to be able to read the headers we'd need to basically reimplement the zip crate ourselves, which I don't think is viable. Will follow the file suggestion and work on creating a repro! |
@baszalmstra managed to repro the issue. You can just download a conda package, decompress it and then recompress it with I've updated the code with a test resource generated exactly this way! Would appreciate a new review! |
hey @baszalmstra would appreciate a new review when possible for you :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good ! Thanks for contributing this
I've left some small comments
@@ -266,7 +267,7 @@ fn test_extract_flaky_conda(#[values(0, 1, 13, 50, 74, 150, 8096, 16384, 20000)] | |||
let temp_dir = Path::new(env!("CARGO_TARGET_TMPDIR")); | |||
println!("Target dir: {}", temp_dir.display()); | |||
let target_dir = temp_dir.join(package_path.file_stem().unwrap()); | |||
let result = extract_conda( | |||
let result = extract_conda_via_streaming( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same here ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same response as above ^
I checked the code a little, could we potentially rely on this and do this verification ourselves and correctly choose the the way we should decompress this? ( via streaming or buffering ) cc @baszalmstra |
hey @nichmor thank you for the review! I've addressed them or commented back, would appreciate a new review whenever you have the time :) |
@nichmor regarding checking the flags. I'm not 100% familiar with all the details, but a zipfile will contain multiple local headers that may contain the data descriptors. I'm worries that maybe the first 2 headers do not use data descriptors but the third one uses it and we accidentally let the package go the streaming route because the first header we check did not have descriptors. I've actually seen some packages that are exactly like this, only some of the headers using data descriptors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small requests but otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! ❤️
## 🤖 New release * `rattler`: 0.27.2 -> 0.27.3 (✓ API compatible changes) * `rattler_cache`: 0.1.4 -> 0.1.5 (✓ API compatible changes) * `rattler_conda_types`: 0.26.3 -> 0.27.0 (⚠️ API breaking changes) * `rattler_package_streaming`: 0.21.7 -> 0.22.0 (⚠️ API breaking changes) * `rattler_networking`: 0.20.10 -> 0.21.0 (⚠️ API breaking changes) * `rattler_lock`: 0.22.16 -> 0.22.17 (✓ API compatible changes) * `rattler_repodata_gateway`: 0.21.3 -> 0.21.4 (✓ API compatible changes) * `rattler_solve`: 1.0.0 -> 1.0.1 (✓ API compatible changes) * `rattler_index`: 0.19.21 -> 0.19.22 (✓ API compatible changes) * `rattler_shell`: 0.21.3 -> 0.21.4 * `rattler_virtual_packages`: 1.0.0 -> 1.0.1 ###⚠️ `rattler_conda_types` breaking changes ``` --- failure enum_variant_added: enum variant added on exhaustive enum --- Description: A publicly-visible enum without #[non_exhaustive] has a new variant. ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/enum_variant_added.ron Failed in: variant ParseChannelError:InvalidName in C:\Users\bas\AppData\Local\Temp\.tmpZUBw8C\rattler\crates\rattler_conda_types\src\channel\mod.rs:391 --- failure enum_variant_missing: pub enum variant removed or renamed --- Description: A publicly-visible enum has at least one variant that is no longer available under its prior name. It may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/enum_variant_missing.ron Failed in: variant ParseMatchSpecError::InvalidNumberOfColons, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_conda_types\src\match_spec\parse.rs:59 ``` ###⚠️ `rattler_package_streaming` breaking changes ``` --- failure function_missing: pub fn removed or renamed --- Description: A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/function_missing.ron Failed in: function rattler_package_streaming::read::extract_conda, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_package_streaming\src\read.rs:47 ``` ###⚠️ `rattler_networking` breaking changes ``` --- failure function_missing: pub fn removed or renamed --- Description: A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/function_missing.ron Failed in: function rattler_networking::redact_known_secrets_from_url, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_networking\src\redaction.rs:24 --- failure pub_module_level_const_missing: pub module-level const is missing --- Description: A public const is missing, renamed, or changed from const to static. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/pub_module_level_const_missing.ron Failed in: DEFAULT_REDACTION_STR in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_networking\src\redaction.rs:5 --- failure trait_missing: pub trait removed or renamed --- Description: A publicly-visible trait cannot be imported by its prior path. A `pub use` may have been removed, or the trait itself may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/trait_missing.ron Failed in: trait rattler_networking::Redact, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_networking\src\redaction.rs:44 ``` <details><summary><i><b>Changelog</b></i></summary><p> ## `rattler` <blockquote> ## [0.27.3](baszalmstra/rattler@rattler-v0.27.2...rattler-v0.27.3) - 2024-08-02 ### Other - mark some crates 1.0 ([#789](https://github.com/baszalmstra/rattler/pull/789)) </blockquote> ## `rattler_cache` <blockquote> ## [0.1.5](baszalmstra/rattler@rattler_cache-v0.1.4...rattler_cache-v0.1.5) - 2024-08-02 ### Other - mark some crates 1.0 ([#789](https://github.com/baszalmstra/rattler/pull/789)) </blockquote> ## `rattler_conda_types` <blockquote> ## [0.27.0](baszalmstra/rattler@rattler_conda_types-v0.26.3...rattler_conda_types-v0.27.0) - 2024-08-02 ### Fixed - redact secrets in the `canonical_name` functions ([#801](https://github.com/baszalmstra/rattler/pull/801)) - make `base_url` of `Channel` always contain a trailing slash ([#800](https://github.com/baszalmstra/rattler/pull/800)) - parse channel in matchspec string ([#792](https://github.com/baszalmstra/rattler/pull/792)) - constraints on virtual packages were ignored ([#795](https://github.com/baszalmstra/rattler/pull/795)) - url parsing for namelessmatchspec and cleanup functions ([#790](https://github.com/baszalmstra/rattler/pull/790)) ### Other - mark some crates 1.0 ([#789](https://github.com/baszalmstra/rattler/pull/789)) </blockquote> ## `rattler_package_streaming` <blockquote> ## [0.22.0](baszalmstra/rattler@rattler_package_streaming-v0.21.7...rattler_package_streaming-v0.22.0) - 2024-08-02 ### Fixed - redact secrets in the `canonical_name` functions ([#801](https://github.com/baszalmstra/rattler/pull/801)) - Fallback to fully reading the package stream when downloading before attempting decompression ([#797](https://github.com/baszalmstra/rattler/pull/797)) ### Other - mark some crates 1.0 ([#789](https://github.com/baszalmstra/rattler/pull/789)) </blockquote> ## `rattler_networking` <blockquote> ## [0.21.0](baszalmstra/rattler@rattler_networking-v0.20.10...rattler_networking-v0.21.0) - 2024-08-02 ### Fixed - redact secrets in the `canonical_name` functions ([#801](https://github.com/baszalmstra/rattler/pull/801)) </blockquote> ## `rattler_lock` <blockquote> ## [0.22.17](baszalmstra/rattler@rattler_lock-v0.22.16...rattler_lock-v0.22.17) - 2024-08-02 ### Other - mark some crates 1.0 ([#789](https://github.com/baszalmstra/rattler/pull/789)) </blockquote> ## `rattler_repodata_gateway` <blockquote> ## [0.21.4](baszalmstra/rattler@rattler_repodata_gateway-v0.21.3...rattler_repodata_gateway-v0.21.4) - 2024-08-02 ### Fixed - redact secrets in the `canonical_name` functions ([#801](https://github.com/baszalmstra/rattler/pull/801)) ### Other - mark some crates 1.0 ([#789](https://github.com/baszalmstra/rattler/pull/789)) </blockquote> ## `rattler_solve` <blockquote> ## [1.0.1](baszalmstra/rattler@rattler_solve-v1.0.0...rattler_solve-v1.0.1) - 2024-08-02 ### Fixed - constraints on virtual packages were ignored ([#795](https://github.com/baszalmstra/rattler/pull/795)) </blockquote> ## `rattler_index` <blockquote> ## [0.19.22](baszalmstra/rattler@rattler_index-v0.19.21...rattler_index-v0.19.22) - 2024-08-02 ### Other - mark some crates 1.0 ([#789](https://github.com/baszalmstra/rattler/pull/789)) </blockquote> ## `rattler_shell` <blockquote> ## [0.21.4](baszalmstra/rattler@rattler_shell-v0.21.3...rattler_shell-v0.21.4) - 2024-08-02 ### Other - updated the following local packages: rattler_conda_types </blockquote> ## `rattler_virtual_packages` <blockquote> ## [1.0.1](baszalmstra/rattler@rattler_virtual_packages-v1.0.0...rattler_virtual_packages-v1.0.1) - 2024-08-02 ### Other - updated the following local packages: rattler_conda_types </blockquote> </p></details> --- This PR was generated with [release-plz](https://github.com/MarcoIeni/release-plz/). Signed-off-by: Bas Zalmstra <bas@prefix.dev>
…938) ### Description In #837 locking logic was added to the download reporter to prevent multiple downloads. This was clashing with the logic added in #797 where, if we detect we're trying to download a zip that has data descriptors, we fallback to doing a full download and interrupt the streaming decompression download. The issue comes from us not releasing the lock before attempting the second download, hitting this line https://github.com/conda/rattler/blob/8bb883536df52de46fd2c040228bb77d39c92607/crates/rattler_cache/src/package_cache/mod.rs#L439 As a fix, we report the previous download as completed before attempting the second one! Have verified in my custom setup that this works.
Fixes #794.
Now, we detect the error message thrown when attempting to decompress the HTTP stream. When this is the case, we do a second request and read all the data to an in-memory buffer, before attempting decompression.
The chosen approach was the third one discussed in this comment #794 (comment) as it is the one that works out of the box + does not have a perf implication on the normal streaming downloads.
Haven't implemented tests yet, as I'd first like to discuss the approach in itself. But I've confirmed this fixes the issue I was facing!