Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fallback to fully reading the package stream when downloading before attempting decompression #797

Merged
merged 19 commits into from
Aug 1, 2024
Merged

fix: Fallback to fully reading the package stream when downloading before attempting decompression #797

merged 19 commits into from
Aug 1, 2024

Conversation

jpcorreia99
Copy link
Contributor

@jpcorreia99 jpcorreia99 commented Jul 26, 2024

Fixes #794.

Now, we detect the error message thrown when attempting to decompress the HTTP stream. When this is the case, we do a second request and read all the data to an in-memory buffer, before attempting decompression.

The chosen approach was the third one discussed in this comment #794 (comment) as it is the one that works out of the box + does not have a perf implication on the normal streaming downloads.

Haven't implemented tests yet, as I'd first like to discuss the approach in itself. But I've confirmed this fixes the issue I was facing!

.to_string()
.contains(DATA_DESCRIPTOR_ERROR_MESSAGE)) =>
{
tracing::debug!("Failed to stream decompress conda package from '{}' due to the presence of zip data descriptors. Falling back to non streaming decompression;", url);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit split between using debug or warn here. I'm tempted to use WARN since this is a non-advisable state to be in.

@jpcorreia99 jpcorreia99 changed the title Fallback to fully reading the package stream when downloading before attempting decompression fix: Fallback to fully reading the package stream when downloading before attempting decompression Jul 26, 2024
@jpcorreia99
Copy link
Contributor Author

jpcorreia99 commented Jul 26, 2024

@wolfv / @nichmor here's my attempted fix for the issue discussed yesterday! Confirmed it fixed my case. Will try to create a zipfile representing this state so that you may reproduce it as well.

(As usual excuse my not great rust)

Copy link
Collaborator

@baszalmstra baszalmstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach is valid using a fallback method.

Im wondering though if we can detect this without having to resort to catching and interpreting string errors? Can we read the header ourself beforehand?

Reading to memory might be risking OOM issues, perhaps use a SpooledTempFile.

We definitely need an example of such a ZIP file.

crates/rattler_package_streaming/src/reqwest/tokio.rs Outdated Show resolved Hide resolved
crates/rattler_package_streaming/src/tokio/async_read.rs Outdated Show resolved Hide resolved
@jpcorreia99
Copy link
Contributor Author

@baszalmstra

Im wondering though if we can detect this without having to resort to catching and interpreting string errors? Can we read the header ourself beforehand?

The zipfile contains multiple headers, one per file inside, so I think to be able to read the headers we'd need to basically reimplement the zip crate ourselves, which I don't think is viable.

Will follow the file suggestion and work on creating a repro!

@jpcorreia99
Copy link
Contributor Author

jpcorreia99 commented Jul 30, 2024

@baszalmstra managed to repro the issue. You can just download a conda package, decompress it and then recompress it with zip -r <output_file> <input_folder> -fd.

I've updated the code with a test resource generated exactly this way! Would appreciate a new review!

@jpcorreia99 jpcorreia99 requested a review from baszalmstra July 30, 2024 14:38
@jpcorreia99
Copy link
Contributor Author

hey @baszalmstra would appreciate a new review when possible for you :)

Copy link
Contributor

@nichmor nichmor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good ! Thanks for contributing this
I've left some small comments

crates/rattler_package_streaming/src/tokio/async_read.rs Outdated Show resolved Hide resolved
crates/rattler_package_streaming/src/tokio/async_read.rs Outdated Show resolved Hide resolved
crates/rattler_package_streaming/src/tokio/async_read.rs Outdated Show resolved Hide resolved
crates/rattler_package_streaming/tests/extract.rs Outdated Show resolved Hide resolved
@@ -266,7 +267,7 @@ fn test_extract_flaky_conda(#[values(0, 1, 13, 50, 74, 150, 8096, 16384, 20000)]
let temp_dir = Path::new(env!("CARGO_TARGET_TMPDIR"));
println!("Target dir: {}", temp_dir.display());
let target_dir = temp_dir.join(package_path.file_stem().unwrap());
let result = extract_conda(
let result = extract_conda_via_streaming(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same here ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same response as above ^

crates/rattler_package_streaming/tests/extract.rs Outdated Show resolved Hide resolved
@nichmor
Copy link
Contributor

nichmor commented Jul 31, 2024

I checked the code a little,
so here
they are reading some flags to determine if an error should be thrown - so it's the one we try to catch

could we potentially rely on this and do this verification ourselves and correctly choose the the way we should decompress this? ( via streaming or buffering ) cc @baszalmstra

@jpcorreia99
Copy link
Contributor Author

hey @nichmor thank you for the review! I've addressed them or commented back, would appreciate a new review whenever you have the time :)

@jpcorreia99 jpcorreia99 requested a review from nichmor July 31, 2024 12:45
@jpcorreia99
Copy link
Contributor Author

@nichmor regarding checking the flags. I'm not 100% familiar with all the details, but a zipfile will contain multiple local headers that may contain the data descriptors. I'm worries that maybe the first 2 headers do not use data descriptors but the third one uses it and we accidentally let the package go the streaming route because the first header we check did not have descriptors.

I've actually seen some packages that are exactly like this, only some of the headers using data descriptors

Copy link
Collaborator

@baszalmstra baszalmstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small requests but otherwise LGTM

crates/rattler_package_streaming/tests/extract.rs Outdated Show resolved Hide resolved
@jpcorreia99 jpcorreia99 requested a review from baszalmstra August 1, 2024 08:37
Copy link
Contributor

@nichmor nichmor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! ❤️

@nichmor nichmor merged commit ce331ff into conda:main Aug 1, 2024
15 checks passed
baszalmstra added a commit that referenced this pull request Aug 2, 2024
## 🤖 New release
* `rattler`: 0.27.2 -> 0.27.3 (✓ API compatible changes)
* `rattler_cache`: 0.1.4 -> 0.1.5 (✓ API compatible changes)
* `rattler_conda_types`: 0.26.3 -> 0.27.0 (⚠️ API breaking changes)
* `rattler_package_streaming`: 0.21.7 -> 0.22.0 (⚠️ API breaking
changes)
* `rattler_networking`: 0.20.10 -> 0.21.0 (⚠️ API breaking changes)
* `rattler_lock`: 0.22.16 -> 0.22.17 (✓ API compatible changes)
* `rattler_repodata_gateway`: 0.21.3 -> 0.21.4 (✓ API compatible
changes)
* `rattler_solve`: 1.0.0 -> 1.0.1 (✓ API compatible changes)
* `rattler_index`: 0.19.21 -> 0.19.22 (✓ API compatible changes)
* `rattler_shell`: 0.21.3 -> 0.21.4
* `rattler_virtual_packages`: 1.0.0 -> 1.0.1

### ⚠️ `rattler_conda_types` breaking changes

```
--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/enum_variant_added.ron

Failed in:
  variant ParseChannelError:InvalidName in C:\Users\bas\AppData\Local\Temp\.tmpZUBw8C\rattler\crates\rattler_conda_types\src\channel\mod.rs:391

--- failure enum_variant_missing: pub enum variant removed or renamed ---

Description:
A publicly-visible enum has at least one variant that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/enum_variant_missing.ron

Failed in:
  variant ParseMatchSpecError::InvalidNumberOfColons, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_conda_types\src\match_spec\parse.rs:59
```

### ⚠️ `rattler_package_streaming` breaking changes

```
--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/function_missing.ron

Failed in:
  function rattler_package_streaming::read::extract_conda, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_package_streaming\src\read.rs:47
```

### ⚠️ `rattler_networking` breaking changes

```
--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/function_missing.ron

Failed in:
  function rattler_networking::redact_known_secrets_from_url, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_networking\src\redaction.rs:24

--- failure pub_module_level_const_missing: pub module-level const is missing ---

Description:
A public const is missing, renamed, or changed from const to static.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/pub_module_level_const_missing.ron

Failed in:
  DEFAULT_REDACTION_STR in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_networking\src\redaction.rs:5

--- failure trait_missing: pub trait removed or renamed ---

Description:
A publicly-visible trait cannot be imported by its prior path. A `pub use` may have been removed, or the trait itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.29.0/src/lints/trait_missing.ron

Failed in:
  trait rattler_networking::Redact, previously in file C:\Users\bas\AppData\Local\Temp\.tmpiIwxAW\rattler_networking\src\redaction.rs:44
```

<details><summary><i><b>Changelog</b></i></summary><p>

## `rattler`
<blockquote>

##
[0.27.3](baszalmstra/rattler@rattler-v0.27.2...rattler-v0.27.3)
- 2024-08-02

### Other
- mark some crates 1.0
([#789](https://github.com/baszalmstra/rattler/pull/789))
</blockquote>

## `rattler_cache`
<blockquote>

##
[0.1.5](baszalmstra/rattler@rattler_cache-v0.1.4...rattler_cache-v0.1.5)
- 2024-08-02

### Other
- mark some crates 1.0
([#789](https://github.com/baszalmstra/rattler/pull/789))
</blockquote>

## `rattler_conda_types`
<blockquote>

##
[0.27.0](baszalmstra/rattler@rattler_conda_types-v0.26.3...rattler_conda_types-v0.27.0)
- 2024-08-02

### Fixed
- redact secrets in the `canonical_name` functions
([#801](https://github.com/baszalmstra/rattler/pull/801))
- make `base_url` of `Channel` always contain a trailing slash
([#800](https://github.com/baszalmstra/rattler/pull/800))
- parse channel in matchspec string
([#792](https://github.com/baszalmstra/rattler/pull/792))
- constraints on virtual packages were ignored
([#795](https://github.com/baszalmstra/rattler/pull/795))
- url parsing for namelessmatchspec and cleanup functions
([#790](https://github.com/baszalmstra/rattler/pull/790))

### Other
- mark some crates 1.0
([#789](https://github.com/baszalmstra/rattler/pull/789))
</blockquote>

## `rattler_package_streaming`
<blockquote>

##
[0.22.0](baszalmstra/rattler@rattler_package_streaming-v0.21.7...rattler_package_streaming-v0.22.0)
- 2024-08-02

### Fixed
- redact secrets in the `canonical_name` functions
([#801](https://github.com/baszalmstra/rattler/pull/801))
- Fallback to fully reading the package stream when downloading before
attempting decompression
([#797](https://github.com/baszalmstra/rattler/pull/797))

### Other
- mark some crates 1.0
([#789](https://github.com/baszalmstra/rattler/pull/789))
</blockquote>

## `rattler_networking`
<blockquote>

##
[0.21.0](baszalmstra/rattler@rattler_networking-v0.20.10...rattler_networking-v0.21.0)
- 2024-08-02

### Fixed
- redact secrets in the `canonical_name` functions
([#801](https://github.com/baszalmstra/rattler/pull/801))
</blockquote>

## `rattler_lock`
<blockquote>

##
[0.22.17](baszalmstra/rattler@rattler_lock-v0.22.16...rattler_lock-v0.22.17)
- 2024-08-02

### Other
- mark some crates 1.0
([#789](https://github.com/baszalmstra/rattler/pull/789))
</blockquote>

## `rattler_repodata_gateway`
<blockquote>

##
[0.21.4](baszalmstra/rattler@rattler_repodata_gateway-v0.21.3...rattler_repodata_gateway-v0.21.4)
- 2024-08-02

### Fixed
- redact secrets in the `canonical_name` functions
([#801](https://github.com/baszalmstra/rattler/pull/801))

### Other
- mark some crates 1.0
([#789](https://github.com/baszalmstra/rattler/pull/789))
</blockquote>

## `rattler_solve`
<blockquote>

##
[1.0.1](baszalmstra/rattler@rattler_solve-v1.0.0...rattler_solve-v1.0.1)
- 2024-08-02

### Fixed
- constraints on virtual packages were ignored
([#795](https://github.com/baszalmstra/rattler/pull/795))
</blockquote>

## `rattler_index`
<blockquote>

##
[0.19.22](baszalmstra/rattler@rattler_index-v0.19.21...rattler_index-v0.19.22)
- 2024-08-02

### Other
- mark some crates 1.0
([#789](https://github.com/baszalmstra/rattler/pull/789))
</blockquote>

## `rattler_shell`
<blockquote>

##
[0.21.4](baszalmstra/rattler@rattler_shell-v0.21.3...rattler_shell-v0.21.4)
- 2024-08-02

### Other
- updated the following local packages: rattler_conda_types
</blockquote>

## `rattler_virtual_packages`
<blockquote>

##
[1.0.1](baszalmstra/rattler@rattler_virtual_packages-v1.0.0...rattler_virtual_packages-v1.0.1)
- 2024-08-02

### Other
- updated the following local packages: rattler_conda_types
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/MarcoIeni/release-plz/).

Signed-off-by: Bas Zalmstra <bas@prefix.dev>
baszalmstra pushed a commit that referenced this pull request Nov 14, 2024
…938)

### Description
In #837 locking logic was added to
the download reporter to prevent multiple downloads.

This was clashing with the logic added in
#797 where, if we detect we're
trying to download a zip that has data descriptors, we fallback to doing
a full download and interrupt the streaming decompression download.

The issue comes from us not releasing the lock before attempting the
second download, hitting this line
https://github.com/conda/rattler/blob/8bb883536df52de46fd2c040228bb77d39c92607/crates/rattler_cache/src/package_cache/mod.rs#L439

As a fix, we report the previous download as completed before attempting
the second one! Have verified in my custom setup that this works.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possibly add resiliency towards conda zip archives that use data descriptors.
3 participants