Add retries for package installations #1994

Open
pavelzw opened this issue Sep 6, 2024 · 10 comments
Labels
✨ enhancement Feature request

Comments

@pavelzw
Contributor

pavelzw commented Sep 6, 2024

Problem description

In fragile networks with higher packet loss, we often run into issues like the following:

    × failed to fetch msgpack-python-1.0.8-py311h52f7536_0.conda
    ├─▶ error sending request for url (https://my.conda.mirror.com/
    │   artifactory/conda-forge/linux-64/msgpack-python-1.0.8-
    │   py311h52f7536_0.conda)
    ├─▶ client error (SendRequest)
    ├─▶ connection error
    ╰─▶ bytes remaining on stream

or

    × failed to fetch custom.package-1.0.0-1-.conda
    ├─▶ error sending request for url (https://my.conda.mirror.com/
    │   artifactory/conda-custom-channel/noarch/custom.package-1.0.0-1-
    │   .conda)
    ├─▶ client error (SendRequest)
    ├─▶ connection error
    ╰─▶ peer closed connection without sending TLS close_notify: https://
        docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof

This is most likely due to networking issues on the first download.
It would be nice if pixi tried multiple times and only failed after the 5th attempt or so. Maybe this could also be configurable in the global config?
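
To make the request concrete, here is a minimal Python sketch of the behavior I have in mind. It is not how pixi or rattler implement downloads; `fetch_with_retries`, `max_attempts`, and `delay_seconds` are hypothetical names, and treating every `OSError` as retryable is an assumption made for the sake of the example.

```python
import time


def fetch_with_retries(fetch, max_attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError as error:  # assumption: covers ConnectionError, TimeoutError, ...
            last_error = error
            if attempt < max_attempts:
                sleep(delay_seconds)  # fixed pause between attempts
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A configurable retry count would then simply be the value passed as `max_attempts`.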

@pavelzw pavelzw added the ✨ enhancement Feature request label Sep 6, 2024
@wolfv
Member

wolfv commented Sep 6, 2024

To be sure, this happens during pixi install or something along those lines?
In general, we used to have a retry client. It's possible that it was lost during a refactor.

The retry client only retries on certain network errors (e.g. 50x, I think). It could be that certain network errors aren't retried because it assumes something bigger is faulty :D
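
A rough Python sketch of that kind of classification, to show why some failures might fall through. The status codes and exception types below are assumptions chosen for illustration; this thread does not confirm which errors the actual retry client treats as transient. `http.client.IncompleteRead` and `ssl.SSLEOFError` are the nearest Python analogues to the "bytes remaining on stream" and missing close_notify errors reported above.

```python
import http.client
import ssl

# Assumption: a typical set of "try again later" HTTP status codes.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}

# Assumption: connection-level failures that usually go away on a second try.
RETRYABLE_EXCEPTIONS = (
    ConnectionError,             # reset, refused, aborted
    TimeoutError,
    http.client.IncompleteRead,  # body ended before the announced length
    ssl.SSLEOFError,             # TLS stream closed unexpectedly
)


def is_transient(status=None, error=None):
    """Return True if the failure looks temporary and is worth retrying."""
    if error is not None:
        return isinstance(error, RETRYABLE_EXCEPTIONS)
    return status in RETRYABLE_STATUS
```

If a classifier only looks at status codes, errors that happen mid-transfer never reach the retry path, which would match the symptoms described here.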

@pavelzw
Contributor Author

pavelzw commented Sep 6, 2024

Yes, this is happening during pixi install --locked

@pavelzw
Contributor Author

pavelzw commented Sep 6, 2024

Is there any way to see retroactively whether pixi retried or not? Maybe it would make sense for pixi to write a warning to stderr 🤔

@wolfv
Member

wolfv commented Sep 6, 2024

This is the one we're using: https://docs.rs/reqwest-retry/latest/reqwest_retry/

I do think we can customize the function in order to add logging.
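
As a language-neutral illustration of the idea (Python here, not the reqwest-retry API), the decision function can be wrapped so that every "retry" decision leaves a warning on stderr. All names below (`with_retry_logging`, `connection_errors_only`, the `retry_sketch` logger) are hypothetical.

```python
import logging
import sys


def with_retry_logging(should_retry, logger):
    """Wrap a retry predicate so each 'retry' decision is logged as a warning."""
    def wrapped(error, attempt):
        retry = should_retry(error, attempt)
        if retry:
            logger.warning("attempt %d failed (%s); retrying", attempt, error)
        return retry
    return wrapped


# Hypothetical logger; a StreamHandler on stderr keeps retries visible in CI logs.
logger = logging.getLogger("retry_sketch")
logger.addHandler(logging.StreamHandler(sys.stderr))
logger.setLevel(logging.WARNING)


def connection_errors_only(error, attempt):
    """Example predicate: retry connection errors, for at most 3 attempts."""
    return isinstance(error, ConnectionError) and attempt < 3


should_retry = with_retry_logging(connection_errors_only, logger)
```

With this shape, a missing warning in the logs means the predicate decided not to retry, which answers the "did it retry or not?" question after the fact.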

@pavelzw
Contributor Author

pavelzw commented Sep 10, 2024

From what I can see, pixi install should do retries on some error types.
What's interesting is that in all my failures, I haven't seen WARN failed to download and extract ... in my logs (which should be there at the default log level).

https://github.com/conda/rattler/blob/4885895f8af18321a50b8c376c5d42231a3d3743/crates/rattler_cache/src/package_cache/mod.rs#L241

So it seems to me that it doesn't retry for the issues that I have 🤔

@pavelzw
Contributor Author

pavelzw commented Sep 10, 2024

@baszalmstra was conda/rattler#837 in any way related to this issue?

@pavelzw
Contributor Author

pavelzw commented Sep 10, 2024

We experience these issues with pixi 0.28.0 and haven't tried newer pixi versions yet. I'll check 0.29.0 in the coming days.

@pavelzw
Contributor Author

pavelzw commented Sep 26, 2024

So I tried it out with 0.29.0 and the errors are still there, unfortunately.

Since I didn't see any warning messages (which should be included in the default log level, right?), I'm assuming that pixi is still not retrying on these errors 🤔
https://github.com/conda/rattler/blob/1a463eb5bd17eb6d5a7df1622e258aac09d982e0/crates/rattler_cache/src/package_cache/mod.rs#L241

@pavelzw
Contributor Author

pavelzw commented Oct 3, 2024

I got some debug logs with -vv where the issue occurs:

...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libevent-2.1.12-hf998b51_1.conda to /home/runner/.cache/rattler/cache/pkgs/libevent-2.1.12-hf998b51_1: an io error occurred. Retry #1, Sleeping 1.642122237s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/pynacl-1.5.0-py312h98912ed_3.conda to /home/runner/.cache/rattler/cache/pkgs/pynacl-1.5.0-py312h98912ed_3: an io error occurred. Retry #1, Sleeping 777.906958ms until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.1.0-hc0a3c3a_1.conda to /home/runner/.cache/rattler/cache/pkgs/libstdcxx-14.1.0-hc0a3c3a_1: an io error occurred. Retry #1, Sleeping 1.765293491s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/tornado-6.4.1-py312h66e93f0_1.conda to /home/runner/.cache/rattler/cache/pkgs/tornado-6.4.1-py312h66e93f0_1: an io error occurred. Retry #1, Sleeping 177.068926ms until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/noarch/hiplot-0.1.33-pyhd8ed1ab_0.tar.bz2 to /home/runner/.cache/rattler/cache/pkgs/hiplot-0.1.33-pyhd8ed1ab_0: an io error occurred. Retry #1, Sleeping 1.763704962s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/noarch/pip-24.2-pyh8b19718_1.conda to /home/runner/.cache/rattler/cache/pkgs/pip-24.2-pyh8b19718_1: an io error occurred. Retry #1, Sleeping 1.693506324s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libtiff-4.6.0-h46a8edc_4.conda to /home/runner/.cache/rattler/cache/pkgs/libtiff-4.6.0-h46a8edc_4: an io error occurred. Retry #1, Sleeping 1.720698875s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/graphviz-12.0.0-hba01fac_0.conda to /home/runner/.cache/rattler/cache/pkgs/graphviz-12.0.0-hba01fac_0: an io error occurred. Retry #1, Sleeping 1.004458089s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/pixman-0.43.2-h59595ed_0.conda to /home/runner/.cache/rattler/cache/pkgs/pixman-0.43.2-h59595ed_0: an io error occurred. Retry #1, Sleeping 55.454116ms until the next attempt...
...

The whole CI run failed within 10s. What do you think about doing non-concurrent retries?
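
One way to read "non-concurrent retries", sketched in Python under stated assumptions: the first attempt for every package still runs in parallel, but anything that fails is retried one at a time, so retries do not compete for the same flaky connection. `download_all` and its parameters are hypothetical, the linear backoff is an arbitrary choice, and none of this reflects how rattler schedules downloads today.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def download_all(urls, download, max_workers=8, max_retries=3,
                 delay_seconds=1.0, sleep=time.sleep):
    """First attempts run concurrently; failed URLs are retried sequentially."""
    def first_attempt(url):
        try:
            return url, download(url), None
        except OSError as error:  # assumption: network failures surface as OSError
            return url, None, error

    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url, value, error in pool.map(first_attempt, urls):
            if error is None:
                results[url] = value
            else:
                failed.append(url)

    # Sequential phase: only one retry is in flight at any time.
    for url in failed:
        for retry in range(1, max_retries + 1):
            sleep(delay_seconds * retry)  # linear backoff between retries
            try:
                results[url] = download(url)
                break
            except OSError:
                if retry == max_retries:
                    raise  # give up after the last retry
    return results
```

The trade-off is a slower recovery when many packages fail at once, in exchange for not hammering the mirror with a burst of simultaneous retries.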

@pavelzw
Contributor Author

pavelzw commented Oct 3, 2024
