Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: add new "lazy-wheel" config option #8815

Merged
merged 9 commits into from
Jan 20, 2024

Conversation

radoering
Copy link
Member

Pull Request Check List

  • Added tests for changed code.
  • Updated documentation for changed code.

With #5509, this is only relevant for indexes that do not support PEP 658 - thus, probably for most indexes except for PyPI - or old wheels without PEP 658 metadata.

This PR adds a config option solver.lazy-wheel. If active and a server supports HTTP range requests, we do not have to download entire wheels to fetch metadata but can just download the METADATA file from the remote wheels with a couple of range requests. Especially with slow network connections this setting can speed up dependency resolution significantly. (If the cache has already been filled or the server does not support HTTP range requests, this setting makes no difference.)

This PR is based on pypa/pip#12208, which I learned about at PackagingCon, so all credits to the authors of this pip PR. 👏 I just had to make the interface fit for Poetry, write some tests and fix some special cases I encountered.

I measured the time of poetry lock of a largish project, once from a machine with a slow network connection (1.5 MB/s, 40 ms ping) and once from a machine with a fast network connection (95 MB/s, 1 ms ping) to the package index.

operation slow network fast network
cold cache without PR 825 s 75 s
cold cache with PR 190 s 60 s
warm cache without/with PR 24 s 16 s

What lazy-wheel basically does is:

  • try a range request with a negative offset in order to fetch the central directory of the wheel zip file
  • if negative range requests are not supported, send a HEAD request to get the size of the wheel and afterwards send a normal range request to fetch the last bytes of the wheel
  • if the central directory is quite large we may need an additional range request to fetch the rest of the central directory
  • look for the position of the METADATA file in the central directory and download it in one range request

In order to not try range requests again and again in vain, we keep track if a domain supports range requests at all and especially with negative offsets.

In contrast to the pip PR, we only fetch the METADATA file instead of the entire .dist-info directory. On the machine with a slow network connection this made a difference of 190 s to 330 s.

Further, I added some handling for special cases I encountered while trying range requests with different servers:

Server Accept-Ranges Negative Offset Negative Offset > entire wheel
pypi.org bytes 501 Unsupported client range 501 Unsupported client range
Internal devpi server 1 bytes1 206 Partial content ⚠️ 200 OK, entire wheel
Internal devpi server 2 - 200 OK, entire wheel 200 OK, entire wheel
Internal Artifactory bytes 206 Partial content, ⚠️ data from start of file 206 Partial content, entire wheel
Internal GitLab bytes 206 Partial content 206 Partial content, entire wheel
piwheels.org bytes 206 Partial content 206 Partial content, entire wheel
download.pytorch.org bytes 206 Partial content 206 Partial content, entire wheel

1 "Devpi server 1" hosts several indexes including a PyPI mirror. I noticed that range requests were supported for wheels that have been downloaded before, but were not supported for wheels that have not been downloaded before. In other words, a server might supports range requests for some but not all wheels. (This is handled in HTTPRepository.)

The behavior of the different servers can be interpreted as follows:

  • The internal GitLab, piwheels.org and download.pytorch.org behave as you would expect from a server that supports range requests with negative offsets.
  • PyPI behaves as you would expect from a server that supports range requests but not negative offsets.
  • The internal devpi server 2 behaves as you would expect from a server that does not support range requests.
  • The internal devpi server 1 has a quirk when the negative offset is greater than the entire wheel. It does return 200 OK instead of 206 Partial content, so you have to inspect the Accept-Ranges header to determine if it supports range requests or not.
  • The internal Artifactory has the most unexpected behavior. It does not support negative offsets but interprets it as positive. E.g. -100 is handled as 0-100.

In case you were wondering at the beginning why we should introduce a config option and not just always use range requests if possible, it's probably clearer now: Although various servers were tested and special cases were implemented, it's not unlikely that there are still servers with unexpected behavior that is not handled well.

@radoering radoering added the impact/docs Contains or requires documentation changes label Dec 23, 2023
Copy link

github-actions bot commented Dec 23, 2023

Deploy preview for website ready!

✅ Preview
https://website-5yh7exrea-python-poetry.vercel.app

Built with commit 48e0b57.
This pull request is being automatically deployed with vercel-action

src/poetry/inspection/lazy_wheel.py Outdated Show resolved Hide resolved
src/poetry/inspection/lazy_wheel.py Outdated Show resolved Hide resolved
src/poetry/inspection/lazy_wheel.py Outdated Show resolved Hide resolved
src/poetry/inspection/lazy_wheel.py Outdated Show resolved Hide resolved
tests/inspection/test_lazy_wheel.py Outdated Show resolved Hide resolved
tests/inspection/test_lazy_wheel.py Outdated Show resolved Hide resolved
tests/inspection/test_lazy_wheel.py Outdated Show resolved Hide resolved
tests/inspection/test_lazy_wheel.py Outdated Show resolved Hide resolved
tests/inspection/test_lazy_wheel.py Outdated Show resolved Hide resolved
tests/repositories/test_http_repository.py Outdated Show resolved Hide resolved
src/poetry/console/commands/config.py Outdated Show resolved Hide resolved
@radoering radoering force-pushed the lazy-wheel branch 2 times, most recently from 7e35c2b to be15429 Compare January 13, 2024 06:15
If active and a server supports range requests, we do not have to download entire wheels to fetch metadata but can just download the METADATA file from the remote wheel with a couple of range requests.
… `from_wheel_metadata` to `from_metadata`
Secrus
Secrus previously approved these changes Jan 20, 2024
Copy link
Member

@Secrus Secrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one small nitpick left.

src/poetry/inspection/lazy_wheel.py Outdated Show resolved Hide resolved
@radoering radoering merged commit 50a7723 into python-poetry:master Jan 20, 2024
33 checks passed
Copy link

github-actions bot commented Mar 3, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
impact/docs Contains or requires documentation changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants