perform 1-3 HTTP requests for each wheel using fast-deps #12208
base: main
Conversation
(force-pushed from 44418d9 to 182b351)
@dholth: got this working again; on my connection to pypi, this new code (adapted from your solution) is much much faster than multiple small range requests. A couple of things that popped up:
# download Keras wheel
> curl -L -O 'https://files.pythonhosted.org/packages/44/e1/dc0757b20b56c980b5553c1b5c4c32d378c7055ab7bfa92006801ad359ab/Keras-2.4.3-py2.py3-none-any.whl'
# this is normal; the dist-info is at the end
> unzip -l Keras-2.4.3-py2.py3-none-any.whl | tail -n10
143 2020-06-24 22:38 keras/utils/vis_utils.py
28 2020-06-24 22:38 keras/wrappers/__init__.py
117 2020-06-24 22:38 keras/wrappers/scikit_learn.py
1616 2020-06-24 22:40 Keras-2.4.3.dist-info/LICENSE
1496 2020-06-24 22:40 Keras-2.4.3.dist-info/METADATA
110 2020-06-24 22:40 Keras-2.4.3.dist-info/WHEEL
11 2020-06-24 22:40 Keras-2.4.3.dist-info/top_level.txt
6766 2020-06-24 22:40 Keras-2.4.3.dist-info/RECORD
--------- -------
73432 83 files
# download tensorflow-gpu wheel
> curl -L -O 'https://files.pythonhosted.org/packages/80/4d/3a008dc31225768318e7ba0f7f95aa4677b0936805be40e37036b7755d62/tensorflow_gpu-2.5.3-cp38-cp38-manylinux2010_x86_64.whl'
# hell
> unzip -l tensorflow_gpu-2.5.3-cp38-cp38-manylinux2010_x86_64.whl | head -n10
Archive: tensorflow_gpu-2.5.3-cp38-cp38-manylinux2010_x86_64.whl
Length Date Time Name
--------- ---------- ----- ----
111 2022-01-30 11:38 tensorflow_gpu-2.5.3.dist-info/WHEEL
11 2022-01-30 17:16 tensorflow_gpu-2.5.3.dist-info/top_level.txt
2810 2022-01-30 17:17 tensorflow_gpu-2.5.3.dist-info/METADATA
550 2022-01-30 17:16 tensorflow_gpu-2.5.3.dist-info/entry_points.txt
1046068 2022-01-30 11:38 tensorflow_gpu-2.5.3.dist-info/RECORD
13777 2022-01-30 17:17 tensorflow_gpu-2.5.3.dist-info/LICENSE
25049 2022-01-30 17:16 tensorflow/__init__.py

So this change had to expand your approach to cover the case of dist-info dirs placed at the very start of a zip file. I'm assuming this was done intentionally by Google in order to do zip file hacks like we're trying to do, but from the front of the file. Regardless, this also works now.
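The two listings above show the layout difference by eye; a minimal stdlib sketch can check it programmatically (package names and helper names here are illustrative, not pip's):

```python
import io
import zipfile

def dist_info_position(zf: zipfile.ZipFile) -> str:
    """Report where the *.dist-info/ entries sit within the archive listing."""
    names = zf.namelist()
    meta = [i for i, n in enumerate(names) if ".dist-info/" in n]
    if not meta:
        return "missing"
    if meta[0] == 0:
        return "front"  # tensorflow_gpu-style layout
    if meta[-1] == len(names) - 1:
        return "back"   # Keras-style layout, as the wheel spec recommends
    return "middle"

def demo_zip(metadata_first: bool) -> zipfile.ZipFile:
    """Build a tiny in-memory wheel-like zip with metadata first or last."""
    entries = [
        ("pkg/__init__.py", b"x = 1\n"),
        ("pkg-1.0.dist-info/METADATA", b"Name: pkg\n"),
    ]
    if metadata_first:
        entries.reverse()
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return zipfile.ZipFile(buf)
```

Running `dist_info_position` over the two demo archives distinguishes the two layouts discussed in this thread.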
(force-pushed from 33f2431 to 49ad8c5)
cc @ewdurbin: are you familiar with why pypi might be failing to accept negative byte range requests (of the form `bytes=-N`)?
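For context, a negative range asks for a file's tail without first learning its length. A small sketch (these helper names are mine, not pip's) of building the header and parsing the `Content-Range` value a `206 Partial Content` response carries back:

```python
import re

def negative_range_header(tail_bytes: int) -> dict:
    """Ask for the last `tail_bytes` of a file without knowing its total size."""
    return {"Range": f"bytes=-{tail_bytes}"}

def parse_content_range(value: str) -> tuple:
    """Parse 'bytes START-END/TOTAL' from a 206 Partial Content response."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, total = map(int, m.groups())
    return start, end, total
```

A server that doesn't implement suffix ranges may instead answer 501 or simply 200 with the whole body, which is exactly the failure mode being discussed here.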
Still marking this as draft until we get some test cases going, which will have to be sometime next week.
Thanks so much!
(force-pushed from ec80945 to a80dc04)
(force-pushed from 4a03cc8 to 35ae8bc)
(force-pushed from ec4f803 to e024a28)
(force-pushed from 12c78b0 to ac6cda5)
The macOS runners cancelled themselves again, but I'm hoping they should work this time.
@notatallshaw @pradyunsg: macOS runners have all appeared to cancel themselves again at 95% (https://github.com/pypa/pip/actions/runs/10046709654/job/27766793866?pr=12208, saying "Error: The operation was canceled."). I'm not quite sure how to debug this?
That looks like it failed because the CI timed out. I'd suggest looking at the logs to find out which tests started but never finished. You'll be able to tell by checking which ones don't have a status reported about them within the raw logs.
(force-pushed from 3f9d242 to 0d72998)
So the tests that the logs fail at (https://github.com/pypa/pip/actions/runs/10046709654/job/27766794821?pr=12208) don't really seem to make sense; e.g. the 3.13 osx shard last displays […]. However, I realized that this may well be about threaded server shutdown behavior, which may well differ across OSes, so in 0d72998 I tried setting a timeout on joining the test server threads.
- handle short files `416`; prefetch entire dist-info
- lazy_wheel: translate BadZipfile to InvalidWheel
- handle 501 on negative bytes range from pypi
- catch UnsupportedWheel
- make lazy wheel work against tensorflow-gpu
- link to pypi discussion on negative byte ranges
- check for case when server doesn't support byte ranges at all
- remove _check_zip() method since we do that in prefetch_dist_info() now
- clean up error handling code and don't warn when negative ranges fail
- remove TODO, as Cache-Control: no-cache is the correct behavior
- rearrange explanatory comments
- specify how we handle 200 OK and interpret 405 as no range requests
- rename LazyZipOverHTTP to LazyWheelOverHTTP because of the assumption of the structure of *.dist-info/
- stream the initial chunk request too, since it's 1MB
- add note about fast-deps being on by default
- add testing for request limiting
- fix download metadata testing
- factor out the laziness from the wheel-specific logic
- factor out LazyRemoteResource from FixedSizeLazyResource
- add metadata_first=True arg to make_wheel
- reduce initial chunk size to 10K
- remove hardcoded compilewheel and instead generate wheels
- catch new requests decoding error
- support NegativeRangeOverflowing
- support SneakilyCoerceNegativeRange
- time out test server thread joining within 3 seconds (Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>)
- make FakePackageSource to abstract over generated and hardcoded whls
- ensure InvalidWheel retains context from inner errors (Co-authored-by: Randy Döring <30527984+radoering@users.noreply.github.com>)
- add link to perf evaluation from radoering
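The "time out test server thread joining within 3 seconds" item above boils down to bounding `Thread.join` so a wedged server can't hang the suite. A minimal stdlib sketch of that pattern (a stand-in for the test-suite's local server, not pip's actual harness):

```python
import http.server
import threading

# a local test server on an ephemeral port; never actually hit by a request here
server = http.server.HTTPServer(("127.0.0.1", 0), http.server.BaseHTTPRequestHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

server.shutdown()       # ask serve_forever() to return
thread.join(timeout=3)  # bound the wait instead of blocking the suite forever
server.server_close()
```

The `daemon=True` flag is a second safety net: even if the join times out, a daemon thread can't keep the interpreter alive at exit.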
(force-pushed from 5d2ffcd to b7c6abc)
Ok, that didn't work, and I can't immediately reproduce the error on my mac laptop. I've just tried reverting all of the changes to the […]
(force-pushed from b7c6abc to e4b2aaf)
Ideally, this should avoid collisions. No clue if this matters at all.
Ok, I'm trying to skip unit tests on macOS shards and seeing if they complete. That way we at least nail down that it's probably a test that's causing a hang and not some separate weird CI mechanic.
Ugh!!!! So that did pass, which means it is definitely a specific test (which is actually much less confusing, but frustrating since I can't seem to reproduce it). Will try running the exact command in CI on my mac laptop now.
Wow, that was quick! It appears to be […]

The key was using […]

Hm, it's actually probably […]
(force-pushed from e20c892 to b06d73b)
Ok, making a few fixtures use `scope="session"` […]
yes!!!!!!!! omg praise
cc @pradyunsg @notatallshaw: no rush at all, but CI passes extremely quickly now and the fix was very simple.
This is a partial review - I haven't looked at the changes to lazy_wheel.py itself yet. They are much more extensive, and I've not had time to review the context of all this. It's been a long while since I last looked at lazy_wheel, and given that PyPI now serves package metadata separately (and other indexes should do the same), I'd like to review how important the whole lazy_wheel code path even is these days, before spending too much time diving into the code itself.
If you have any current data on how often this code path gets hit in typical usage of pip, and how much of a speedup it provides, then that would be very useful. Not microbenchmarks showing that the code is x% faster than it was before, but real-world figures like how much does this improve installing Jupyter, or Django, or big packages with complex dependency trees such as Apache Airflow or Home Assistant?
@@ -25,12 +25,12 @@
 }

-@pytest.fixture
+@pytest.fixture(scope="session")
Why the change in scope here (and in `mypy_whl_no_range`)?

If this is an unrelated change, it should go in its own PR. If it's related, it could do with a comment explaining why it's needed.
if src_fp := source.fp:
    src_name = getattr(src_fp, "name", None)
    logger.debug("extracting entry '%s' from zip '%s'", path, src_name)
This seems like it's debugging output only indirectly related to this change. While it's not a bad change, it should probably be separate from this PR. Also, printing a raw `None` when the source doesn't have a name attribute seems unfriendly in user-facing output.
    link.netloc,
    str(e),
)
self._domains_without_http_range.add(link.netloc)
I guess the question here is, which is faster? To not use lazy wheel on sites where a range request ever failed, or to check every time and use lazy wheel whenever possible. A comment here explaining the trade-off and why you made the choice you did would be useful.
(I haven't looked at the three linked PRs, as IMO this PR should stand alone, so "this check will go away when further PRs are merged" isn't a reason to not document what's going on here).
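As a sketch of the remember-on-first-failure policy under discussion (class and method names are hypothetical; only the `_domains_without_http_range` set mirrors the diff above):

```python
class RangeSupportCache:
    """Remember netlocs where an HTTP range request has ever failed.

    Trade-off: one transient failure permanently disables lazy wheels for that
    host within this process, but the probe cost is never paid twice per host.
    """

    def __init__(self) -> None:
        self._domains_without_http_range: set = set()

    def supports_range(self, netloc: str) -> bool:
        """True until a range request against this netloc has failed once."""
        return netloc not in self._domains_without_http_range

    def record_failure(self, netloc: str) -> None:
        """Permanently (for this process) stop trying lazy wheels for netloc."""
        self._domains_without_http_range.add(netloc)
```

The alternative design — retrying every time — trades an extra request per wheel for resilience against transient failures; the set-based approach assumes range support is a stable property of a host.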
@@ -362,34 +360,46 @@ def make_wheel(
     :param entry_points:
     :param record: if provided and None, then no RECORD file is generated;
         else if a string then sets the content of the RECORD file
+    :param metadata_first: Put the .dist-info metadata at the front of the zip file.
The wheel spec explicitly states that archivers should put the wheel metadata at the end of the archive. We absolutely should not default to `True` here, and I question why we should make this configurable at all. If it's to test edge cases where pip encounters wheels with the metadata in a non-recommended place, I'd prefer to see the main test suite unchanged, but separate, well-documented tests that explicitly create and test such wheels.
Agreed, this should not be configurable.
 def __str__(self) -> str:
-    return f"Wheel '{self.name}' located at {self.location} is invalid."
+    suffix = f" ({self.context})" if self.context else "."
-suffix = f" ({self.context})" if self.context else "."
+suffix = f" ({self.context})." if self.context else "."
(or better, keep the trailing full stop in the return statement rather than here, but I don't think GitHub will let me add 2-line suggestions...)
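For illustration, a self-contained sketch of an exception that appends optional context the way this suggestion proposes — keeping the trailing full stop in the return statement. This is a simplified stand-in, not pip's actual `InvalidWheel`:

```python
from typing import Optional

class InvalidWheel(Exception):
    """Simplified stand-in illustrating the message layout discussed above."""

    def __init__(self, location: str, name: str, context: Optional[str] = None) -> None:
        self.location = location
        self.name = name
        self.context = context

    def __str__(self) -> str:
        # context (if any) goes in parentheses; the full stop lives in one place
        suffix = f" ({self.context})" if self.context else ""
        return f"Wheel '{self.name}' located at {self.location} is invalid{suffix}."
```

With this shape there is no way to emit a doubled or missing full stop regardless of whether context is present.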
Hiya! I don't have any great insight into the finer details, I'm just a user with >30-minute resolves excited for any performance improvements. Happy to help with any testing grunt work.

@ b06d73b without fast-deps (cache cleared between runs): […]

@ b06d73b with fast-deps (cache cleared between runs): […]

So fast-deps appears slower against PyPI in this naive test. I admit I'm not sure what to make of these results or if I'm testing the right thing. My expectation is that fast-deps is only expected to be beneficial against indexes that do not support PEP 658 + backfill (aka not PyPI) and would otherwise be no different in performance. Again, I'm just wearing my user hat here, not sure if I'm understanding the case exactly right, and happy to run other tests if I can help.
It would be a big help for me if you could open a new issue as something like "Slow resolver performance" and give a reproducible example. I will profile it and use it as a target for improved resolution; once pip vendors a new version of resolvelib, I will be working on hopefully very big improvements to these kinds of cases.
This needs to be tested with slow round-trip times and/or slow bandwidth. Sometimes a user who lives next to a pypi server (or an alternate index with full range request support) may not see the full story. Is there an easy way to simulate those conditions?
There are lots of HTTP proxies available that allow you to add delay to responses/requests, e.g. https://github.com/ruwanta/delaying-proxy, https://github.com/KevCui/mitm-scripts/blob/master/mitm-delay-request.py, https://github.com/chimurai/http-proxy-middleware/blob/master/recipes/delay.md
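For a zero-dependency alternative to those proxies, a few lines of stdlib can fake server-side latency locally. This is a sketch, not a full proxy: a throwaway HTTP server that sleeps before answering (the delay value is arbitrary):

```python
import http.server
import threading
import time
import urllib.request

DELAY_S = 0.2  # simulated per-request latency; tune to taste

class DelayingHandler(http.server.BaseHTTPRequestHandler):
    """Sleep before answering, to mimic a distant or throttled index."""

    def do_GET(self):
        time.sleep(DELAY_S)  # pretend the server is far away
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep output quiet

server = http.server.HTTPServer(("127.0.0.1", 0), DelayingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

start = time.monotonic()
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_address[1]}/") as resp:
    data = resp.read()
elapsed = time.monotonic() - start
server.shutdown()
```

Pointing a client at such a server (or putting similar sleeps into a forwarding proxy) makes per-request overhead visible in a way that a fast local connection to pypi hides.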
I reran the prior `time python3.10 -m pip install --report test.json --cache-dir=.cache-pr12208-fast --use-feature=fast-deps --dry-run --ignore-installed 'acryl-datahub[all]==0.13.2.3'` tests, this time interleaving the tests. I think it confirms that variable network bandwidth makes this difficult to measure. Or perhaps that "download a bunch of stuff from PyPI" isn't the right framing. My apologies if the drive-by testing was not helpful.
Continued motivation for fast-deps

While PEP 658 is the standards-compliant solution and metadata from there is already preferred when available, `--use-feature=fast-deps` avoids downloading wheels against `--find-links` repos and any pypi index not supporting PEP 658 yet. Most non-pypi indices will be either of these, because it's very easy to expose them to pip with a simple file server, so improving `fast-deps` (and turning it on by default) is necessary to extend the benefits of the recent metadata resolve work to most users hosting their own index, especially corporations running pip in their internal CI.

Problem

`--use-feature=fast-deps` currently takes a while to perform multiple small range requests against pypi wheels which do not have PEP 658 metadata available, such as `tensorflow-gpu==2.5.3`. This is likely because of delays built into the pypi file host when responding to GET requests for very large files, to reduce the risk of a denial of service. This is pretty reasonable behavior on pypi's part, so we would like to minimize the number of range requests made, as described by @McSinyx in followup work at "The fast-deps feature is not a fast way to obtain dependencies" #8670.

Solution
@dholth realized two optimizations we could perform:

1. The `Range` header accepts a negative value `bytes=-N`, which acts like negative slice indices in Python, returning a chunk from the end of the file. This avoids a `HEAD` request to get the file length.
2. The `*.dist-info/` directory is all that's going to be extracted from our lazy wheels, and this directory's contents form a contiguous substring of the total file content. After extracting the central directory from the end of the file with our first request, we can perform a single range request to populate the contents of every file in the `*.dist-info/` directory in the lazy wheel, so no further HTTP requests need to be made to continue the resolution.

Additional Fixes

Two additional issues have popped up since #11447:

1. pypi's file host has begun rejecting negative byte range requests with 501 responses (see the linked pypi discussion on negative byte ranges).
2. Wheels like `tensorflow-gpu==2.5.3` have begun to put their own `*.dist-info/` directories at the beginning of the zip file, possibly as a result of generating them from other build tools which sort zip file entries lexicographically.

Both of these are considered to be reasonable behavior, and this change handles both cases gracefully.
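The tail-fetching scheme described above can be simulated offline: take only the archive's last chunk (standing in for a single `bytes=-N` range request), zero-pad the unfetched prefix the way a lazy file object would, and both the central directory and a trailing `*.dist-info/` member still parse. All names and sizes below are illustrative:

```python
import io
import zipfile

# Build an in-memory stand-in for a wheel: one big payload member, metadata last.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg/big.bin", b"\0" * 100_000)
    zf.writestr("pkg-1.0.dist-info/METADATA", "Metadata-Version: 2.1\nName: pkg\n")
data = buf.getvalue()

# One "negative range request": the last 10 KB, which covers the dist-info
# member, the central directory, and the end-of-central-directory record.
tail = data[-10_000:]

# Zero-pad the unfetched prefix, as a lazy file object would.
sparse = io.BytesIO(b"\0" * (len(data) - len(tail)) + tail)
with zipfile.ZipFile(sparse) as lazy:
    names = lazy.namelist()                            # central directory intact
    metadata = lazy.read("pkg-1.0.dist-info/METADATA")  # its bytes are in the tail
```

Reading `pkg/big.bin` from the sparse file would fail its CRC check, since those bytes were never fetched — which is fine, because only the metadata is needed for resolution.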
Result

This halves the time to resolve dependencies from the below requirements for `pip install --dry-run --report` when the fixes of #12186 are merged: […]

As with PEP 658 metadata, in pathological cases which involve lots of backtracking, this will avoid downloading more than a single version of each wheel even for `pip download` or `pip install` without `--dry-run`. If `--use-feature=fast-deps` is enabled by default, this will also significantly improve performance of all resolves involving `tensorflow-gpu==2.5.3` and other wheels which do not have PEP 658 metadata available on pypi, or against indices which do not serve PEP 658 metadata. I therefore propose turning on `fast-deps` by default, either in this PR or in #12186, which will be merged after this one.