-
-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get hash from link #1670
base: main
Are you sure you want to change the base?
Get hash from link #1670
Conversation
08301b7
to
4a5185a
Compare
4a5185a
to
464f996
Compare
We've used that now for a while in production with AWS CodeArtifact and it did not cause any problems. |
I locally patched pip-tools with this change and confirmed it eliminated a lot of downloading and hashing work when using pip-compile with AWS CodeArtifact. I still noticed there is a bit of extra time spent attempting to use the json api before falling back to the links. I wonder if we could opt out of the json api to skip making all those unnecessary api calls, or perhaps switch the order around to first use the hash in the link? Regardless, this additional small optimization would be tiny compared to the gains achieved by the change as-is. Would definitely like to see this in an upcoming pip-tools release! |
LGTM. Can it be rebased with the latest changes? @atugushev any thoughts on this? |
b78cafd
to
c36c51e
Compare
Done; but wasn't really necessary 🤷 |
Co-authored-by: Albert Tugushev <albert@tugushev.ru>
9fe198c
to
1e4aa42
Compare
@@ -389,6 +389,9 @@ def _get_matching_candidates( | |||
return candidates_by_version[matching_versions[0]] | |||
|
|||
def _get_file_hash(self, link: Link) -> str: | |||
if link.hash_name == FAVORITE_HASH: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried this with pip-22.2 and it works fine:
# requirements.in
--index-url https://m.devpi.net/root/pypi/+simple/
numpy
root@py311:/# pip --version
pip 22.2 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)
root@py311:/# pip-compile --generate-hashes -v
Using indexes:
https://m.devpi.net/root/pypi/+simple/
ROUND 1
Current constraints:
numpy (from -r requirements.in (line 3))
Finding the best candidates:
found candidate numpy==1.23.5 (constraint was <any>)
Finding secondary dependencies:
numpy==1.23.5 requires -
------------------------------------------------------------
Result of round 1: stable, done
Generating hashes:
numpy
Getting hash from link numpy-1.23.5-cp311-cp311-macosx_11_0_arm64.whl
Getting hash from link numpy-1.23.5.tar.gz
Getting hash from link numpy-1.23.5-cp39-cp39-win_amd64.whl
Getting hash from link numpy-1.23.5-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
...
But with pip-22.3 it doesn't work:
root@py311:/# pip --version
pip 22.3.1 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)
root@py311:/# pip-compile --generate-hashes -v
Using indexes:
https://m.devpi.net/root/pypi/+simple/
ROUND 1
Current constraints:
numpy (from -r requirements.in (line 3))
Finding the best candidates:
found candidate numpy==1.23.5 (constraint was <any>)
Finding secondary dependencies:
numpy==1.23.5 requires -
------------------------------------------------------------
Result of round 1: stable, done
Generating hashes:
numpy
Hashing numpy-1.23.5-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
|██ | 6%^C
Using git bisect
I've tracked down the problematic commit pypa/pip@bad03ef from pypa/pip#11111
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc'ing @cosmicexplorer for the insights. What could go wrong with Link.hash
in 22.3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a curious onlooker (not someone intimately familiar with pip internals), I think the above behaviour is coming from this part of the change: pypa/pip@bad03ef#diff-3919ca53335487395177edeffff8b60bd360f1ad18c95593287f6814b9ecefb8R403-L209.
In pip 22.3 and pip 22.2, I think the internal Link._hashes
dict contains the "sha256" key but as of 22.3, that inner dict is not checked for the hash in the Link.hash_name
property.
I think in the simple api, in the json form the hashes will come from the hashes dict, whereas in the http form it will be in the url fragment, and I think your above test (with index https://m.devpi.net/root/pypi/+simple/) is returning the json form of the simple api. Only for the latter will HashLink (as of pip 22.3) be populated with a hash/hash_name, so I think there is a regression in functionality for the simple api + json response introduced by pypa/pip#11111.
I think the check here could be if FAVORITE_HASH in link._hashes
and that would work for both pip 22.2 and pip 22.3. Accessing the value would be link._hashes[FAVORITE_HASH]
. Having to use that internal dict seems suboptimal :/
FWIW, the current implementation is still working on pip 22.3 with aws codeartifact as the index, which seems to be going down the path of responding not as json but as http and using Link.from_element
rather than Link.from_json
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it and update the PR to support both pip < 22.3 and >= 22.3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pypa/pip@0233bf2 once again changes the internal logic for retrieving the hash_name which should restore the functionality as before.
Could we keep the change as-is and note that there is a still an inefficiency for affected pip versions and artifact stores that use the json form of the simple api? IMO, this is no worse than what we have today in this one specific exception and better in all other scenarios.
Alternatively we could use a workaround for the affected pip versions to have parity across all pip versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this comment seems to confirm that hashes were missing from json pypa/pip#11692 (comment) and it was an issue in more ways for pip than just this proposed change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any update on this / interest in reviving it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, haven't had time to work on this. Our automation basically uses this branch therefore the pressure to get it upstream is low :/
I'll try to find some time, but I'm also happy if someone else wants to take over.
If the JSON API isn't available (eg. for devpi, Artifactory, AWS CodeArtifact), get the hash of a wheel/archive from the URL if available instead of hash.
This resolves #1135
Contributor checklist
Maintainer checklist
backwards incompatible
,feature
,enhancement
,deprecation
,bug
,dependency
,docs
orskip-changelog
as they determine changelog listing.