Compute hashes from all index servers #1556
Get JSON from all index servers instead of just the first, then compute hashes for all remaining index links that the servers did not already return hashes for.
This is a review for #1536.
This would also fix #1135. But the current implementation won't work with index URLs ending in a slash, see #1669. @stefansjs, do you plan to continue with the PR? I'm happy to help get it ready. :)
Yes, I would love to. I noticed that the latest master is a non-trivial merge. I haven't looked deeply into what needs to change. I would really appreciate any help or guidance. It would be great to get this into master.
    )
    return package_links

def _get_project(self, ireq: InstallRequirement) -> Optional[Dict[str, Any]]:
PyPIRepository._get_project is unused.
pypi_json = {
    link.comes_from: self._get_json_from_index(link)
    for link in all_package_links
}
pypi_hashes = {
    url: self._get_hash_from_json(json_resp, ireq)
    for url, json_resp in pypi_json.items()
    if json_resp
}
Why dicts and not generators? A dict would only deduplicate anything if someone listed the same index multiple times. But even in that case, _get_json_from_index (which takes the most time here due to network I/O) would still be called multiple times.
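The point about repeated calls can be checked with a small, self-contained sketch. Here `fetch` is a hypothetical stand-in for `_get_json_from_index` and the index URLs are made up; the sketch only shows that a dict comprehension evaluates its value expression once per input item, even when keys collide, and that deduplicating the inputs first avoids the extra call.

```python
calls = []


def fetch(url):
    """Hypothetical stand-in for the network request; records each call."""
    calls.append(url)
    return {"source": url}


# The same (made-up) index is listed twice.
index_urls = [
    "https://a.example/simple",
    "https://a.example/simple",
    "https://b.example/simple",
]

# The dict keeps one entry per key, but fetch() still runs for every item.
responses = {url: fetch(url) for url in index_urls}
entries_kept = len(responses)
calls_without_dedup = len(calls)

# Deduplicating the inputs first (order-preserving) avoids the repeated call.
calls.clear()
responses = {url: fetch(url) for url in dict.fromkeys(index_urls)}
calls_with_dedup = len(calls)
```

In this sketch the dict ends up with two entries either way; only the number of `fetch` calls differs.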
return set(
    itertools.chain.from_iterable(
        self._get_hashes_for_link(candidate.link, pypi_hashes)
        for candidate in matching_candidates
    )
)
Suggested change:
- return set(
-     itertools.chain.from_iterable(
-         self._get_hashes_for_link(candidate.link, pypi_hashes)
-         for candidate in matching_candidates
-     )
- )
+ return {
+     file_hash
+     for candidate in matching_candidates
+     for file_hash in self._get_hashes_for_link(candidate.link, pypi_hashes)
+ }
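As a quick check that the nested set comprehension is a drop-in replacement for `set(itertools.chain.from_iterable(...))`, here is a minimal sketch. The `hashes_per_link` mapping and its values are made up for illustration and stand in for whatever `_get_hashes_for_link` returns per candidate.

```python
import itertools

# Made-up data: each link yields an iterable of hash strings.
hashes_per_link = {
    "link-a": ["sha256:aaa", "sha256:bbb"],
    "link-b": ["sha256:bbb", "sha256:ccc"],
}

# Original form: flatten with itertools, then build a set.
chained = set(
    itertools.chain.from_iterable(
        hashes_per_link[link] for link in hashes_per_link
    )
)

# Suggested form: a nested set comprehension.
flattened = {
    file_hash
    for link in hashes_per_link
    for file_hash in hashes_per_link[link]
}
```

Both forms iterate over every hash once and deduplicate through the set, so the choice is mainly about readability.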
This was a false assumption after skimming over the code and noticing the
#1723 has been merged. @stefansjs, do you want to rebase this per #1556 (comment)?
Fixes #1536.
When pip-compile encounters an index server that supports the JSON API, it does not go on to compute hashes for files from "simple" index servers. This change fetches hashes from all index servers and computes hashes for every file that does not have one provided by a JSON API.
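The strategy described above can be sketched roughly as follows. This is not the code from the PR: `collect_hashes`, `json_hashes`, and `read_file` are hypothetical names, and the sketch assumes hashes are sha256 digests written as `sha256:<hex>`.

```python
import hashlib
from typing import Callable, Dict, Iterable, Set


def collect_hashes(
    file_urls: Iterable[str],
    json_hashes: Dict[str, str],
    read_file: Callable[[str], bytes],
) -> Set[str]:
    """Return one hash per file, preferring hashes reported by a JSON API.

    json_hashes maps a file URL to the "sha256:<hex>" string reported by
    any index server. read_file is a hypothetical stand-in for downloading
    the file contents.
    """
    hashes = set()
    for url in file_urls:
        reported = json_hashes.get(url)
        if reported is not None:
            # A JSON API already supplied the hash, so no download is needed.
            hashes.add(reported)
        else:
            # No server reported a hash for this file: compute it locally.
            digest = hashlib.sha256(read_file(url)).hexdigest()
            hashes.add(f"sha256:{digest}")
    return hashes
```

The key design point is that the fallback is decided per file rather than per index, so a single JSON-capable index no longer suppresses hashing for files served by other indexes.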