Idea: When backtracking prefer packages with METADATA file available #12035
Comments
I'm not sure that's particularly valuable. pypi.org will eventually backfill this metadata across all releases and, IIRC, we need to decide on the preference order before this information is available. We could prefer candidates, but there's a well-defined ordering for trying those, so I don't think this is a tractable approach. 😅
I guess this is where I am a little confused: would PyPI be able to extract metadata from this release: https://pypi.org/project/awscli/0.8.0/#files? If so, how? Pip seemingly can't.
If this approach would be helpful, I can think of possible workarounds, e.g. by making customizable what is to
No, because it's sdist-only. But all wheels will ultimately have a metadata file once the backfilling is complete, and we already select wheels as candidates over sdists whenever possible. Or are you suggesting that pip would install an older version with a wheel (and hence with a metadata file) in preference to a newer version with only an sdist? Because that would be a significant breaking change. It's basically the same as making
But I don't think pip prefers one project's release over another based on whether it has a wheel or not, right?
No, I am suggesting that when backtracking you need to make a choice about which project to backtrack on; in #12028, for example, pip needs to decide which project to backtrack on. This will save build time and download time, as sdist-only releases will not be preferred. And I believe it will, in general, be more likely to find a solution than not.
No, this would not be a breaking change. It would only affect the preference for how to find a solution when backtracking; it would not choose older versions of the same project over newer versions of the same project.
Ah, I get you now - thanks for the clarification. Given backfilling, maybe we just prefer projects with (compatible) wheels over sdist-only ones? I'm handwaving a lot here as it's been a while since I reviewed that code and I need to refresh my memory on candidates vs projects vs releases vs... The main point I'm making is that as PyPI will ultimately backfill, let's plan for the future and think in terms of wheels rather than metadata files. Even so, I don't know how effective such a preference might be - resolvelib's
Yeah, I am talking about I will try and come up with a PoC PR once PyPI has backfilled. I may be wrong in thinking it's possible to get this information in time without wildly rearchitecting pip.
I was thinking a bit about this and I believe it would be possible to do this in The problem now is by inspecting My idea is that if you were only interested in information that could be determined from the sdist/wheel filename or the simple API, then you would not actually need to download the package metadata and inspect it. If possible, I think this could be used in other parts of the resolution process to speed things up without worrying about giant wheel downloads or build failures.
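A minimal sketch of the "filename only" idea above, assuming that a file's extension is enough to tell wheels from sdists. The names `DistKind`, `classify` and `is_sdist_only` are hypothetical and are not part of pip or resolvelib, and the sketch deliberately ignores whether a wheel's tags are compatible with the running interpreter.

```python
from enum import Enum
from typing import Iterable


class DistKind(Enum):
    WHEEL = "wheel"
    SDIST = "sdist"
    OTHER = "other"


# Assumption: these suffixes cover the sdist archives of interest here.
SDIST_SUFFIXES = (".tar.gz", ".zip", ".tar.bz2", ".tgz")


def classify(filename: str) -> DistKind:
    """Classify a distribution file from its name alone (no download)."""
    lower = filename.lower()
    if lower.endswith(".whl"):
        return DistKind.WHEEL
    if lower.endswith(SDIST_SUFFIXES):
        return DistKind.SDIST
    return DistKind.OTHER


def is_sdist_only(filenames: Iterable[str]) -> bool:
    """True if a release offers at least one sdist and no wheel at all."""
    kinds = {classify(name) for name in filenames}
    return DistKind.SDIST in kinds and DistKind.WHEEL not in kinds


if __name__ == "__main__":
    print(is_sdist_only(["awscli-0.8.0.tar.gz"]))
```

A real implementation would also need to check wheel compatibility tags, since an incompatible wheel does not help the resolver.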
Now that PyPI has backfilled most (all?) metadata in past wheels, would this simply become preferring projects with wheels over those that only publish sdists? I guess that’s still somewhat useful, but probably niche at best.
I agree - the "downloading lots of large files" issue is solved by separate METADATA files, so unless there's evidence that this approach solves some other real-world issue, I'm not sure it's worth it.
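For context, a rough sketch of how a client could tell that a separate metadata file is advertised for a distribution file, assuming a file entry shaped like the JSON simple API (PEP 691), where the key is `core-metadata` (PEP 714) or the older `dist-info-metadata`, and the metadata lives at the file URL plus `.metadata` (PEP 658). The helper name and the sample entries are made up for illustration; this is not pip's implementation.

```python
from typing import Optional


def separate_metadata_url(file_entry: dict) -> Optional[str]:
    """Return the URL of the separate metadata file, or None if not advertised."""
    for key in ("core-metadata", "dist-info-metadata"):
        value = file_entry.get(key)
        # The value is either a boolean or a mapping of hash name -> digest.
        if value is True or isinstance(value, dict):
            return file_entry["url"] + ".metadata"
    return None


if __name__ == "__main__":
    # Hypothetical entries, not real index data.
    wheel = {
        "filename": "example_pkg-1.0-py3-none-any.whl",
        "url": "https://files.example.invalid/example_pkg-1.0-py3-none-any.whl",
        "core-metadata": {"sha256": "0" * 64},
    }
    sdist = {
        "filename": "example_pkg-0.1.tar.gz",
        "url": "https://files.example.invalid/example_pkg-0.1.tar.gz",
    }
    print(separate_metadata_url(wheel))
    print(separate_metadata_url(sdist))
```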
Yes, this is still very worthwhile. Essentially the idea is that, all other things being equal, metadata files are still less costly; large downloads were just one part of that. In fact, I was convinced by @pfmoore that the distinction should be between wheels and sdists, not between metadata files and no metadata. When comparing two candidates, one an sdist and the other a metadata file, sdists are significantly more expensive:
In the original example I gave, this likely would have saved the user's resolution, which broke because it failed while trying to build an sdist when it could have chosen a different project that had wheels: #12028 (comment). And it's not just pip: this approach would make sense generally for dependency resolution algorithms for Python packages. uv suffers from different examples where it fails during backtracking on old sdists when it could have checked wheels first: astral-sh/uv#1560 To be clear though, this is not the same as the "prefer binary" option. This does not find an older version of the same package; these candidate choices are always between different projects, and when choosing one candidate or the other is otherwise considered relatively equal, sdists are definitely more costly than metadata/wheels.
Thinking about this a bit, preferring backtracking toward wheels may be a good idea regardless of the metadata situation. Versions without one are likely to be ancient and “not actually what you want” in practice (with a few exceptions, e.g. pyspark). But this is armchair deduction and requires real-world evidence.
Well, this is exactly what happens with both examples I linked to, one for pip and one for uv (and uv's resolution isn't so significantly different that the same behavior can't happen with pip), so it would be easy to confirm with a real example if someone made a PR. My plan, if no one else attempts this first, is to make a PR to implement this within the current framework of resolvelib / pip; it would iterate candidates in But I think there are more impactful changes to the resolution to focus on first, so I don't know when I'll try tackling this.
What's the problem this feature will solve?
A common user complaint when heavily backtracking is downloading lots of large files.
Describe the solution you'd like
Currently PEP 658 has just been rolled out on PyPI. When it starts backfilling, a lot of packages will hopefully have an associated METADATA file.
To some extent pip could prefer those packages when backtracking: less important than whether the package is part of a conflict, but perhaps more important than most other backtracking preference criteria.
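The ordering described above can be sketched as a sort key, assuming lower tuples are tried first. `ProjectState`, its fields and `preference_key` are hypothetical names for illustration only; they are not pip's or resolvelib's actual preference logic, and the final tie-break on name merely stands in for "most other backtracking preference criteria".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProjectState:
    name: str
    in_conflict: bool         # project is part of a known conflict
    has_cheap_metadata: bool  # e.g. a wheel / METADATA file is available


def preference_key(project: ProjectState) -> Tuple[bool, bool, str]:
    """Lower sorts first: conflicts, then cheap metadata, then everything else."""
    return (
        not project.in_conflict,         # conflict membership matters most
        not project.has_cheap_metadata,  # then prefer cheap-to-inspect projects
        project.name,                    # placeholder for the remaining criteria
    )


if __name__ == "__main__":
    projects = [
        ProjectState("proj_a", in_conflict=False, has_cheap_metadata=True),
        ProjectState("proj_b", in_conflict=True, has_cheap_metadata=False),
        ProjectState("proj_c", in_conflict=False, has_cheap_metadata=False),
        ProjectState("proj_d", in_conflict=True, has_cheap_metadata=True),
    ]
    for project in sorted(projects, key=preference_key):
        print(project.name)
```

Note that the key only orders projects against each other; it says nothing about which version of a given project is tried, so newer versions of the same project are not passed over in favour of older ones.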
It may have a few positive benefits:
If we look at an example like #12028 (comment), then hopefully either an error is arrived at much quicker or a requirements solution is found, because pip would prefer not to backtrack to awscli-0.8.0, whose metadata cannot be extracted.
Alternative Solutions
Do nothing; pip is still likely to get lots of benefits from METADATA files being available.
Additional context
I'm making a broad assumption that packages with accessible METADATA files are better than packages without for dependency resolution. It could be that this is not true, or at least not true in specific cases.