pull preparer logic out of the resolver to consume metadata-only dists in commands #12186
Conversation
force-pushed from 720b0c2 to 6f4da4c
force-pushed from 6f4da4c to 5d0f24c
@pradyunsg @sbidoul / anyone else: as I identify in the OP, this PR is less of a "bugfix", since we had not agreed on the exact behavior to commit to, and can almost be viewed more as new functionality, where we are now augmenting the guarantee of …
Question: would this be easier if we removed `fast-deps`?
I don't believe so. As I went on at depth about in #11478 (comment), the new logic we added isn't the problem. The part that has proven error-prone, imo, is just the intrinsic difficulty of retrofitting these "virtual" dists into a codebase that wasn't architected for lazy computation (because it didn't need to be). I believe that this change improves that situation, by rearranging the methods of the requirement preparer to make it very explicit where the "virtual" dists are finally hydrated into real installed dists.

I am aware the diff looks quite large, but the vast majority of that is from transplanting the metadata tests to use the `install` command.

Personally, I think `fast-deps` is worth keeping; however, the fantastic work done to decouple the actual zip operations from the requirement preparer also means that it wouldn't be difficult to remove `fast-deps`.

I also believe the "virtual" dists workstream has been made more difficult than usual because it has inspired a lot of excitement about improving pip, so pip maintainers have had to piece together stuff written by many different people over time. I created this as a separate PR because I recognized there were several things to fix at once in the one I based it off of, and I thought it would be easier to have a single person with more context on the issue (me) take ownership.
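To make the "hydration" point concrete, here is a minimal sketch of what a "virtual" dist amounts to (all names here are invented for illustration; this is not pip's actual class):

```python
# A minimal sketch of the "virtual" dist idea, with invented names: the
# resolve runs against metadata alone, and the archive is only fetched
# ("hydrated") when a command genuinely needs it.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VirtualDist:
    """A dist known only by its metadata; no archive on disk yet."""
    name: str
    version: str
    metadata: dict                    # parsed from a PEP 658 .metadata file
    fetch: Callable[[], str]          # downloads the wheel, returns its path
    local_path: Optional[str] = None  # set once hydrated

    @property
    def is_concrete(self) -> bool:
        return self.local_path is not None

    def hydrate(self) -> str:
        """Download the real archive; only required for an actual install."""
        if self.local_path is None:
            self.local_path = self.fetch()
        return self.local_path
```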
force-pushed from 240414f to b9d851c
Did a quick skim over the PR, as I'm wrapping up to head to bed -- not taking a closer look right now since CI isn't particularly happy with this. One thing about fast-deps -- there seems to be some low-hanging fruit there, based on the final few things stated in #8670, that I've not looked into.
There are likely some details this isn't getting correct yet, but the general direction looks right to me. Also, thanks for the “finalize” rename; that needed to be done. Should we also rename the …
force-pushed from 21b77fc to d7d15be
force-pushed from dc4582a to d6da334
I'm not a pip maintainer, so I'm not able to approve the PR; however, I took a look in the hope that another pair of eyes might help increase the confidence level of others approving the PR later.
This change looks great to me - the only nit I could find to leave a review comment about (when reviewing commit by commit) ended up being addressed by the typo fix in d6da334 - so I couldn't actually find anything on which to leave a comment 😄
The thorough code comments, PR/commit descriptions and neatly split up commit sequence made reviewing this much easier too, thank you!
I'd love to see this land so it can then unblock some of your other performance improvement PRs (#12256, #12257).
I don't suppose one of the pip maintainers has a chance to review this soon? I've reviewed it myself (above) fairly carefully, and the PR already had some review passes earlier from pip maintainers, so hopefully a final review from someone shouldn't be too time-consuming :-)
I'll be honest, I've avoided looking at this (in spite of the fact that I'm generally in favour of the idea) because it (and @cosmicexplorer's other PRs) seemed to be changing very rapidly - there were a lot of force-pushed changes in a short period, meaning that things were changing as I looked at them 🙁 In addition, this is a big change affecting a lot of files, so reviewing isn't a fast process (especially when you have limited time, as I do).

I'm concerned that the preparer is one of the least "obvious" parts of the resolution process - it's full of generic terms (not least of which is "prepare" itself!) which don't really give much clue as to what precisely is going on. Every time I need to deal with the preparer, I have to do a bunch of research to remind myself what's happening before I can proceed. I'm worried that if we delegate chunks of the "prepare" task to individual commands, we will end up making that problem worse, with everyone needing to keep track of what's happening. To give an example, …

I'm sorry, I know that I'm making this PR somehow responsible for the big maintainability mess that is the preparer, and that's not entirely fair. But one of the reasons this PR isn't getting eyes on it for review is because of that maintainability mess, and I honestly think we need to start paying back some of that technical debt if we're to have any chance of progressing on bigger (and important!) work like this. In particular, performance improvements tend to increase technical debt (you're explicitly trading speed against maintainability) and I think we need to be cautious here, given the current struggle we have with maintainer time.
I'm not sure this will unblock things as much as we'd hope. The problem is simply that all of these changes are huge - 21 files for this one, 38 and 43 for the two you linked. Getting time to review changes on this sort of scale is a problem in itself (for me, at least - I can't speak for the other maintainers). A better approach might be to refactor the PRs into simpler, step-by-step changes, each of which has a small, manageable scope and which can be accepted individually, both reducing the technical debt and incrementally moving towards these end goals. For example, a change that renamed and documented …
The other two PRs are stacked PRs (so they don't change as many files as that) - see the PR description for messages like: …
@pfmoore I've read through your last comment a few times and I'm not sure I understand whether there is a way forward or not. Would you prefer the PRs to be further broken down into something smaller, or would you like something bigger that fixes a lot of the genericness of the preparer logic? Or are you saying that there's too much technical debt in this part of pip for anyone other than a maintainer to touch it? I had some optimization ideas that are orthogonal to the ones created by this series of PRs, but they would touch a lot of the same logic and I didn't want the PRs to step on each other's toes.
Neither. I'd prefer a series of smaller commits that make the preparer more maintainable. Is that what you mean by "fixes a lot of the genericness"? I was thinking about steps (each in its own PR) like renaming functions to be more meaningful, adding/improving comments and docstrings to explain the logic, factoring out complex code into (named, easier to understand) standalone functions, encapsulating conditionals in functions so that the calling code needs less branching, etc. Standard "refactoring" types of improvements. Once those have been done, I'd hope that this PR can be refactored into a number of similar smaller PRs, each tackling a part of the task without needing the whole thing to be done in a "big bang".
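As a minimal illustration of the last of those refactorings (hypothetical code, not taken from pip's source):

```python
# Illustrative only -- not pip's actual code. "Encapsulating conditionals in
# functions" replaces an inline multi-clause test with a named predicate, so
# the call site reads as intent rather than mechanics.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Req:                      # stand-in for pip's InstallRequirement
    link: Optional[str]
    is_wheel: bool
    editable: bool


def needs_sdist_build(req: Req) -> bool:
    """A linked, non-editable requirement without a wheel must be built."""
    return req.link is not None and not req.is_wheel and not req.editable


def prepare(req: Req) -> None:
    # was: if req.link is not None and not req.is_wheel and not req.editable:
    if needs_sdist_build(req):
        print(f"building sdist from {req.link}")
```

The call site now states its intent, and the predicate can be documented and unit-tested on its own.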
No, I'm saying that there's so much technical debt, it's hard for anyone to touch it. Getting reviews is hard because it needs a second person to put the effort into understanding, and that's hard because of the technical debt. If we were in a better situation, committer reviews would be easier because the committers would be sufficiently comfortable with the code to do a review without needing to re-learn the underlying logic. In fact, I think refactoring to reduce technical debt might be easier for an external committer, because they are coming at the code with no preconceptions, and with a fresh viewpoint. When I look at the code, all I think is "ugh, what a mess" 🙁
That's precisely the sort of over-coupling that the complexity of the existing code results in. I don't want to make "fix the technical debt" into some sort of prerequisite or "tax" that must be paid by someone like yourself who simply wants to contribute improvements to pip. We definitely can't afford to discourage that sort of work! But I appreciate the frustration you feel with how hard it is to get reviews, and this is the best way I can think of for moving the "getting a review" problem forward. If you'd prefer not to get caught up in the technical debt issue, by all means ignore those parts of my comments, and simply wait until one of the core maintainers (which might well be me) gets time to do a proper review of the code here.
force-pushed from d6da334 to 15a82ca
@pfmoore thanks so much for the clear feedback. The number of merge conflicts alone (10 separate files before rebasing just now) is a strong indicator that this change should be split apart. I am focusing particularly on your comments at #12186 (comment):
This is definitely something I am hoping to improve with the series of changes I will split out from this PR.
This part is crystal clear to me--the refactoring + new functionality (resolving #12603) are absolutely far too many things to change in one PR. I think I can separate this into at least three changes: …
force-pushed from 15a82ca to 57a33d6
force-pushed from 57a33d6 to 8b336e3
I was able to split this in two, and I believe the result cleanly separates refactoring from feature work. Please take a look at #12863 for the refactoring change with effective documentation; I have made the current change depend on it and have moved this into draft mode until we've all had time to reach consensus on #12863. Thanks again for the wonderful feedback; I really do love contributing to this project!
force-pushed from 8b336e3 to 7fd7e49
force-pushed from 28a0db3 to 684dc32
force-pushed from 684dc32 to 53f3a24
force-pushed from 53f3a24 to e9849f8
When performing `install --dry-run` and PEP 658 .metadata files are available to guide the resolve, do not download the associated wheels; rather, use the distribution information directly from the .metadata files when reporting the results on the CLI and in the --report file.

- describe the new --dry-run behavior
- finalize linked requirements immediately after resolve
- introduce is_concrete
- funnel InstalledDistribution through _get_prepared_distribution() too
- add test for new install --dry-run functionality (no downloading)
This PR is on top of #12871; see the +337/-23 diff against it at https://github.com/cosmicexplorer/pip/compare/refactor-requirement-preparer...cosmicexplorer:pip:metadata_only_resolve_no_whl?expand=1.

Continuation of #11512, now with test cases and a lot of comments.
Problem
The use case proposed in #53 uses pip just to execute its resolution logic, in order to generate e.g. a lockfile, without pulling down any dists (which may be quite large). We eventually arrived at `pip install --report --dry-run` as the way to enable this. However, the workstream to improve pip performance has been somewhat fractured and distributed across multiple efforts (as often occurs in open source projects), and by the time PyPI had managed to enable PEP 658 metadata, it turned out we had failed to implement the caching behavior we wanted, as described in #11512.
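(For reference, the invocation looks like `pip install --dry-run --report report.json <requirements>`: pip resolves and writes the JSON installation report without installing anything, but before this change it would still download full wheels along the way.)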
Solution

- Rename `RequirementPreparer#prepare_linked_requirements_more()` to `.finalize_linked_requirements(..., require_dist_files=True)`.
- With `require_dist_files=False`, the requirement preparer will not download wheels if it already has metadata. This resolves "pip install `--dry-run` shouldn't download full wheels when metadata file available" (#12603).
- Move the `test_download_metadata()` tests into a new file `test_install_metadata.py`, because the metadata tests were supposed to be done against the install command anyway.
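As a rough sketch of the control flow these bullets describe (the requirement type and download helper below are simplified stand-ins, not pip's actual implementation):

```python
# Simplified sketch of finalize_linked_requirements as described above.
# LinkedReq and _download_and_prepare are invented stand-ins for illustration.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkedReq:                      # stand-in for pip's InstallRequirement
    name: str
    local_file: Optional[str] = None  # path to a downloaded wheel, if any

    @property
    def is_concrete(self) -> bool:    # mirrors the is_concrete bullet above
        return self.local_file is not None


class RequirementPreparer:
    def finalize_linked_requirements(
        self, reqs: List[LinkedReq], require_dist_files: bool
    ) -> None:
        """Hydrate metadata-only ("virtual") dists into concrete ones.

        With require_dist_files=False (install --dry-run), a requirement
        whose metadata came from a PEP 658 .metadata file stays virtual and
        its wheel is never downloaded; with require_dist_files=True, every
        requirement must end up backed by a real file on disk.
        """
        for req in reqs:
            if req.is_concrete or not require_dist_files:
                continue
            self._download_and_prepare(req)

    def _download_and_prepare(self, req: LinkedReq) -> None:
        req.local_file = f"/tmp/{req.name}.whl"  # placeholder for the download
```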