Returning a RequirementSet from Resolver.resolve #7638

pradyunsg · 2020-01-23T07:41:37Z

Description
Currently, pip's Resolver.resolve method is expected to mutate the RequirementSet provided to it; which results in a slightly difficult to follow control flow.

Desired behavior

Resolver.resolve(req_set: RequirementSet) -> RequirementSet

Known Challenges
pip's install, download and wheel commands, have a requirement_set.cleanup_files() call in a try-finally, which means that the existing requirement set is clearly assumed to be the only one that exists. We'd likely want to construct a new set in the new resolver, so figuring out how we'd handle cleanups is the main thing to do here.

The text was updated successfully, but these errors were encountered:

pradyunsg · 2020-01-23T07:58:01Z

This came up in #7571 (comment)

pfmoore · 2020-01-23T15:36:49Z

Working notes

Mostly a brain dump of my investigations, so it's recorded somewhere public. I've just found a few other conversations that I need to follow up on, though, which may put all of the following in a different light.

Review this against #7571 (comment), #6607 (comment) (and followup)

Analysis

Looking at RequirementSet's API:

Internal use only
check_supported_wheels
add_{named,unnamed}_requirement

Read-only from clients
unnamed_requirements
requirements

Modified by clients
successfully_downloaded [resolver]
reqs_to_cleanup [resolver]

Other Methods
has_requirement - called by resolver
get_requirement - called by install command to check if pip was installed
cleanup_files - called at end of wheel, install and download commands
add_requirement - called by resolver and req_command (populate_requirement_set)

Conclusion

Two things are tightly coupled here - successfully_downloaded and reqs_to_cleanup. Maybe both can be made into attributes of the requirement? cleanup_files could remain a convenience method on the requirement set or (better) made into a utility function.

So, two initial refactorings:

Move successfully_downloaded to the requirement.
Move needs cleanup attribute onto the requirement.

That leaves RequirementSet as mostly just a container, with the messy add_requirement method to review and tidy up. That's where the control flow gets nasty (checking the to-be-added req against an existing req, and potentially mutating the existing req).

pradyunsg · 2020-01-23T20:39:46Z

Note regarding #6607 comment: pip._internal.operations ended up being where we put the "cleaner abstractions" (they're basically the corresponding functions).

I'm not against cleaning up add_requirement, it's just that it's extremely mixed in with the current resolution logic, so it might be a bit tricky to do that.

I'm not sure how you're thinking of moving successfully_downloaded to requirements - are you thinking of iterating through a requirement set and filtering "this was downloaded" from that?

pradyunsg · 2020-01-23T20:40:19Z

I officially hate the GitHub mobile UI.

pfmoore · 2020-01-23T21:50:42Z

I'm not sure how you're thinking of moving successfully_downloaded to requirements - are you thinking of iterating through a requirement set and filtering "this was downloaded" from that?

That's what I was considering. I'm not sure how much improvement that would be, though. But having said that, I'm not actually sure it's important to do that refactoring at all. I'm still trying to understand how "make resolve return a RequirementSet" actually helps improve the control flow (especially if you're lukewarm on cleaning up add_requirement).

I guess I'm still trying to understand why doing this is a good thing.

I officially hate the GitHub mobile UI.

You trendy youngsters doing open source on mobile :-)

chrahunt · 2020-01-24T01:17:28Z

The refactoring helps because as I see it we have two options for implementing the new resolver while preserving the old one:

Reimplement the current resolution logic in terms of the new resolver once it's done
Draw a line somewhere in the code to split it into:
1. Before using resolver
2. Use old resolver or new resolver
3. After using resolver

For the sake of understanding why the refactoring would be good let's assume we don't want to go with 1, so we want to go with 2.

Then we can say a few things:

the more logic that we have in "Before" and "After" the better, because then we do not have to worry about two different implementations of something (like downloading files) being broken in one or the other
the less coupling between "Before", "After", and the resolver the better, because then it will be easier to consider them separately for the purposes of fixing bugs and implementing features

In that context "make resolve return a RequirementSet" helps because it reduces the coupling between the process of doing resolution and the surrounding code.

chrahunt · 2020-01-24T01:19:11Z

I'm not sure how you're thinking of moving successfully_downloaded to requirements - are you thinking of iterating through a requirement set and filtering "this was downloaded" from that?

At this point since all downloads are attached as paths to the InstallRequirements with lifetimes that last until pip exits, we can just move the copying of the already-downloaded files to the download folder out of RequirementPreparer and into pip download or pip wheel, respectively. Then the individual commands will know which files were downloaded - it will be whatever files they just copied.

pfmoore · 2020-01-24T09:04:16Z

In that context "make resolve return a RequirementSet" helps because it reduces the coupling between the process of doing resolution and the surrounding code.

Got it, thanks. So the key here is that we don't want the RequirementSet to act as mutable state shared between the "Before" and "After" code (and hence as a back-channel for passing hidden data between them). That makes perfect sense to me.

On the other hand it does mean that simplistically moving state to the Requirement probably won't help, as it just changes what holds the shared state. But maybe it's OK if Requirement (or longer term the proposed project object) becomes the mutable object where all state goes. I'm still catching up on the thinking around this area, I'll have to consider this some more (and trawl through the issues again to make sure I've found all the relevant discussions - I still feel like there might be some key conversations I've missed).

chrahunt · 2020-01-24T09:51:18Z

My thinking is that "Before" deals strictly with something that represents a parsed requirement, and "After" deals with the Project objects. Like:

For new resolver
1. Turn parsed input requirements into Projects (I don't think we'll actually go directly to Projects here, but that's an implementation detail - externally to this function it will look the same)
2. Do resolution
3. Return result
For old resolver
1. Turn parsed input requirements into InstallRequirement
2. Do resolution
3. Wrap resulting InstallRequirements in an adapter that makes them look like Projects
4. Return result

If you haven't already, check out the refactoring that was done over the past few months in req.constructors and req.req_file. The approach above was part of my motivation for that reorganization. See also #7019 which takes it a step further.

chrahunt · 2020-01-24T10:11:20Z

But maybe it's OK if Requirement (or longer term the proposed project object) becomes the mutable object where all state goes.

It sounds bad when you say it like that. :) At the end we want the resulting objects to represent a thing that we can install or save. I would not rely on much more intimation than what is already exposed in ProjectInterface.

It's not critical for this discussion, but a few notes we'll need for later:

Projects probably don't need an uninstall method
we will need to capture a "force reinstall" option in the resolver result

pfmoore · 2020-01-30T16:45:57Z

we will need to capture a "force reinstall" option in the resolver result

Can you elaborate on this? I can see that we may need to look at how we tell the resolver that the user has specified --force-reinstall, although I'm not entirely convinced it's something the resolver needs to know. But that's not the point I want to clarify here. The question is why would the resolver result need to include a "force install" option? The resolver just returns a list of what resolvelib calls "Candidates" - essentially project name and version. With that data we can then go on and do the needed installs - and --force-reinstall just means "if it's already there, reinstall it" at that point.

Maybe the resolver would know it's already installed and be able to save us rechecking, but (a) I'd rather have less coupling and an extra check, and (b) I don't actually think the resolver would know that (or rather, it wouldn't have any better ability to deduce it than we would, and it has no need to know for its own purposes).

One proviso here - depending on the upgrade strategy, part of the information the resolver will use to prioritise which version to coose will be "what is currently installed". But again, that's data we provide to the resolver, not data the resolver can return to us.

Have I misunderstood what you're meaning by this comment?

chrahunt · 2020-01-31T01:49:10Z

why would the resolver result need to include a "force install" option?

It wouldn't need to, you're right. As long as we enrich the resolver result with information indicating that some of the things already installed will need to be force-reinstalled then I think that's good enough. The resolver doesn't need to know about a "force install" option. Above I was imagining it from the perspective of a function that would take a "resolver result" (and an install scheme) and do the required installations with no other input.

The resolver just returns a list of what resolvelib calls "Candidates" - essentially project name and version.

In my view the resolver will return Projects (or some wrapper around them), or objects that we use as an index into a lookup table of the Projects that have been used during the resolution process.

The rest of your statements sound reasonable to me.

pfmoore · 2020-02-04T09:20:04Z

I've unassigned myself from this because I'm struggling to understand what it actually means. I get the broader goal of improving the control flow, and decoupling "before the resolve" from "after the resolve", but I'm not clear how this specific change would work in that context. So rather than block someone else from looking at it, I've unassigned myself.

pradyunsg added the type: maintenance Related to Development and Maintenance Processes label Jan 23, 2020

pradyunsg changed the title ~~Returning a RequirementSet from pip's dependency resolver~~ Returning a RequirementSet from Resolver.resolve Jan 23, 2020

pfmoore self-assigned this Jan 23, 2020

pradyunsg closed this as completed Jan 23, 2020

pradyunsg reopened this Jan 23, 2020

chrahunt mentioned this issue Jan 31, 2020

Globally manage and track some temp build directories #7677

Merged

pfmoore removed their assignment Feb 4, 2020

This was referenced Feb 5, 2020

Globally manage InstallRequirement.source_dir #7696

Merged

Reduce usage of RequirementSet in commands #7704

Merged

xavfernandez closed this as completed in #7704 Feb 6, 2020

lock bot added the auto-locked Outdated issues that have been locked by automation label Mar 10, 2020

lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Returning a RequirementSet from Resolver.resolve #7638

Returning a RequirementSet from Resolver.resolve #7638

pradyunsg commented Jan 23, 2020

pradyunsg commented Jan 23, 2020

pfmoore commented Jan 23, 2020

pradyunsg commented Jan 23, 2020 •

edited

Loading

pradyunsg commented Jan 23, 2020

pfmoore commented Jan 23, 2020

chrahunt commented Jan 24, 2020

chrahunt commented Jan 24, 2020

pfmoore commented Jan 24, 2020

chrahunt commented Jan 24, 2020 •

edited

Loading

chrahunt commented Jan 24, 2020

pfmoore commented Jan 30, 2020

chrahunt commented Jan 31, 2020

pfmoore commented Feb 4, 2020

Returning a RequirementSet from Resolver.resolve #7638

Returning a RequirementSet from Resolver.resolve #7638

Comments

pradyunsg commented Jan 23, 2020

pradyunsg commented Jan 23, 2020

pfmoore commented Jan 23, 2020

Working notes

Analysis

Conclusion

pradyunsg commented Jan 23, 2020 • edited Loading

pradyunsg commented Jan 23, 2020

pfmoore commented Jan 23, 2020

chrahunt commented Jan 24, 2020

chrahunt commented Jan 24, 2020

pfmoore commented Jan 24, 2020

chrahunt commented Jan 24, 2020 • edited Loading

chrahunt commented Jan 24, 2020

pfmoore commented Jan 30, 2020

chrahunt commented Jan 31, 2020

pfmoore commented Feb 4, 2020

pradyunsg commented Jan 23, 2020 •

edited

Loading

chrahunt commented Jan 24, 2020 •

edited

Loading