Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning a RequirementSet from Resolver.resolve #7638

Closed
pradyunsg opened this issue Jan 23, 2020 · 13 comments · Fixed by #7704
Closed

Returning a RequirementSet from Resolver.resolve #7638

pradyunsg opened this issue Jan 23, 2020 · 13 comments · Fixed by #7704
Labels
auto-locked Outdated issues that have been locked by automation type: maintenance Related to Development and Maintenance Processes

Comments

@pradyunsg
Copy link
Member

Description
Currently, pip's Resolver.resolve method is expected to mutate the RequirementSet provided to it; which results in a slightly difficult to follow control flow.

Desired behavior

Resolver.resolve(req_set: RequirementSet) -> RequirementSet

Known Challenges
pip's install, download and wheel commands, have a requirement_set.cleanup_files() call in a try-finally, which means that the existing requirement set is clearly assumed to be the only one that exists. We'd likely want to construct a new set in the new resolver, so figuring out how we'd handle cleanups is the main thing to do here.

@pradyunsg pradyunsg added the type: maintenance Related to Development and Maintenance Processes label Jan 23, 2020
@pradyunsg pradyunsg changed the title Returning a RequirementSet from pip's dependency resolver Returning a RequirementSet from Resolver.resolve Jan 23, 2020
@pradyunsg
Copy link
Member Author

This came up in #7571 (comment)

@pfmoore pfmoore self-assigned this Jan 23, 2020
@pfmoore
Copy link
Member

pfmoore commented Jan 23, 2020

Working notes

Mostly a brain dump of my investigations, so it's recorded somewhere public. I've just found a few other conversations that I need to follow up on, though, which may put all of the following in a different light.

Review this against #7571 (comment), #6607 (comment) (and followup)

Analysis

Looking at RequirementSet's API:

Internal use only
check_supported_wheels
add_{named,unnamed}_requirement

Read-only from clients
unnamed_requirements
requirements

Modified by clients
successfully_downloaded [resolver]
reqs_to_cleanup [resolver]

Other Methods
has_requirement - called by resolver
get_requirement - called by install command to check if pip was installed
cleanup_files - called at end of wheel, install and download commands
add_requirement - called by resolver and req_command (populate_requirement_set)

Conclusion

Two things are tightly coupled here - successfully_downloaded and reqs_to_cleanup. Maybe both can be made into attributes of the requirement? cleanup_files could remain a convenience method on the requirement set or (better) made into a utility function.

So, two initial refactorings:

  1. Move successfully_downloaded to the requirement.
  2. Move needs cleanup attribute onto the requirement.

That leaves RequirementSet as mostly just a container, with the messy add_requirement method to review and tidy up. That's where the control flow gets nasty (checking the to-be-added req against an existing req, and potentially mutating the existing req).

@pradyunsg
Copy link
Member Author

pradyunsg commented Jan 23, 2020

Note regarding #6607 comment: pip._internal.operations ended up being where we put the "cleaner abstractions" (they're basically the corresponding functions).

I'm not against cleaning up add_requirement, it's just that it's extremely mixed in with the current resolution logic, so it might be a bit tricky to do that.

I'm not sure how you're thinking of moving successfully_downloaded to requirements - are you thinking of iterating through a requirement set and filtering "this was downloaded" from that?

@pradyunsg pradyunsg reopened this Jan 23, 2020
@pradyunsg
Copy link
Member Author

I officially hate the GitHub mobile UI.

@pfmoore
Copy link
Member

pfmoore commented Jan 23, 2020

I'm not sure how you're thinking of moving successfully_downloaded to requirements - are you thinking of iterating through a requirement set and filtering "this was downloaded" from that?

That's what I was considering. I'm not sure how much improvement that would be, though. But having said that, I'm not actually sure it's important to do that refactoring at all. I'm still trying to understand how "make resolve return a RequirementSet" actually helps improve the control flow (especially if you're lukewarm on cleaning up add_requirement).

I guess I'm still trying to understand why doing this is a good thing.

I officially hate the GitHub mobile UI.

You trendy youngsters doing open source on mobile :-)

@chrahunt
Copy link
Member

The refactoring helps because as I see it we have two options for implementing the new resolver while preserving the old one:

  1. Reimplement the current resolution logic in terms of the new resolver once it's done
  2. Draw a line somewhere in the code to split it into:
    1. Before using resolver
    2. Use old resolver or new resolver
    3. After using resolver

For the sake of understanding why the refactoring would be good let's assume we don't want to go with 1, so we want to go with 2.

Then we can say a few things:

  1. the more logic that we have in "Before" and "After" the better, because then we do not have to worry about two different implementations of something (like downloading files) being broken in one or the other
  2. the less coupling between "Before", "After", and the resolver the better, because then it will be easier to consider them separately for the purposes of fixing bugs and implementing features

In that context "make resolve return a RequirementSet" helps because it reduces the coupling between the process of doing resolution and the surrounding code.

@chrahunt
Copy link
Member

I'm not sure how you're thinking of moving successfully_downloaded to requirements - are you thinking of iterating through a requirement set and filtering "this was downloaded" from that?

At this point since all downloads are attached as paths to the InstallRequirements with lifetimes that last until pip exits, we can just move the copying of the already-downloaded files to the download folder out of RequirementPreparer and into pip download or pip wheel, respectively. Then the individual commands will know which files were downloaded - it will be whatever files they just copied.

@pfmoore
Copy link
Member

pfmoore commented Jan 24, 2020

In that context "make resolve return a RequirementSet" helps because it reduces the coupling between the process of doing resolution and the surrounding code.

Got it, thanks. So the key here is that we don't want the RequirementSet to act as mutable state shared between the "Before" and "After" code (and hence as a back-channel for passing hidden data between them). That makes perfect sense to me.

On the other hand it does mean that simplistically moving state to the Requirement probably won't help, as it just changes what holds the shared state. But maybe it's OK if Requirement (or longer term the proposed project object) becomes the mutable object where all state goes. I'm still catching up on the thinking around this area, I'll have to consider this some more (and trawl through the issues again to make sure I've found all the relevant discussions - I still feel like there might be some key conversations I've missed).

@chrahunt
Copy link
Member

chrahunt commented Jan 24, 2020

My thinking is that "Before" deals strictly with something that represents a parsed requirement, and "After" deals with the Project objects. Like:

  1. For new resolver
    1. Turn parsed input requirements into Projects (I don't think we'll actually go directly to Projects here, but that's an implementation detail - externally to this function it will look the same)
    2. Do resolution
    3. Return result
  2. For old resolver
    1. Turn parsed input requirements into InstallRequirement
    2. Do resolution
    3. Wrap resulting InstallRequirements in an adapter that makes them look like Projects
    4. Return result

If you haven't already, check out the refactoring that was done over the past few months in req.constructors and req.req_file. The approach above was part of my motivation for that reorganization. See also #7019 which takes it a step further.

@chrahunt
Copy link
Member

But maybe it's OK if Requirement (or longer term the proposed project object) becomes the mutable object where all state goes.

It sounds bad when you say it like that. :) At the end we want the resulting objects to represent a thing that we can install or save. I would not rely on much more intimation than what is already exposed in ProjectInterface.

It's not critical for this discussion, but a few notes we'll need for later:

  • Projects probably don't need an uninstall method
  • we will need to capture a "force reinstall" option in the resolver result

@pfmoore
Copy link
Member

pfmoore commented Jan 30, 2020

we will need to capture a "force reinstall" option in the resolver result

Can you elaborate on this? I can see that we may need to look at how we tell the resolver that the user has specified --force-reinstall, although I'm not entirely convinced it's something the resolver needs to know. But that's not the point I want to clarify here. The question is why would the resolver result need to include a "force install" option? The resolver just returns a list of what resolvelib calls "Candidates" - essentially project name and version. With that data we can then go on and do the needed installs - and --force-reinstall just means "if it's already there, reinstall it" at that point.

Maybe the resolver would know it's already installed and be able to save us rechecking, but (a) I'd rather have less coupling and an extra check, and (b) I don't actually think the resolver would know that (or rather, it wouldn't have any better ability to deduce it than we would, and it has no need to know for its own purposes).

One proviso here - depending on the upgrade strategy, part of the information the resolver will use to prioritise which version to coose will be "what is currently installed". But again, that's data we provide to the resolver, not data the resolver can return to us.

Have I misunderstood what you're meaning by this comment?

@chrahunt
Copy link
Member

why would the resolver result need to include a "force install" option?

It wouldn't need to, you're right. As long as we enrich the resolver result with information indicating that some of the things already installed will need to be force-reinstalled then I think that's good enough. The resolver doesn't need to know about a "force install" option. Above I was imagining it from the perspective of a function that would take a "resolver result" (and an install scheme) and do the required installations with no other input.

The resolver just returns a list of what resolvelib calls "Candidates" - essentially project name and version.

In my view the resolver will return Projects (or some wrapper around them), or objects that we use as an index into a lookup table of the Projects that have been used during the resolution process.

The rest of your statements sound reasonable to me.

@pfmoore
Copy link
Member

pfmoore commented Feb 4, 2020

I've unassigned myself from this because I'm struggling to understand what it actually means. I get the broader goal of improving the control flow, and decoupling "before the resolve" from "after the resolve", but I'm not clear how this specific change would work in that context. So rather than block someone else from looking at it, I've unassigned myself.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Mar 10, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation type: maintenance Related to Development and Maintenance Processes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants