Dependency resolution: Limit unnecessary backtracking by inferring constraints from dependencies #11575

Closed
GideonBear opened this issue Nov 5, 2022 · 16 comments
Labels
type: feature request Request for a new feature

Comments

@GideonBear

What's the problem this feature will solve?

Currently, I depend on A>=1.0 and B>=1.0. A depends on B<2.0.
When I install my dependencies, pip first downloads B-2.0 and A-1.0, as those are the latest versions that match my requirements. Then it sees that A-1.0 wants B<2.0, so it downloads B-1.0.
This is unnecessary backtracking, as A always has B<2.0 as a constraint.
In my case, pip was even backtracking package C just to make it compatible with B-2.0, which isn't even compatible in the first place!

Real-life example:
I run pip install "pylint>=2.16.dev" "perflint>=0.7.3".
perflint depends on pylint<3.0, but pylint-3.0.0a5 is still downloaded first.
After that, it begins backtracking astroid and different pylint 3.0.0 pre-releases; once it's done, it notices that pylint 3.0.0 isn't compatible with perflint, after which it tries pylint-2.16.0.dev0, which works perfectly.
Complete output:

pip install "pylint>=2.16.dev" "perflint>=0.7.3"
Collecting pylint>=2.16.dev
  Using cached pylint-3.0.0a5-py3-none-any.whl (485 kB)
Collecting perflint>=0.7.3
  Using cached perflint-0.7.3-py2.py3-none-any.whl (11 kB)
Collecting tomlkit>=0.10.1
  Using cached tomlkit-0.11.6-py3-none-any.whl (35 kB)
Collecting tomli>=1.1.0
  Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting dill>=0.2
  Using cached dill-0.3.6-py3-none-any.whl (110 kB)
Collecting mccabe<0.8,>=0.6
  Using cached mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Collecting typing-extensions>=3.10.0
  Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting isort<6,>=4.2.5
  Using cached isort-5.10.1-py3-none-any.whl (103 kB)
Collecting platformdirs>=2.2.0
  Using cached platformdirs-2.5.2-py3-none-any.whl (14 kB)
Collecting astroid<=2.12.0-dev0,>=2.11.5
  Using cached astroid-2.11.7-py3-none-any.whl (251 kB)
Collecting pylint>=2.16.dev
  Using cached pylint-2.16.0.dev0-py3-none-any.whl (506 kB)
INFO: pip is looking at multiple versions of perflint to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of pylint to determine which version is compatible with other requirements. This could take a while.
  Using cached pylint-3.0.0a4-py3-none-any.whl (371 kB)
Collecting toml>=0.7.1
  Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting astroid<2.7,>=2.6.1
  Using cached astroid-2.6.6-py3-none-any.whl (231 kB)
Collecting mccabe<0.7,>=0.6
  Using cached mccabe-0.6.1-py2.py3-none-any.whl (8.6 kB)
Requirement already satisfied: setuptools>=20.0 in ./venv/lib/python3.9/site-packages (from astroid<2.7,>=2.6.1->pylint>=2.16.dev) (65.5.1)
Collecting lazy-object-proxy>=1.4.0
  Using cached lazy_object_proxy-1.8.0-cp39-cp39-linux_x86_64.whl
Collecting wrapt<1.13,>=1.11
  Using cached wrapt-1.12.1.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
INFO: pip is looking at multiple versions of astroid to determine which version is compatible with other requirements. This could take a while.
Collecting astroid<2.7,>=2.6.1
  Using cached astroid-2.6.5-py3-none-any.whl (231 kB)
  Using cached astroid-2.6.4-py3-none-any.whl (231 kB)
  Using cached astroid-2.6.3-py3-none-any.whl (231 kB)
  Using cached astroid-2.6.2-py3-none-any.whl (228 kB)
  Using cached astroid-2.6.1-py3-none-any.whl (227 kB)
Collecting pylint>=2.16.dev
  Using cached pylint-3.0.0a3-py3-none-any.whl (357 kB)
INFO: pip is looking at multiple versions of perflint to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of astroid to determine which version is compatible with other requirements. This could take a while.
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
Collecting astroid<2.7,>=2.5.4
  Using cached astroid-2.6.0-py3-none-any.whl (227 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached astroid-2.5.8-py3-none-any.whl (223 kB)
  Using cached astroid-2.5.7-py3-none-any.whl (222 kB)
  Using cached astroid-2.5.6-py3-none-any.whl (219 kB)
  Using cached astroid-2.5.5-py3-none-any.whl (219 kB)
  Using cached astroid-2.5.4-py3-none-any.whl (219 kB)
Collecting pylint>=2.16.dev
  Using cached pylint-3.0.0a1-py3-none-any.whl (346 kB)
Collecting astroid<2.7,>=2.5.2
  Using cached astroid-2.5.3-py3-none-any.whl (226 kB)
  Using cached astroid-2.5.2-py3-none-any.whl (222 kB)
Collecting pylint>=2.16.dev
  Using cached pylint-3.0.0a0-py3-none-any.whl (346 kB)
Collecting astroid<=2.14.0-dev0,>=2.12.8
  Using cached astroid-2.12.12-py3-none-any.whl (264 kB)
Collecting wrapt<2,>=1.11
  Using cached wrapt-1.14.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
Installing collected packages: wrapt, typing-extensions, tomlkit, tomli, platformdirs, mccabe, lazy-object-proxy, isort, dill, astroid, pylint, perflint
Successfully installed astroid-2.12.12 dill-0.3.6 isort-5.10.1 lazy-object-proxy-1.8.0 mccabe-0.7.0 perflint-0.7.3 platformdirs-2.5.2 pylint-2.16.0.dev0 tomli-2.0.1 tomlkit-0.11.6 typing-extensions-4.4.0 wrapt-1.14.1

Describe the solution you'd like

Pip could always download packages which have only one matching version before packages with multiple matching versions, because it knows for sure it doesn't have to (or can't) backtrack them.
Then, for every package where all the versions pip downloaded have the same set of requirements (which includes the packages with only one matching version), and where pip knows for sure that the package is actually required (for example from the user requirements), handle its dependencies the same way as requirements from the user (or at least, stop backtracking through them).
This should reduce backtracking, as it's essentially the same as specifying B<2.0 in the user requirements, because pip can infer that B<2.0 is actually needed (just like it can infer that from the user requirements) without installing B.
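
As a rough sketch of the idea in Python (find_matching_versions and get_dependencies are hypothetical stand-ins for pip internals, not real pip APIs):

def infer_mandatory_requirements(user_requirements):
    # Requirements pip knows for sure must be satisfied, seeded from
    # what the user asked for.
    mandatory = list(user_requirements)
    for req in user_requirements:
        versions = find_matching_versions(req)  # hypothetical index query
        if len(versions) == 1:
            # Only one candidate can ever satisfy this requirement, so its
            # dependencies are guaranteed to be part of any solution and
            # can be promoted to the same status as user requirements.
            mandatory.extend(get_dependencies(versions[0]))  # hypothetical
    return mandatory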

Alternative Solutions

You could always just let it run (it takes only a couple of backtracking rounds to resolve correctly), but an easy workaround is to repeat A's requirement on B in your own requirements, which results in no backtracking at all:

pip install "pylint>=2.16.dev,<3.0" "perflint>=0.7.3"
Collecting pylint<3.0,>=2.16.dev
  Using cached pylint-2.16.0.dev0-py3-none-any.whl (506 kB)
Collecting perflint>=0.7.3
  Using cached perflint-0.7.3-py2.py3-none-any.whl (11 kB)
Collecting typing-extensions>=3.10.0
  Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting tomlkit>=0.10.1
  Using cached tomlkit-0.11.6-py3-none-any.whl (35 kB)
Collecting tomli>=1.1.0
  Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting astroid<=2.14.0-dev0,>=2.12.8
  Using cached astroid-2.12.12-py3-none-any.whl (264 kB)
Collecting isort<6,>=4.2.5
  Using cached isort-5.10.1-py3-none-any.whl (103 kB)
Collecting mccabe<0.8,>=0.6
  Using cached mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Collecting dill>=0.2
  Using cached dill-0.3.6-py3-none-any.whl (110 kB)
Collecting platformdirs>=2.2.0
  Using cached platformdirs-2.5.2-py3-none-any.whl (14 kB)
Collecting wrapt<2,>=1.11
  Using cached wrapt-1.14.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
Collecting lazy-object-proxy>=1.4.0
  Using cached lazy_object_proxy-1.8.0-cp39-cp39-linux_x86_64.whl
Installing collected packages: wrapt, typing-extensions, tomlkit, tomli, platformdirs, mccabe, lazy-object-proxy, isort, dill, astroid, pylint, perflint
Successfully installed astroid-2.12.12 dill-0.3.6 isort-5.10.1 lazy-object-proxy-1.8.0 mccabe-0.7.0 perflint-0.7.3 platformdirs-2.5.2 pylint-2.16.0.dev0 tomli-2.0.1 tomlkit-0.11.6 typing-extensions-4.4.0 wrapt-1.14.1

Additional context

It seems to me this is possible, but I don't know how dependency resolution works, and it's possible my logic is flawed, so sorry if this is an impossible request.
If any part of my request is unclear, please let me know.
Pip version: pip 22.3 from ... (python 3.9)

@notatallshaw (Member) commented Nov 6, 2022

Pip could always download packages which have only one matching version before packages with multiple matching versions, because it knows for sure it doesn't have to (or can't) backtrack them.

I do think it makes sense to scan the top-level requirements and collect the requirements that have only one matching version. But I'm not sure how easy this would be to implement? It would need to check all possible locations Pip could get packages from first (e.g. all indexes, all file locations, etc.). But otherwise it seems reasonable to me; I'm not a Pip maintainer, but I'm sure that if someone submits a PR it will be considered seriously.

Then, for every package where all the versions pip downloaded have the same set of requirements (which includes the packages with only one matching version), and where pip knows for sure that the package is actually required (for example from the user requirements), handle its dependencies the same way as requirements from the user (or at least, stop backtracking through them).

Maybe I'm misunderstanding you but I don't think this logic holds beyond top level requirements as above, e.g.

Let's say your top-level requirements are A>1 and B>1, and the latest versions are A=3 and B=3. Now let's say B=3 has a requirement on C>1, and the latest version is C=3, but C=3 has a requirement on A<3. The latest version matching A<3 is A=2, so A=2 has to be downloaded. But A=2 has a requirement on C<3, and the latest version matching C<3 is C=2, and it turns out C=2 has no dependency on A at all. So A=3 can be used after all, but only with C=2: even though A=3 has no requirement on C and C=2 has no requirement on A, we had to perform backtracking to find this solution.

This kind of situation where the requirements across versions of dependencies and transitive dependencies are not stable is what stops you from being able to make lots of simplifying assumptions.
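
For concreteness, here is that scenario written out as a hypothetical index, with each version's requirements listed and the resolution walked through in comments:

INDEX = {
    "A": {"3": [], "2": ["C<3"], "1": []},
    "B": {"3": ["C>1"], "2": [], "1": []},
    "C": {"3": ["A<3"], "2": [], "1": []},
}
# Resolving "A>1, B>1": pick A=3 and B=3; B=3 needs C>1, so pick C=3;
# C=3 needs A<3, so backtrack A to A=2; A=2 needs C<3, so backtrack C
# to C=2; C=2 has no requirement on A, so A=3 becomes valid again.
# Final solution: A=3, B=3, C=2, reachable only through backtracking.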

@GideonBear (Author)

It would need to check all possible locations Pip could get packages from first (e.g. all indexes, all file locations, etc.).

AFAIK pip already does this. If pip complains about a version constraint that it couldn't satisfy, it will show all versions:

ERROR: Could not find a version that satisfies the requirement perflint>2.0 (from versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.9, 0.1.0, 0.1.1, 0.2.0, 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.5.1, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.7.3)

So it has already collected the list of versions.

Maybe I'm misunderstanding you but I don't think this logic holds beyond top level requirements as above, e.g.

What I mean (and what your example doesn't cover, if I understand it correctly) is that pip can handle the dependencies from a package as top-level (or whatever it is called; requirements that pip knows for sure need to be installed, like user requirements) if that package is also top-level and has only one possible set of requirements (which it has if there's only one version available).
In my example, what I didn't say but did mean is that A only has one version that matches the top-level requirements: A-1.0. That means that the dependencies of A can be inferred to be top-level, if the requirement for A is top-level.

I'm now using the term top-level for requirements that pip knows for sure are needed, but correct me if that's not what you meant with that. If not, what is the correct term for that?

@notatallshaw (Member) commented Nov 6, 2022

AFAIK pip already does this. If pip complains about a version constraint that it couldn't satisfy, it will show all versions:

Right, but that's only when Pip can't find any version that satisfies the requirements. But I don't think that, if the user specifies 10 indexes and Pip finds 1 matching version on the first index, Pip will then check the next 9 indexes. But if Pip wants to guarantee there is only 1 matching version, it will need to check all 10 indexes to make sure there is exactly one version available.

So in some cases at least this will slow down finding a solution.

In my example, what I didn't say but did mean is that A only has one version that matches the top-level requirements: A-1.0. That means that the dependencies of A can be inferred to be top-level, if the requirement for A is top-level.

I still don't understand what benefit we can extract from this knowledge. A has its dependencies cached, and they will be applied when searching through some transitive package T that depends on A.

Also, we cannot assume that adding some version of T which depends on A will not further restrict or break any dependencies currently defined for A, e.g. A's dependency was B=3.0 but the latest version of T depends on C=1.0, which depends on B=2.0.

We also cannot assume that just because the latest version of T depends on A, any previous version of T will also depend on A, so T and T's dependencies may need to be carefully backtracked through even if they specify A.

I'm now using the term top-level for requirements that pip knows for sure are needed, but correct me if that's not what you meant with that. If not, what is the correct term for that?

Yes, I use top-level requirements to mean everything that was initially provided and cannot change, e.g. a requirements file the user directly provided.

@GideonBear (Author)

But I don't think that, if the user specifies 10 indexes and Pip finds 1 matching version on the first index, Pip will then check the next 9 indexes.

Does it? Doesn't it try to get the highest version available, even if that means searching through multiple indexes?

I still don't understand what benefit we can extract from this knowledge. A has its dependencies cached, and they will be applied when searching through some transitive package T that depends on A.
Also, we cannot assume that adding some version of T which depends on A will not further restrict or break any dependencies currently defined for A, e.g. A's dependency was B=3.0 but the latest version of T depends on C=1.0, which depends on B=2.0.

If I run pip install A==1.0, the dependencies of A-1.0 can be inferred to be top-level.
The point is that if the dependencies of A are top-level, it can use those to restrict what versions need to be backtracked, as without it, it apparently backtracks through versions (of B) that are incompatible with the dependencies of A.

We cannot assume anything, but if any requirement breaks any top-level constraint, it's automatically invalid, instead of possibly being backtracked. That's the benefit of having A's dependencies as top-level requirements.

We also cannot assume that just because the latest version of T depends on A, any previous version of T will also depend on A, so T and T's dependencies may need to be carefully backtracked through even if they specify A.

That's why it should only infer dependencies as top-level if all the versions have the same dependencies. If there's only a single available version, it trivially always has the same dependencies, so its dependencies can always be inferred as top-level (if it itself is top-level, at least).
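
As a sketch, the general rule looks something like this (find_matching_versions and get_dependencies are hypothetical helpers; note that evaluating this across many versions would itself require fetching metadata for each one, which is costly):

def dependencies_are_stable(req):
    # A requirement's dependencies may be promoted to top-level only if
    # every version matching it declares the same dependencies; this is
    # trivially true when exactly one version matches.
    versions = find_matching_versions(req)  # hypothetical index query
    dep_sets = {frozenset(get_dependencies(v)) for v in versions}  # hypothetical
    return len(dep_sets) == 1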

@pfmoore (Member) commented Nov 6, 2022

I'm now using the term top-level for requirements that pip knows for sure are needed, but correct me if that's not what you meant with that. If not, what is the correct term for that?

"Top level" means specified by the user on the command line or in a requirements file. Everything else is "dependency data". In a sense, "top level" requirements are ones that pip "knows for sure" are needed - but dependency data can refine or further restrict the top level requirement on a project, so that adds complexity. In particular, the requirements that pip's resolver code works with internally often aren't identifiable as "top level or not". That's not to say that we couldn't track that information, but it could be a lot harder than you think, as multiple requirements for a single project (some of which may be top level, others of which might not) get combined into a single requirement deeper in the code.

It seems to me this is possible, but I don't know how dependency resolution works, and it's possible my logic is flawed, so sorry if this is an impossible request.

It's certainly possible that pip is doing unnecessary work. The problem is that it's far harder than you'd imagine to determine if pip knows something is unnecessary - you (as a human, with an understanding of the packages involved) might know, but pip has a much more limited view, and typically the problem is that pip simply doesn't know something that you know. It might be possible to change the code so that pip has access to that extra knowledge, but that's often not as easy as it seems at first.

Currently, I depend on A>=1.0 and B>=1.0. A depends on B<2.0.

Here's a good example of this. You simply can't say "A depends on B < 2.0". Dependencies are not defined per project, but per individual version of a project (per candidate to use the term pip's code uses). Sure, A 1.0 may depend on B < 2.0. But A 2.0 may depend on something completely different, not even depending on B at all. And pip won't know this without downloading A 2.0 and extracting its dependency data (a costly operation, so one we avoid doing until the last possible moment).
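
For illustration, dependency metadata is declared separately in each release's own METADATA file, so hypothetically:

A-1.0.dist-info/METADATA:  Requires-Dist: B<2.0
A-2.0.dist-info/METADATA:  Requires-Dist: D>=1.0  (no requirement on B at all)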

Pip may not even know if versions of A other than 1.0 exist. The key function in the resolver that works out "what is available" returns a list of candidates (in priority order) that satisfy a requirement. That code is called in a context where it's not possible to tell if the requirement is a "top level" requirement (one that the user specified, so will always have to be satisfied) or a dependency. And indeed, as I noted above, it might even be a combination of both.

So while you might be right, and pip can't usefully get any benefit from trying B 1.0, the problem lies in making pip aware of that information. And equally importantly, doing so in a way that doesn't slow down other, simpler, sets of requirements. (The problem that the resolver is solving is, in theory, exponentially hard, so there will always be cases that are unacceptably slow. We can't avoid that even in theory, so the best we can do is make a set of compromises that mean the bad cases aren't ones people care about in real life. So "does it slow down other cases" is a critical measure of whether an optimisation is acceptable.)

Unfortunately, it really isn't possible to evaluate a suggestion like this without doing a pretty deep dive into how the resolver works. Most of the "low hanging fruit" has been picked off by now, I'm afraid. If you are interested in looking further into this, the places to look are the source of the resolvelib library, and pip's resolver code in pip._internal.resolution.resolvelib. People here will be happy to help and advise if you do want to start digging 🙂

Currently, I depend on A>=1.0 and B>=1.0. A depends on B<2.0.
When I install my dependencies, pip first downloads B-2.0 and A-1.0, as those are the latests versions that match my requirements. Then it sees that A-1.0 wants B<2.0, so it downloads B-1.0.
This is unnecessary backtracking, as A always has B<2.0 as constraint.

I'm going to assume you made a typo here, as A 1.0, B 1.0 is a valid solution to the requirements you gave (you need A>=1.0 and B>=1.0, and A depends on B<2.0). It doesn't affect the points I make above.

@notatallshaw (Member) commented Nov 6, 2022

Does it? Doesn't it try to get the highest version available, even if that means searching through multiple indexes?

I might be wrong, but I don't think so. Pip has no guarantees on the "highest version available", just that "some" solution is found given your requirements, if that solution exists.

If I run pip install A==1.0, the dependencies of A-1.0 can be inferred to be top-level. The point is that if the dependencies of A are top-level, it can use those to restrict what versions need to be backtracked, as without it, it apparently backtracks through versions (of B) that are incompatible with the dependencies of A.

We cannot assume anything, but if any requirement breaks any top-level constraint, it's automatically invalid, instead of possibly being backtracked. That's the benefit of having A's dependencies as top-level requirements.

Ah yes, what you're getting at is clear to me now. This totally makes sense, and I agree that on the whole this would probably be an improvement.

In practice, though, I don't think any special step would need to be taken to make these dependencies top-level requirements beyond "collect the requirements of top-level packages that have only 1 match first". Because they would be the last set of requirements to be removed when backtracking, I think those requirements effectively become top-level requirements for free.

And I think it would make sense to do this step of "collecting the requirements and going through the ones that have only 1 match" at each step where a new set of requirements is added. At least for those cases where single matches are found, it could significantly speed up backtracking.
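
Sketched as a loop (find_matching_versions and get_dependencies are again hypothetical stand-ins for pip internals), the promotion would repeat until no new single-match requirements appear:

def promote_single_matches(mandatory):
    # `mandatory` is a set of requirements known to be unavoidable.
    # Repeat the single-match promotion until a fixpoint is reached.
    changed = True
    while changed:
        changed = False
        for req in list(mandatory):
            versions = find_matching_versions(req)  # hypothetical
            if len(versions) != 1:
                continue
            for dep in get_dependencies(versions[0]):  # hypothetical
                if dep not in mandatory:
                    mandatory.add(dep)
                    changed = True
    return mandatory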

@GideonBear were you looking at creating a PR or is this just a suggestion?

@uranusjr Any input on if you think this is a good idea? And any opinions on how someone might go about implementing this?

I guess we need to see whether this needs to be done in resolvelib, or whether Pip has enough autonomy to choose for itself what should be collected first. I've been thinking about a similar optimization where the downstream would need to be able to tell resolvelib to reorder how it should backtrack; maybe this could use the same or a similar API.

@pfmoore (Member) commented Nov 6, 2022

If I run pip install A==1.0, the dependencies of A-1.0 can be inferred to be top-level.

That's not what we mean by "top level", which might explain some of the confusion.

You're right, it could be possible to (carefully!) transitively expand the set of "known to be mandatory" requirements as we go. That's not how resolvelib works at the moment, as far as I am aware, but it may be possible to add something of this nature. We'd probably have to benchmark such a change to see if it was an improvement, particularly as it might trigger extra package metadata extraction or index requests (which are costly). But in principle it sounds like a possibility. It might be quite a significant change to resolvelib, though...

@notatallshaw (Member) commented Nov 6, 2022

We'd probably have to benchmark such a change to see if it was an improvement, particularly as it might trigger extra package metadata extraction or index requests (which are costly). But in principle it sounds like a possibility. It might be quite a significant change to resolvelib, though...

I think if implemented correctly it shouldn't require any extra metadata, because at some point Pip has to match against these packages anyway (since either there is only 1 match, or they are the dependencies of the 1 match).

However, the potential extra index lookups seem like a cause for concern to me, and are difficult to benchmark because they will mostly affect people with private index setups. Maybe only apply this optimization when there is just one index? (That should be most users.)

And, problematically, Pip doesn't have any benchmarks for backtracking anyway, so beyond collecting some real-world examples and showing the change improves them, there isn't a good way to show improvements right now.

@GideonBear (Author)

It's certainly possible that pip is doing unnecessary work. The problem is that it's far harder than you'd imagine to determine if pip knows something is unnecessary - you (as a human, with an understanding of the packages involved) might know, but pip has a much more limited view, and typically the problem is that pip simply doesn't know something that you know. It might be possible to change the code so that pip has access to that extra knowledge, but that's often not as easy as it seems at first.

I understand it might be really hard, I just opened this issue to let you know my thoughts about what might improve the dependency resolver.

You simply can't say "A depends on B < 2.0". Dependencies are not defined per project, but per individual version of a project (per candidate to use the term pip's code uses). Sure, A 1.0 may depend on B < 2.0. But A 2.0 may depend on something completely different, not even depending on B at all. And pip won't know this without downloading A 2.0 and extracting its dependency data (a costly operation, so one we avoid doing until the last possible moment).
Pip may not even know if versions of A other than 1.0 exist. The key function in the resolver that works out "what is available" returns a list of candidates (in priority order) that satisfy a requirement. That code is called in a context where it's not possible to tell if the requirement is a "top level" requirement (one that the user specified, so will always have to be satisfied) or a dependency. And indeed, as I noted above, it might even be a combination of both.

Yes, sorry, with that I meant "A-1.0 depends on B<2.0, and A-1.0 is the only available version of A".
But of course, pip would need to know that.

I'm going to assume you made a typo here, as A 1.0, B 1.0 is a valid solution to the requirements you gave (you need A>=1.0 and B>=1.0, and A depends on B<2.0). It doesn't affect the points I make above.

That wasn't a typo. Yes, A-1.0, B-1.0 is a valid resolution, but it only comes to that after downloading B-2.0, even though B<2.0 can be inferred to be mandatory. If pip can first collect all matching versions, then notice that A only has one version, collect that (as it's mandatory), then infer the dependencies of A to be mandatory, then it can install B-1.0.

Pip has no guarantees on the "highest version available", just that "some" solution is found given your requirements, if that solution exists.

@pfmoore Do you know how pip does this?

were you looking at creating a PR or is this just a suggestion?

This was just a suggestion, but I would be interested in at least looking into this, but I can't guarantee that I can actually do something with it, as I've never even looked at the resolver.

BTW, thanks both for your thoughts about this!

@pfmoore (Member) commented Nov 7, 2022

That wasn't a typo. Yes, A-1.0, B-1.0 is a valid resolution, but it only comes to that after downloading B-2.0, even though B<2.0 can be inferred to be mandatory.

Ah, I see.

If pip can first collect all matching versions, then notice that A only has one version, collect that (as it's mandatory), then infer the dependencies of A to be mandatory, then it can install B-1.0.

You're correct, if pip can do this, it will avoid checking B 2.0.

To an extent, what you're proposing here is a hybrid between two different types of resolver. Pip's current resolver (resolvelib) is a backtracking resolver, which works its way progressively through the "tree" of requirements, backtracking if it hits a requirement it can't satisfy. Other types of resolver (SAT solvers or PubGrub, for example) work by looking "globally" at the full dependency tree.

The problem for Python is that extracting dependency metadata is a costly operation, to the point where building the full dependency tree in advance is prohibitively expensive. So we have to use an algorithm that builds the tree "on demand", and backtracking algorithms are the only ones that do that.

Your idea, to partially examine the parts of the tree that we know from the start (the top level requirements) and build as much of the tree from there as we can without ever making a decision that we may need to backtrack over, effectively pre-builds some of the tree for the backtracking algorithm, without doing extra work that may be unnecessary. So I agree with @notatallshaw that this shouldn't incur extra cost in the resolve phase.

Looking at your suggestion this way also suggests a possible way of implementing this to me. The resolver has a mechanism to let it determine a "preference" for what candidate to consider next when faced with a choice. If we could feed the information "this is the only possibility" into that mechanism, that may prefer the "mandatory" parts of the tree without pre-building them (which would have the same effect in practice). The problem is the preference mechanism is rather a "black art" - even the code's author isn't 100% clear on how it affects the algorithm!
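
For example, something along these lines; get_preference is a real resolvelib provider hook (with this signature as of the resolvelib version pip 22.3 vendors), but the body here is only a sketch of the idea, and counting candidates may itself force costly candidate evaluation:

from resolvelib import AbstractProvider

class SingleCandidateFirstProvider(AbstractProvider):
    def get_preference(self, identifier, resolutions, candidates,
                       information, backtrack_causes):
        # resolvelib works on the requirement with the lowest preference
        # key first. Prefer identifiers with exactly one candidate: they
        # can never be backtracked, so deciding them early prunes the
        # search tree.
        count = sum(1 for _ in candidates[identifier])
        return (count != 1, count)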

@pfmoore Do you know how pip does this?

Without digging into the code, I don't have much more than I posted above. The place to start is in pip._internal.resolution.resolvelib, and the find_candidates method in factory.py. That leads to iter_index_candidate_infos which calls the "finder" to return the available candidates from the index (or indices). The finder is in pip._internal.index.package_finder, and the relevant function is find_best_candidate.

Having dug that far, I see that find_best_candidate calls find_all_candidates, so it will get every file available. However, it then filters that list to return just the ones that match the specifier. That call sequence is a bit of a mess, though. Part of the problem here is that the "legacy" resolver still exists in the codebase, and uses the finder in a very different way. When we finally remove the legacy resolver code, we will be able to refactor the finder, and in the process might be able to make all of this a lot simpler.
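
Condensed, the call chain described above looks like this (module paths as given; everything else elided):

# pip/_internal/resolution/resolvelib/factory.py
#     find_candidates()
#         -> iter_index_candidate_infos()
#             -> PackageFinder.find_best_candidate()  (pip/_internal/index/package_finder.py)
#                 -> find_all_candidates()            (fetches every file available)
#                 -> filter the results by the specifier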

So I guess the tl;dr here is that we do fetch a list of everything, and that list might get passed up the call chain to the point where the resolver could use it. But the call chain is a mess because of the legacy resolver, and making use of that information could be more difficult than we'd like because of that.

This was just a suggestion, but I would be interested in at least looking into this, but I can't guarantee that I can actually do something with it, as I've never even looked at the resolver.

Welcome to the fun! The resolver is one of the most tricky bits of pip, but it's also one of the most interesting, in that it's full of genuinely complex and interesting issues to work on. (It's also full of pain-in-the-a** workarounds and edge cases to handle situations that frankly I wish we could just reject, but that's life, I guess 😉)

BTW, thanks both for your thoughts about this!

No problem, it's nice to get back into the resolver code 🙂

@notatallshaw (Member)

This was just a suggestion, but I would be interested in at least looking into this, but I can't guarantee that I can actually do something with it, as I've never even looked at the resolver.

Well, I've only ever made one significant contribution to Pip's code, and it was to the resolver. It took a lot of work and understanding, but it was fun and I learned a lot, and I get to feel proud because that code speeds up a lot of people's work without them even knowing. So I would say it's worth looking at. I'll also take a look and try to come up with some ideas on how to implement this, but I can't commit to anything right now.

@pradyunsg (Member)

❯ pip install --dry-run -I -r test-requirements.txt
Collecting pylint>=2.16
  Downloading pylint-2.16.1-py3-none-any.whl (530 kB)
Collecting perflint>=0.7.3
  Downloading perflint-0.7.3-py2.py3-none-any.whl (11 kB)
Collecting platformdirs>=2.2.0
  Downloading platformdirs-2.6.2-py3-none-any.whl (14 kB)
Collecting typing-extensions>=3.10.0
  Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting tomli>=1.1.0
  Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting mccabe<0.8,>=0.6
  Using cached mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Collecting astroid<=2.16.0-dev0,>=2.14.1
  Downloading astroid-2.14.1-py3-none-any.whl (273 kB)
Collecting tomlkit>=0.10.1
  Downloading tomlkit-0.11.6-py3-none-any.whl (35 kB)
Collecting dill>=0.2
  Downloading dill-0.3.6-py3-none-any.whl (110 kB)
Collecting isort<6,>=4.2.5
  Downloading isort-5.12.0-py3-none-any.whl (91 kB)
Collecting lazy-object-proxy>=1.4.0
  Downloading lazy_object_proxy-1.9.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (62 kB)
Collecting wrapt<2,>=1.11
  Using cached 3c660fa34c8106aa9719f2a6636c1c3ea7afd5931ae665eb197fdf4def84/wrapt-1.14.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
Would install astroid-2.14.1 dill-0.3.6 isort-5.12.0 lazy-object-proxy-1.9.0 mccabe-0.7.0 perflint-0.7.3 platformdirs-2.6.2 pylint-2.16.1 tomli-2.0.1 tomlkit-0.11.6 typing_extensions-4.4.0 wrapt-1.14.1

This is resolved with sarugaku/resolvelib#113.

@GideonBear (Author) commented Feb 8, 2023

Hi @pradyunsg! Just making sure you understand my issue correctly; in test-requirements.txt, do you have pylint>=2.16 (like the output appears to say) or pylint>=2.16.dev? The former will not trigger the backtracking, since it doesn't allow 3.0.0a at all.

When using the main branch of pip, it's not exactly the same output as yours, but it only backtracks once(!), unlike the latest release:

$ pip install --dry-run -I "pylint>=2.16.dev" "perflint>=0.7.3"
Collecting pylint>=2.16.dev
  Using cached pylint-3.0.0a5-py3-none-any.whl (485 kB)
Collecting perflint>=0.7.3
  Using cached perflint-0.7.3-py2.py3-none-any.whl (11 kB)
Collecting dill>=0.2
  Using cached dill-0.3.6-py3-none-any.whl (110 kB)
Collecting tomlkit>=0.10.1
  Using cached tomlkit-0.11.6-py3-none-any.whl (35 kB)
Collecting isort<6,>=4.2.5
  Using cached isort-5.12.0-py3-none-any.whl (91 kB)
Collecting astroid<=2.12.0-dev0,>=2.11.5
  Using cached astroid-2.11.7-py3-none-any.whl (251 kB)
Collecting platformdirs>=2.2.0
  Using cached platformdirs-3.0.0-py3-none-any.whl (14 kB)
Collecting mccabe<0.8,>=0.6
  Using cached mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Collecting tomli>=1.1.0
  Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting pylint>=2.16.dev
  Using cached pylint-2.16.1-py3-none-any.whl (530 kB)
Collecting astroid<=2.16.0-dev0,>=2.14.1
  Using cached astroid-2.14.1-py3-none-any.whl (273 kB)
Collecting typing-extensions>=4.0.0
  Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting wrapt<2,>=1.11
  Using cached wrapt-1.14.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
Collecting lazy-object-proxy>=1.4.0
  Using cached lazy_object_proxy-1.9.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (63 kB)
Would install astroid-2.14.1 dill-0.3.6 isort-5.12.0 lazy-object-proxy-1.9.0 mccabe-0.7.0 perflint-0.7.3 platformdirs-3.0.0 pylint-2.16.1 tomli-2.0.1 tomlkit-0.11.6 typing_extensions-4.4.0 wrapt-1.14.1

I don't know if it's actually because of the PR you linked (because it seems like it isn't released yet?), but it's certainly solved!
Thanks everyone!

@notatallshaw (Member)

When using the main branch of pip, it's not exactly the same output as yours, but it only backtracks once(!), unlike the latest release:

FYI, the main branch of Pip does not yet include sarugaku/resolvelib#113, which will be a huge improvement to backtracking.

It does, however, include sarugaku/resolvelib#111, which seems to have itself made some big improvements in certain situations.

@GideonBear (Author)

the main branch of Pip does not yet include sarugaku/resolvelib#113

I thought so, yeah. But the issue was resolved anyway.

@notatallshaw (Member) commented Feb 9, 2023

This is resolved with sarugaku/resolvelib#113.

@pradyunsg After some testing, the current improvement in behavior comes from sarugaku/resolvelib#111. Perhaps sarugaku/resolvelib#113 would have independently improved it, but there is no further improvement in the backtracking with sarugaku/resolvelib#113 applied on top of current pip main for this specific set of requirements.

Hi @pradyunsg! Just making sure you understand my issue correctly; in test-requirements.txt, do you have pylint>=2.16 (like the output appears to say) or pylint>=2.16.dev? The former will not trigger the backtracking, since it doesn't allow 3.0.0a at all.

...

I don't know if it's actually because of the PR you linked (because it seems like it isn't released yet?), but it's certainly solved! Thanks everyone!

Yes, agreed. I tested with Pip main, and with Pip main plus sarugaku/resolvelib#113 applied, and there is no difference. The one backtrack will always be required, because Pip will grab the latest version's information, do a BFS with the user requirements, and then backtrack on any conflicting requirements.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 12, 2023