Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For n packages there are O(n^2) calls to _is_current_pin_satisfying #147

Open
notatallshaw opened this issue Dec 2, 2023 · 4 comments
Open

Comments

@notatallshaw
Copy link
Collaborator

notatallshaw commented Dec 2, 2023

As the number of packages to resolve increases there appears to be an O(n2) calls to _is_current_pin_satisfying and I think, at least in some circumstances, it's possible to optimize this away.

I am using steps to reproduce from pypa/pip#12320 and the branch from this PR to avoid network calls in the call graph pypa/pip#12327 and not have O(n2) calls to Pip collecting packages from local directory. In the future I plan to be able to show this performance issue using https://github.com/pradyunsg/pip-resolver-benchmarks

In this example there are ~1300 packages to install and _is_current_pin_satisfying is called ~2.5 million times, I generated the call graph using cProfile and gprof2dot:

output

I have not yet investigated how one might optimize this.

This is motivated from a real world usage report: pypa/pip#12314.

@notatallshaw notatallshaw changed the title For n packages there are O(n<sup>2</sup>) calls to _is_current_pin_satisfying For n packages there are O(n^2) calls to _is_current_pin_satisfying Dec 2, 2023
@notatallshaw
Copy link
Collaborator Author

An idea I will investigate if no one else pipes in, but maybe it's possible to keep track of satified and unsatisfied names in the state, rather than needing to recalculate them each time.

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Dec 3, 2023

An initial observation, unsatisfied_names:

unsatisfied_names = [
    key
    for key, criterion in self.state.criteria.items()
    if not self._is_current_pin_satisfying(key, criterion)
]

Is almost always equal to:

self.state.criteria.keys() - self.state.mapping.keys()

Very rarely there is an additional name due to this part of _is_current_pin_satisfying:

all(
    self._p.is_satisfied_by(requirement=r, candidate=current_pin)
    for r in criterion.iter_requirement()
 )

And this is what is the expensive part of the call is, but possible a trick that could be applied is just assume unsatisfied_names = self.state.criteria.keys() - self.state.mapping.keys(), and at completion this approximate unsatisfied names will equal the real unsatisfied names (i.e. it will be 0).

@notatallshaw
Copy link
Collaborator Author

FYI I have not made any meaningful progress on this and doesn't expect to any time soon.

I tried avoiding calling _is_current_pin_satisfying in most cases of calculating unsatisfied_names , but I got lots of test failures that I beleive to be valid, it feels like it should be possible so possibly there was just an unidentified bug in my code. I will take a look at this again when I have a chance.

@notatallshaw
Copy link
Collaborator Author

FYI, a workaround is for the provider to cache the calls, a PR has been raised on Pip side to do that: pypa/pip#12453

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant