Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer backtracking on dependencies involved in the most recent conflict #10481

Merged
merged 5 commits into from
Oct 9, 2021

Conversation

notatallshaw
Copy link
Member

@notatallshaw notatallshaw commented Sep 16, 2021

PR for #10479

I actually think it would make sense to prefer all the ancestors of the failures, but I couldn't figure out a simple way to do that.

Depends on sarugaku/resolvelib#84

@notatallshaw notatallshaw changed the title Prefer failures Prefer failures causes when backtracking Sep 16, 2021
@notatallshaw notatallshaw changed the title Prefer failures causes when backtracking Prefer failure causes when backtracking Sep 16, 2021
@notatallshaw
Copy link
Member Author

notatallshaw commented Sep 17, 2021

I actually think it would make sense to prefer all the ancestors of the failures, but I couldn't figure out a simple way to do that.

After a bit of thought I realized that older ancestors are all going to be pinned so preferring all ancestors would probably at best only help situations that required deep backtracking.

I did manage to implement a relatively simple version and it worked well with all my known test cases, but didn't seem to provide any additional benefit. Further it seems like it isn't strong enough to tackle the Airflow 1.10.13 case. So I think trying to prefer all the ancestor adds a bit of complexity for no provable benefit. Can revisit in the future though.

@uranusjr
Copy link
Member

I did manage to implement a relatively simple version and it worked well with all my known test cases, but didn't seem to provide any additional benefit.

I think I may not be reading this correctly, but what are the additional benefits you feel should be expected? The patch only modifies the backtracking behaviour from I can tell (because failure_causes is always empty outside the backtracking context), so it makes sense to me it does not provide any benefits outside its intended target.

@notatallshaw
Copy link
Member Author

notatallshaw commented Sep 17, 2021

I did manage to implement a relatively simple version and it worked well with all my known test cases, but didn't seem to provide any additional benefit.

I think I may not be reading this correctly, but what are the additional benefits you feel should be expected? The patch only modifies the backtracking behaviour from I can tell (because failure_causes is always empty outside the backtracking context), so it makes sense to me it does not provide any benefits outside its intended target.

You are correct that this patch only modifies the backtracking behavior. Sorry I did not explain myself very well, let me attempt to do better:

Say you had 3 requirements "X, Y, Z" and X depended on A which depended on B, and Y depended on C which depends on B. But the latest version of C and the latest version of A's dependency on B is incompatible. This creates a failure in attempting to pin at some point and causes Pip to backtrack,

In this scenario the patch I provide is supposed to prefer up to B, C, and A. As A and C disagree about what version of B they should have, the assumption is resolving this conflict before anything else will speed up total backtracking time.

What I thought (and crossed out in my original comment now) is it might make even more sense in some circumstances to prefer the whole ancestor chain, so the combination of X, A, B, and Y, C, B. My reasoning for this in some circumstances it could be that X and A as well as Y and C are tightly coupled and therefore preferring A and C may not be helpful as as you need to backtrack to earlier versions of X or Y.

I created a new patch which implemented this "prefer all ancestors of failure causes" but after experimenting with it I could not find an example where it actually helped more than this current patch I have submitted to you. I wanted to communicate this negative result to be clear where my thoughts on this optimization are and what I've tried.

Hope that explanation makes more sense? Seems like these ideas are somewhat difficult to communicate textually, but I am happy to answer any questions you have.

pfmoore
pfmoore previously approved these changes Sep 20, 2021
Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this -- this is a very smart way to solve the problem of actively backtracking on the items that are implicated in the conflicts.

A couple of minor comments.

return (
not requires_python,
delay_this,
not direct,
not pinned,
not responsible_for_failure,
Copy link
Member

@pradyunsg pradyunsg Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely be the first item in this tuple. When a candidate is implicated in the failure, the most important thing is that we need to iterate on other candidates of that identifier IMO.

Copy link
Member Author

@notatallshaw notatallshaw Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are likely correct. My reasoning was python, direct, and pinned should all be singular package requirements, and I didn't want to mess with pinning them first. But as "responsible_for_failures" only becomes relevant once there are failures I guess it makes sense to put it first. I will do some sanity testing though that this doesn't cause problems.

As for "delay_this" I actually personally feel that this setuptools hack can be removed with this change, as it won't be responsible for failures and therefore pip won't be "accidentally" backtracking on setuptools anymore. I just wasn't brave enough to do it on this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a --use-deprecated=setuptools-resolve-hack that enables delay_this and point people report to a tracking issue when they use this so we can get an idea if it’s actually useful after this. But I agree this should be a separate PR.

Meanwhile, I would put not responsible_for_failure after delay_this (requires_python should always be the first one to make unresolvable Python version conflits bubble up as soon as possible).

Copy link
Member Author

@notatallshaw notatallshaw Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am running through my known test cases and I found a requirements file where the proposed change to the return order here makes it significantly slower to resolve (in fact it hasn't finished resolving):

# Python 3.7 only
poetry2setup
wheel
twine

Without the proposed change it significantly backtracks on "keyring" to "19.2.0", then after 1 more significant backtrack on "importlib_metadata" it resolves quickly.

With the proposed change the resolver backtracks on "keyring" to "15.1.0", backtracks significantly on "importlib_metadata" (but much slower), then backtracks significantly on "packaging", and that is as far as I've been able to get after an hour of backtracking.

I have not yet been able to figure out all the specific details. But I hypothesize it is very efficient to check single requirements (e.g. direct and pinned), even versus following backtrack failures.

Let me know what you think.

src/pip/_internal/resolution/resolvelib/provider.py Outdated Show resolved Hide resolved
@uranusjr
Copy link
Member

(Also, please fix the linter errors first so the actual tests can run.)

@notatallshaw
Copy link
Member Author

notatallshaw commented Sep 21, 2021

Added requested function and changed name to "backtrack_causes" instead of "failure_causes", I think this makes more sense in context, as these aren't failures of the current state but of the state that the resolver backtracked from.

@notatallshaw
Copy link
Member Author

I'm realizing this is going to require modifying a lot of test cases or at least test set-up. In most cases though it should just require adding: backtrack_causes=[] on to calling get_preference.

Could also make the backtrack_causes argument optional, but that feels like it could lead to accidental bugs.

But before proceeding either way I'll wait to see if you think the approach is correct, maybe there's a way I should do this without adding an extra argument.

@uranusjr
Copy link
Member

I think adding a new (required, not optional) argument is the way to go. Not entirely sure backtrack_causes is the best name though, let’s not change the tests until we land the ResolveLib changes first (which would include deciding the argument name to use since ResolveLib always call provider methods with named parameters).

news/10479.feature.rst Outdated Show resolved Hide resolved
@pradyunsg
Copy link
Member

Not gonna run the CI, since we know it's gonna fail until we update the vendored resolvelib, which needs a resolvelib release. :)

@pradyunsg
Copy link
Member

FWIW, @notatallshaw -- would it be possible for you to share the list of small requirements.txt files that trigger the backtracking behaviour, so that others can see how this behaves?

@notatallshaw
Copy link
Member Author

notatallshaw commented Sep 26, 2021

FWIW, @notatallshaw -- would it be possible for you to share the list of small requirements.txt files that trigger the backtracking behaviour, so that others can see how this behaves?

Yes, been meaning to do this for a while but it ends up being pretty time consuming to go back and recheck everything.

Where I've written Linux or Windows only it's usually because system dependencies are also required. I've also made note of where the requirements file end up in ResolutionImpossible as I found it interesting even in these cases this optimization is still highly effective at reducing backtracking.

Requirements File Notes Reported Location
requirements_1.txt #10201
requirements_2.txt #10201
requirements_3.txt #10373
requirements_4.txt Python 3.7 only #9215
requirements_5.txt Linux only #9187 (comment)
requirements_6.txt Windows Only, I haven't reproduced #10479 (comment)
requirements_7.txt ResolutionImpossible neurolibre/neurolibre#13
requirements_8.txt #10230
requirements_9.txt ResolutionImpossible #9187 (comment)
requirements_10.txt ResolutionImpossible, Linux Only, Must install Six first #9187

I would say I mostly tested against 1, 2, and 4 as in my testing they seem to represent a good range of challenges when it comes to finding a solution. Also I normally test in a clean virtual environment using the command python -m pip download -r {requirements_file} -d downloads, this usually saves me from having to wipe my environment each time I need to test.

Also I came across several more problematic requirements files but these were the ones I was able to reproduce today (except where noted).

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Sep 26, 2021
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Sep 27, 2021
@notatallshaw
Copy link
Member Author

notatallshaw commented Sep 28, 2021

Question on Pip process: Once there is a resolvelib release what is the process for vendoring the new version? Is is a separate PR? Do I include the updated config in this PR and then it automatically happens in the CI pipeline? Or do I run the vendoring sync locally and include the resolvelib changes in this PR?

@pfmoore
Copy link
Member

pfmoore commented Sep 28, 2021

Vendoring a new release of resolvelib (or indeed any vendored package) should be done as a separate PR. There's developer docs on how to do it, I believe, but basically it's a matter of updating vendor.txt and running tox -e vendoring.

@notatallshaw
Copy link
Member Author

notatallshaw commented Sep 28, 2021

Vendoring a new release of resolvelib (or indeed any vendored package) should be done as a separate PR. There's developer docs on how to do it, I believe, but basically it's a matter of updating vendor.txt and running tox -e vendoring.

Thanks I wasn't sure because a lot of tests will break with a new version of resolvelib until this PR lands due to the additional argument in the get_preference method.

@pfmoore
Copy link
Member

pfmoore commented Sep 28, 2021

Ah, I see. I still think it should be a separate PR, but merging the two PRs should probably be co-ordinated, so I'll leave that to @pradyunsg or @uranusjr, as they've been the most directly involved with this work.

@uranusjr
Copy link
Member

uranusjr commented Sep 28, 2021

It's easiest to do it in this PR. We should ge the current code kind-of working (I think it's close, just need confirmation from @pradyunsg), do one commit to update resolvelib, and then merge.

On second thought, since the release will be done by either me or @pradyunsg, it could actually be easier for whoever does the release to also do the vendoring update, and rebase this PR to the branch instead.

@edmorley
Copy link
Contributor

edmorley commented Oct 2, 2021

Hi! Many thanks for everyone's hard work here :-) I'm hoping that this PR makes it to the 21.3 release - and from reading the above, I'm presuming that's the intention? If so, should this PR be added to the 21.3 milestone?

@pradyunsg pradyunsg added this to the 21.3 milestone Oct 2, 2021
@pradyunsg
Copy link
Member

I'll add it in. I'm looking at our set of vendored dependencies right now, and will likely end up picking this up this weekend as well.

@notatallshaw
Copy link
Member Author

notatallshaw commented Oct 2, 2021

Once resolvelib lands I'll need to rebase and fix this test I recently added in a related PR: https://github.com/pypa/pip/blob/main/tests/unit/resolution_resolvelib/test_provider.py#L29

But then I think the PR will be good to go pending some more thoughts and suggestions from the PIP maintainers. I'll also re-run through all the test cases I can still reproduce.

@pradyunsg pradyunsg added the !release blocker Hold a release until this is resolved label Oct 3, 2021
@pradyunsg
Copy link
Member

Alrighty. The resolvelib 0.8.0 release has been made. Could you go ahead and update this PR @notatallshaw?

Here's what I would suggest doing, to update this PR to use resolvelib 0.8.0:

  • Rebase on the current main branch, while squashing all the commits currently in this PR into a single commit -- this should be doable with a single git rebase -i command.
  • Modify src/pip/_vendor/vendor.txt to use resolvelib==0.8.0.
  • Run nox -s vendoring OR tox -s vendoring -- this will bring in the newer version of resolvelib.
  • Add a news/resolvelib.vendor.rst with the content: "Update resolvelib to 0.8.0".
  • Commit these changes, a commit message similar to "Update resolvelib to 0.8.0".
  • Make modifications to the code to adapt for changes in resolvelib 0.8.0, as appropriate, in separate follow up commits. :)

@notatallshaw
Copy link
Member Author

* Rebase on the current `main` branch, while squashing all the commits currently in this PR into a single commit -- this should be doable with a single `git rebase -i` command.

I tried doing this but it seemed to add even more commits on to this pull request. Not exactly sure what the goal is here, my knowledge of using git in such a large project is very limited.

@pradyunsg
Copy link
Member

Would it be fine if I did squashing instead then?

@pradyunsg pradyunsg changed the title Prefer failure causes when backtracking Prefer backtracking on dependencies involved in the most recent conflict Oct 9, 2021
@pradyunsg
Copy link
Member

You should be able to run:

git stash --include-untracked
git fetch origin prefer_failures
git reset --hard FETCH_HEAD

And, that should get you synced up with the rebased version of this branch. :)

@pradyunsg
Copy link
Member

Hmm... I'm gonna fiddle with this for a bit. Gimem a few minutes. :)

notatallshaw and others added 5 commits October 9, 2021 09:41
This can significantly reduce the amount of backtracking required, by
avoiding backtracking on unrelated packages in the dependency graph.

Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>
@@ -66,12 +66,13 @@ def __init__(
def identify(self, requirement_or_candidate: Union[Requirement, Candidate]) -> str:
return requirement_or_candidate.name

def get_preference(
def get_preference( # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to update resolvelib's type annotations, to fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies I missed this. I've never worked on a project before that splits the components in to different repos like this, I should of tried to run the test cases all together before merge in to resolvelib.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. The fact that this was missed is more a symptom of a bad process, and that's not something you need to apologise for. :)

@pradyunsg
Copy link
Member

Alrighty. I'm done.

@notatallshaw
Copy link
Member Author

Would it be fine if I did squashing instead then?

Yes, absolutely, I'm on a flight right now to a friend's wedding and may not be able to spend much time over the next few days, so feel free to push whatever needs pushing.

I don't have any ownership feelings over this PR I just want to help Pip. But if you do need me to do something I'll catch up as soon as I can.

@pradyunsg pradyunsg dismissed stale reviews from pfmoore and themself October 9, 2021 13:54

Made these changes.

@pradyunsg pradyunsg requested a review from a team October 9, 2021 13:54
@pradyunsg pradyunsg mentioned this pull request Oct 9, 2021
@pradyunsg pradyunsg merged commit 9f18a40 into pypa:main Oct 9, 2021
@pradyunsg
Copy link
Member

Thanks for the review @uranusjr! And, @notatallshaw, for all the work investigating and implementing this! ^>^

@notatallshaw
Copy link
Member Author

notatallshaw commented Oct 10, 2021

Thanks for the review @uranusjr! And, @notatallshaw, for all the work investigating and implementing this! ^>^

Here's to hoping it doesn't cause more issues than it solves!

Feel free to ping me on 21.3+ heavy backtracking reports. I am very invested at this point to keep resolving this issue

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
!release blocker Hold a release until this is resolved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants