
Prefer direct conflicting causes when backtracking #12499

Draft
wants to merge 3 commits into
base: main
from

@notatallshaw (Member) commented Jan 30, 2024

Fixes: #12498

This PR introduces enhancements to pip's dependency resolution logic. Below is a description of the three main functions it implements:

  • _causes_with_conflicting_parent

    • This function identifies causes whose parent requirement conflicts with, or is not satisfied by, another cause. It helps pinpoint critical conflicts, allowing the resolver to focus on significant barriers in the dependency graph.
  • _first_causes_with_no_candidates

    • This function finds the first pair of causes whose combined specifiers have no possible candidates. It groups causes by name to minimize comparisons, then checks whether each pair's combined specifier leaves any candidate available. Because statically evaluating Python packaging specifiers is complex, the function instead tests the combined specifier against the actual candidates. Since this evaluation can be resource-intensive, the function exits as soon as it finds one incompatible pair.
  • narrow_requirement_selection

    • This method narrows the unsatisfied requirement names to those involved in the direct conflicts identified by the two functions above. Implementing this logic in get_preference would hurt performance: resolvelib calls get_preference once per identifier, so the new checks would add expensive calls and could create O(n^2) behavior.
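A minimal sketch of the "no common candidate" check described above, assuming simplified stand-in types: `Requirement`, `Cause`, and `first_causes_with_no_candidates` are hypothetical names, versions are plain integer tuples, and a specifier is modeled as a predicate rather than a `packaging` `SpecifierSet`. It groups causes by name, tests each same-name pair against the available versions, and returns as soon as one pair has no version in common.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class Requirement:
    # Hypothetical stand-in: a project name plus a version predicate
    # instead of a real packaging SpecifierSet.
    name: str
    allows: Callable[[Version], bool]


@dataclass(frozen=True)
class Cause:
    # Same shape as resolvelib's RequirementInformation: the requirement and
    # its parent, simplified here to a (name, version) pair or None.
    requirement: Requirement
    parent: Optional[Tuple[str, Version]] = None


def first_causes_with_no_candidates(
    causes: Sequence[Cause],
    available: Dict[str, List[Version]],
) -> List[Cause]:
    """Return the first pair of same-name causes that no available version satisfies."""
    # Group by requirement name so only causes on the same project are compared.
    by_name: Dict[str, List[Cause]] = {}
    for cause in causes:
        by_name.setdefault(cause.requirement.name, []).append(cause)

    for name, group in by_name.items():
        for first, second in combinations(group, 2):
            # Check the combined constraint against concrete versions rather
            # than reasoning about the two specifiers symbolically.
            if not any(
                first.requirement.allows(v) and second.requirement.allows(v)
                for v in available.get(name, [])
            ):
                return [first, second]  # exit early: one incompatible pair is enough
    return []
```

For example, a `numpy>=2` cause and a `numpy<2` cause share no version among `[(1, 26), (2, 0), (2, 1)]`, so both are returned. Because each pair check may walk every available version, stopping at the first incompatible pair keeps the cost bounded.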

@notatallshaw
Copy link
Member Author

I have updated this PR based on the feedback in #12497 and updates in sarugaku/resolvelib#145 and

In summary, I added more detailed comments and docstrings, improved the naming, and simplified some of the logic to make it more readable. I also removed the commit that moved existing functionality over to this potential new resolvelib API method (I will make a separate PR for that), so this PR now focuses only on the issue described in #12498.

@notatallshaw force-pushed the prefer-conflicting-causes branch from e2e3f24 to 9aff0d0 on February 11, 2024 16:05
@pfmoore (Member) left a comment

I didn't review the resolvelib changes, as I assume these are only temporary to get this patch working for now. They should be removed, and an updated version of resolvelib vendored in, before this PR is merged.

Also, can we have some tests for this code? It's complex enough that I don't love the idea of leaving it untested. Unit tests should be sufficient, and reasonably commented unit tests should help give a sense of what the code is doing.


:params causes: An iterable of PreferenceInformation

Returns a set of strings, each representing the name of a requirement or
Member

Suggested change
- Returns a set of strings, each representing the name of a requirement or
+ Returns a set of strings, representing the name of a requirement and

The code adds both the requirement's name and, if there is a parent, the parent's name. (There may be a better way of rewording this; I don't think GitHub's "suggestion" mechanism allows for multi-line changes...)
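To make the point above concrete, here is a minimal sketch of what the documented helper returns, assuming simplified `NamedTuple` stand-ins for pip's requirement and candidate objects (the names `Named`, `Cause`, and `get_causes_set` are hypothetical). For every cause it adds the requirement's name and, when present, the parent's name, so the result mixes both kinds of names.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Named(NamedTuple):
    # Hypothetical stand-in for a requirement or candidate: only the name matters here.
    name: str


class Cause(NamedTuple):
    # Same shape as resolvelib's RequirementInformation(requirement, parent).
    requirement: Named
    parent: Optional[Named]


def get_causes_set(causes: Iterable[Cause]) -> Set[str]:
    """Return the names of every requirement in causes and of its parent, if any."""
    names: Set[str] = set()
    for cause in causes:
        names.add(cause.requirement.name)
        if cause.parent is not None:  # top-level requirements have no parent
            names.add(cause.parent.name)
    return names
```

For a `numpy` cause whose parent is `scipy` plus a top-level `numpy` cause, the result is `{"numpy", "scipy"}`, which is why "requirement and its parent" describes it better than "requirement or parent".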

) -> List["PreferenceInformation"]:
"""
Identifies causes that conflict because their parent package requirements
are not satisfied by another cause, or vice versa.
Member

I'm still struggling to understand the logic here. I'd suggest two improvements.

  1. Clarify the above sentence: I don't really understand what "their parent package requirements are not satisfied by another cause" means, or why the parent package requirements should be satisfied by another cause.
  2. Describe the data structure of "causes" here, explaining the elements that we're using. Something like "each cause contains a requirement specifier and a 'parent', which is the candidate that has that requirement. The list of causes only contains elements that the resolver has detected to be in conflict somehow." (I'm assuming that explanation is at least partly wrong, as it's not enough for me to make sense of the function code...)
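One possible reading of the docstring, written as a minimal sketch to help pin down the data structure: each cause pairs a requirement with the candidate that declared it (its "parent", or `None` for a top-level requirement), mirroring resolvelib's `RequirementInformation`. The sketch treats two causes as directly conflicting when one cause requires project P but rejects the pinned version of P that is the parent of another cause. The types, the predicate-based specifier, and `causes_with_conflicting_parent` are hypothetical simplifications, not the PR's code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class Candidate:
    # Hypothetical stand-in for a pinned candidate.
    name: str
    version: Version


@dataclass(frozen=True)
class Requirement:
    # Hypothetical stand-in: a version predicate instead of a SpecifierSet.
    name: str
    allows: Callable[[Version], bool]


@dataclass(frozen=True)
class Cause:
    # Same shape as resolvelib's RequirementInformation: the requirement plus
    # the candidate that declared it (None for a top-level requirement).
    requirement: Requirement
    parent: Optional[Candidate] = None


def causes_with_conflicting_parent(causes: Sequence[Cause]) -> List[Cause]:
    """Return causes where one cause rejects the candidate that is another cause's parent."""
    conflicting: List[Cause] = []
    for child in causes:
        parent = child.parent
        if parent is None:
            continue
        for other in causes:
            if other is child or other.requirement.name != parent.name:
                continue
            if not other.requirement.allows(parent.version):
                # `other` rejects the candidate that introduced `child`,
                # so both causes are part of a direct conflict.
                for cause in (child, other):
                    if cause not in conflicting:
                        conflicting.append(cause)
    return conflicting
```

With `foo 2.0` requiring `bar<2` and `bar 2.0` requiring `baz>=1`, the first cause rejects the very candidate that introduced the second, so both are returned while an unrelated top-level requirement is not. The nested loop is quadratic in the number of causes, which seems acceptable for a check that would run once per narrowing call rather than once per identifier.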


# If there are 2 or less causes then finding conflicts between
# them is not required as there will always be a minimum of two
# conflicts
Member

I'd reword this as "unless there are more than 2 conflicts, there's nothing to simplify (because a conflict always involves at least 2 causes)".

if len(backtrack_causes) < 3:
return identifiers

# First, try to resolve direct causes based on conflicting parent packages
Member

Suggested change
- # First, try to resolve direct causes based on conflicting parent packages
+ # If some of the causes have conflicting parents, focus on them.

direct_causes = _causes_with_conflicting_parent(backtrack_causes)
if not direct_causes:
# If no conflicting parent packages found try to find some causes
Member

Suggested change
- # If no conflicting parent packages found try to find some causes
+ # Otherwise, try to find some causes

# that share the same requirement name but no common candidate,
Member

Suggested change
- # that share the same requirement name but no common candidate,
+ # that share the same requirement name but no common candidate.

# we take the first one of these as iterating through candidates
Member

Suggested change
- # we take the first one of these as iterating through candidates
+ # Stop after finding the first one of these as iterating through candidates

direct_causes = _first_causes_with_no_candidates(
backtrack_causes, candidates
)
if direct_causes:
Member

Suggested change
- if direct_causes:
+ # If we found some causes worth focusing on, use them.
+ if direct_causes:
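Read top to bottom with the suggested comments applied, the flow under review can be sketched as below. This is a minimal sketch, not the PR's implementation: the two conflict finders are passed in as plain callables (hypothetical parameters `find_parent_conflicts` and `find_no_candidate_pair`) so the example stays self-contained, and the fallback when no identifier matches is an assumption added to keep the result non-empty.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional, Sequence, Set


class Named(NamedTuple):
    name: str


class Cause(NamedTuple):
    # Same shape as resolvelib's RequirementInformation(requirement, parent).
    requirement: Named
    parent: Optional[Named]


Finder = Callable[[Sequence[Cause]], List[Cause]]


def narrow_requirement_selection(
    identifiers: Iterable[str],
    backtrack_causes: Sequence[Cause],
    find_parent_conflicts: Finder,
    find_no_candidate_pair: Finder,
) -> List[str]:
    ids = list(identifiers)
    # Unless there are more than 2 causes, there's nothing to simplify
    # (because a conflict always involves at least 2 causes).
    if len(backtrack_causes) < 3:
        return ids

    # If some of the causes have conflicting parents, focus on them.
    direct_causes = find_parent_conflicts(backtrack_causes)
    if not direct_causes:
        # Otherwise, try to find some causes that share the same requirement
        # name but no common candidate, stopping at the first such pair.
        direct_causes = find_no_candidate_pair(backtrack_causes)

    # If we found some causes worth focusing on, use them.
    if direct_causes:
        names: Set[str] = set()
        for cause in direct_causes:
            names.add(cause.requirement.name)
            if cause.parent is not None:
                names.add(cause.parent.name)
        narrowed = [ident for ident in ids if ident in names]
        if narrowed:  # assumption: never hand the resolver an empty selection
            return narrowed
    return ids
```

For example, with three causes where the parent-conflict finder returns only the cause for `a` (whose parent is `x`), the identifiers `["a", "b", "c", "x", "y"]` narrow to `["a", "x"]`; with only two causes the identifiers come back unchanged.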

@notatallshaw marked this pull request as draft March 30, 2024 18:21

@notatallshaw (Member, Author)

I am still working on this, but I am marking it as a draft until I have addressed all of @pfmoore's comments and resolvelib decides whether to accept sarugaku/resolvelib#145 and make a release.

Development

Successfully merging this pull request may close these issues.

Optimize Backtracking in Pip's Dependency Resolution By Prioritizing Direct Conflicts
2 participants