
Handle restricted dependencies as implicit multiple-constraints dependencies #6969

Draft · wants to merge 1 commit into main
Conversation

radoering (Member)

Pull Request Check List

Resolves: #5506

  • Added tests for changed code.
  • Updated documentation for changed code.

Although I think this PR makes the solver more correct, it comes with a massive performance regression that is far from acceptable.

I carried out some measurements with example pyproject.toml files from other PRs. If locking succeeds without this PR, the same lock file is generated with this PR; it just takes longer...

Times for `poetry lock` with a warm cache:

| pyproject.toml from ... | time without PR | time with PR |
| --- | --- | --- |
| #3367 | 0.8 s | 2.8 s |
| #4670 | 1.8 s | 4.2 s |
| #4870 | 71 s | 11800 s (not a typo) |
| #5506 | error after 4.5 s | 1090 s |
| shootout example | 3.9 s | 250 s |

Number of overrides:

| pyproject.toml from ... | number of overrides without PR | number of overrides with PR |
| --- | --- | --- |
| #3367 | 4 | 10 |
| #4670 | 16 | 19 |
| #4870 | 46 | 4179 |
| #5506 | - | 288 |
| shootout example | 0 | 69 |

The data shows that the time correlates with the number of overrides. Thus, I assume a more sophisticated algorithm to reduce the number of overrides, or even a complete overhaul of how multiple-constraints dependencies are handled, might be necessary. I can imagine making the VersionSolver marker-aware so that a version conflict is only a conflict if the intersection of markers is not empty. This way, overrides would not be necessary anymore and everything could be solved at once. However, that's probably a huge task.
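Purely as an illustration of that idea, here is a minimal sketch of a marker-aware conflict check. The `Requirement` objects with `marker` and `constraint` attributes are hypothetical stand-ins, not Poetry's actual solver API:

```python
# Hypothetical sketch: two requirements on the same package only conflict
# if their environment markers can be true at the same time.
def is_real_conflict(req_a, req_b) -> bool:
    # If the markers cannot both apply (empty intersection), the
    # requirements live in disjoint environments and never clash, e.g.
    # sys_platform == 'linux' vs sys_platform == 'win32'.
    if req_a.marker.intersect(req_b.marker).is_empty():
        return False
    # Only within the shared environment must the version ranges overlap;
    # an empty intersection of constraints is then a genuine conflict.
    return req_a.constraint.intersect(req_b.constraint).is_empty()
```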

@jeertmans

Thanks for linking this to #8670 @radoering :-)

Maybe we should rewrite Poetry in Rust if speed is an issue ^^'

Jokes aside, having a resolving time this long is really an issue...

@jorenham left a comment


I left some notes on potential performance improvements; perhaps they could help speed things up :)

```python
                inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
                inverted_marker_dep.marker = inverted_marker
                deps.append(inverted_marker_dep)
        return [dep for deps in by_name.values() for dep in deps]
```


Suggested change

```diff
-        return [dep for deps in by_name.values() for dep in deps]
+        return itertools.chain.from_iterable(by_name.values())
```

```python
        self,
        dependencies: Iterable[Dependency],
        active_extras: Collection[NormalizedName] | None,
    ) -> list[Dependency]:
```


The return value here is used only by `_get_dependencies_with_overrides`, which (unlike its annotations suggest) should accept any `Iterable[Dependency]`. So it doesn't need to return a list; any iterable will do:

Suggested change

```diff
-    ) -> list[Dependency]:
+    ) -> Iterable[Dependency]:
```

With this, you can avoid creating the entire dependency list, e.g. using `itertools`, or by turning this method into a generator.

Comment on lines +882 to +894
```python
        by_name: dict[str, list[Dependency]] = defaultdict(list)
        for dep in dependencies:
            by_name[dep.name].append(dep)
        for _name, deps in by_name.items():
            marker = marker_union(*[d.marker for d in deps])
            if marker.is_any():
                continue
            inverted_marker = marker.invert()
            if self._is_relevant_marker(inverted_marker, active_extras):
                # Set constraint to empty to mark dependency as "not required".
                inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
                inverted_marker_dep.marker = inverted_marker
                deps.append(inverted_marker_dep)
```


These loops could be merged if 1) you use `itertools.groupby`, with e.g. `operator.attrgetter('name')` as the key function, and 2) turn this method into a generator (e.g. with a `yield from` in the first branch and a `yield` in the second). This way you can avoid creating temporary lists altogether, for a significant speedup; see the sketch below.
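For concreteness, a rough sketch of that rewrite; `marker_union`, `EmptyConstraint`, and `_is_relevant_marker` are the names from the diff above. Note that `itertools.groupby` only groups *consecutive* equal keys, so the input has to be sorted by name first:

```python
import itertools
import operator

def _add_implicit_dependencies(self, dependencies, active_extras):
    # Generator variant of the loops above; groupby needs consecutive
    # equal keys, hence the sort.
    key = operator.attrgetter("name")
    for _name, group in itertools.groupby(sorted(dependencies, key=key), key=key):
        deps = list(group)  # the group is used twice, so materialize it
        yield from deps
        marker = marker_union(*[d.marker for d in deps])
        if marker.is_any():
            continue
        inverted_marker = marker.invert()
        if self._is_relevant_marker(inverted_marker, active_extras):
            # Set constraint to empty to mark dependency as "not required".
            inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
            inverted_marker_dep.marker = inverted_marker
            yield inverted_marker_dep
```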

Comment on lines +886 to +888

```python
            marker = marker_union(*[d.marker for d in deps])
            if marker.is_any():
                continue
```


Is `marker_union` also needed when e.g. `len(deps) == 1`? At a glance, `marker_union` looks like a rather expensive function call, so a short-circuit (sketched below) might pay off.
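A hypothetical short-circuit along those lines, assuming the union of a single marker is just that marker:

```python
# Skip the (potentially expensive) union for the single-dependency case.
if len(deps) == 1:
    marker = deps[0].marker
else:
    marker = marker_union(*[d.marker for d in deps])
```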

```diff
@@ -570,6 +570,9 @@ def complete_package(
                 continue
             self.search_for_direct_origin_dependency(dep)
 
+        active_extras = None if package.is_root() else dependency.extras
+        _dependencies = self._add_implicit_dependencies(_dependencies, active_extras)
```


Since `_dependencies` is only used once, it's probably better to skip the variable assignment by inlining it into the `_add_implicit_dependencies` call.
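Reading that as folding the one-shot `active_extras` expression into the call (the variable assigned and then used exactly once in the diff above), the inlined form might look like this:

```python
# Hypothetical inlined form of the two added lines; behavior is unchanged.
_dependencies = self._add_implicit_dependencies(
    _dependencies, None if package.is_root() else dependency.extras
)
```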

```python
        # any other dependency for sure.
        for i, dep in enumerate(dependencies):
            if dep.constraint.is_empty():
                new_dependencies.append(dependencies.pop(i))
```


The `list.pop` method can be a very slow operation, and I think it can be avoided here by using a "blacklist" approach, e.g.

```python
blacklist = set()
for dep in dependencies:
    if dep.constraint.is_empty():
        blacklist.add(dep)
        break
```

Then later on, in `itertools.product`, use `repeat=len(dependencies) - len(blacklist)`. And when looping over `dep in dependencies` again, simply skip it if `dep in blacklist`.

This avoids the `list.pop` operation, which has O(n) time complexity, by relying on `set.__contains__`, which is only O(1).
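For completeness, a sketch of how the blacklist would then be consumed; the surrounding `itertools.product` call is not shown in the excerpt, so this is only the skip pattern:

```python
for dep in dependencies:
    if dep in blacklist:  # O(1) set lookup instead of an O(n) list.pop
        continue
    # ... handle dep exactly as before ...
```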

Comment on lines +2107 to +2111

```python
        ("python_version < '3.7'", "python_version >= '3.7'"),
        ("sys_platform == 'linux'", "sys_platform != 'linux'"),
        (
            "python_version < '3.7' and sys_platform == 'linux'",
            "python_version >= '3.7' and sys_platform == 'linux'",
```


I don't think `python < 3.7` is relevant anymore.

@dimbleby (Contributor)

> I left some notes on potential performance improvements; perhaps it could help speed things up :)

I think it is likely that you are micro-optimizing essentially irrelevant parts of the code. If you want to make performance improvements, I recommend profiling first, so that you spend your time optimizing the right things.
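For example, a minimal `cProfile` sketch; `run_lock()` is a hypothetical stand-in for whatever entry point actually drives `poetry lock`, not a real Poetry API:

```python
import cProfile
import pstats

# Profile one lock run and write the raw stats to a file.
cProfile.run("run_lock()", filename="lock.prof")

# Print the 20 functions with the highest cumulative time.
pstats.Stats("lock.prof").sort_stats("cumulative").print_stats(20)
```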

But perhaps I am wrong, and you are now seeing results much better than those in the comment at the top of the thread? If so - submit a merge request!

@jorenham

> > I left some notes on potential performance improvements; perhaps it could help speed things up :)
>
> I think it is likely that you are micro-optimizing essentially irrelevant parts of the code.

I don't agree that improvements to the runtime complexity are the same as "micro-optimizing".

Plus, my suggestions will also result in fewer lines of code, without harming readability. So even if the performance benefits are minimal, at the very least there are no disadvantages.

Successfully merging this pull request may close this issue: Solver breaks with related dependencies that are both conditional (#5506).