Handle restricted dependencies as implicit multiple-constraints dependencies #6969
Conversation
The branch was force-pushed from 617846c to 224f6b3, and then from 224f6b3 to 5ee2526.
Thanks for linking this to #8670 @radoering :-) Maybe we should rewrite Poetry in Rust if speed is an issue ^^' Jokes aside, having a resolving time this long is a real issue.
I left some notes on potential performance improvements; perhaps they could help speed things up :)
            inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
            inverted_marker_dep.marker = inverted_marker
            deps.append(inverted_marker_dep)
    return [dep for deps in by_name.values() for dep in deps]
Suggested change:
-    return [dep for deps in by_name.values() for dep in deps]
+    return itertools.chain.from_iterable(by_name.values())
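For context, a tiny sketch (with hypothetical toy data) of why the two are equivalent; the itertools version just avoids building the intermediate list:

import itertools

by_name = {"a": [1, 2], "b": [3]}  # hypothetical toy data
eager = [dep for deps in by_name.values() for dep in deps]    # builds the whole list
lazy = itertools.chain.from_iterable(by_name.values())        # yields items lazily
assert eager == list(lazy)

One caveat: `chain.from_iterable` returns an iterator rather than a list, which ties into the return-annotation change suggested below.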
    self,
    dependencies: Iterable[Dependency],
    active_extras: Collection[NormalizedName] | None,
) -> list[Dependency]:
The return value here is used only for `_get_dependencies_with_overrides`, which (unlike its annotations suggest) should accept any `Iterable[Dependency]`. So it doesn't need to return a list; any iterable will do:

Suggested change:
-    ) -> list[Dependency]:
+    ) -> Iterable[Dependency]:

With this, you can avoid creating the entire dependency list, e.g. using itertools, or by turning this method into a generator.
    by_name: dict[str, list[Dependency]] = defaultdict(list)
    for dep in dependencies:
        by_name[dep.name].append(dep)
    for _name, deps in by_name.items():
        marker = marker_union(*[d.marker for d in deps])
        if marker.is_any():
            continue
        inverted_marker = marker.invert()
        if self._is_relevant_marker(inverted_marker, active_extras):
            # Set constraint to empty to mark dependency as "not required".
            inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
            inverted_marker_dep.marker = inverted_marker
            deps.append(inverted_marker_dep)
These loops could be merged if 1) you use `itertools.groupby`, with e.g. `operator.attrgetter('name')` as the key function, and 2) you turn this method into a generator (e.g. with a `yield from` in the first `if` statement, and a `yield` in the second). This way you can avoid creating temporary lists altogether, for a significant speedup.
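For illustration, a minimal sketch of that merged, generator-based version (my own sketch, not part of the PR; `marker_union`, `EmptyConstraint`, and `_is_relevant_marker` are the helpers from the diff above, and note that `itertools.groupby` only groups consecutive items, so the input has to be sorted by name first):

import itertools
import operator

def _add_implicit_dependencies(self, dependencies, active_extras):
    key = operator.attrgetter("name")
    # groupby only groups consecutive items, so sort by name first
    # (sorting does build one temporary list, a cost this sketch accepts).
    for _name, group in itertools.groupby(sorted(dependencies, key=key), key=key):
        deps = list(group)  # each group is still materialized (deps[0] is needed below)
        yield from deps
        marker = marker_union(*[d.marker for d in deps])
        if marker.is_any():
            continue
        inverted_marker = marker.invert()
        if self._is_relevant_marker(inverted_marker, active_extras):
            # Set constraint to empty to mark dependency as "not required".
            inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
            inverted_marker_dep.marker = inverted_marker
            yield inverted_marker_dep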
        marker = marker_union(*[d.marker for d in deps])
        if marker.is_any():
            continue
Is `marker_union` also needed when e.g. `len(deps) == 1`? Because, at a glance, `marker_union` looks like a rather expensive function call.
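For illustration, one way to short-circuit the single-dependency case might be (a sketch, assuming a single dependency's marker is already equivalent to the union of itself):

# Hypothetical short-circuit: a single dependency needs no union.
if len(deps) == 1:
    marker = deps[0].marker
else:
    marker = marker_union(*[d.marker for d in deps])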
@@ -570,6 +570,9 @@ def complete_package(
            continue
        self.search_for_direct_origin_dependency(dep)

        active_extras = None if package.is_root() else dependency.extras
        _dependencies = self._add_implicit_dependencies(_dependencies, active_extras)
Since `_dependencies` is only used once, it's probably better to skip the variable assignment by inlining it into the `_add_implicit_dependencies` call.
    # any other dependency for sure.
    for i, dep in enumerate(dependencies):
        if dep.constraint.is_empty():
            new_dependencies.append(dependencies.pop(i))
The `list.pop` method can be a very slow operation, and I think it can be avoided here by using a "blacklist" approach, e.g.

blacklist = set()
for dep in dependencies:
    if dep.constraint.is_empty():
        blacklist.add(dep)
        break

Then later on in `itertools.product`, use `repeat=len(dependencies) - len(blacklist)`. And when looping over `dep in dependencies` again, simply skip it if `dep in blacklist`.

This avoids the `list.pop` operation, which has a time complexity of O(n), by relying on `set.__contains__`, which is only O(1).
("python_version < '3.7'", "python_version >= '3.7'"), | ||
("sys_platform == 'linux'", "sys_platform != 'linux'"), | ||
( | ||
"python_version < '3.7' and sys_platform == 'linux'", | ||
"python_version >= '3.7' and sys_platform == 'linux'", |
I don't think `python<3.7` is relevant anymore.
I think it is likely that you are micro-optimizing essentially irrelevant parts of the code. If you want to make performance improvements, I recommend that the first thing to do is to profile, so that you spend your time optimizing the right things. But perhaps I am wrong, and you are now seeing results much better than those in the comment at the top of the thread? If so, submit a merge request!
I don't agree that improvements to the runtime complexity are the same as "micro-optimizing". Plus, my suggestions will also result in fewer lines of code without harming readability. So even if the performance benefits are minimal, at the very least there are no disadvantages.
Pull Request Check List
Resolves: #5506
Although I think that this PR makes the solver more correct, it comes with a massive performance regression that is far from acceptable.

I carried out some measurements with example pyproject.toml files from other PRs. If locking succeeds without this PR, the same lock file is generated with this PR; it just takes longer...
Times for `poetry lock` with a warm cache:

- `pyproject.toml` from ... (number of overrides: ...)
- `pyproject.toml` from ...

The data shows that the time seems to correlate with the number of overrides. Thus, I assume a more sophisticated algorithm to reduce the number of overrides, or even a complete overhaul of how multiple-constraints dependencies are handled, might be necessary. I can imagine making the `VersionSolver` marker-aware so that a version conflict is only a conflict if the intersection of markers is not empty. This way, overrides would not be necessary anymore and everything could be solved at once. However, that's probably a huge task.