Merge identical forks #5405
Conversation
Consider these requirements from pylint 3.2.5:

```
Requires-Dist: dill >=0.3.6 ; python_version >= "3.11"
Requires-Dist: dill >=0.3.7 ; python_version >= "3.12"
```

We will split on the Python version, but we may then pick a version of `dill` that is `>=0.3.7` in both branches and end up with an otherwise identical resolution in both forks. In this case, we merge the two forks and store only their conjoined markers.
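A minimal sketch of that merge step, using simplified stand-in types (`Graph`, `ForkResolution`, and `save_fork` are hypothetical illustrations, not uv's actual API):

```rust
/// Hypothetical, simplified stand-ins; uv's real `Resolution` and marker
/// types are considerably richer.
#[derive(Clone, PartialEq, Eq, Debug)]
struct Graph(Vec<(String, String)>); // sorted (package, version) pairs

#[derive(Debug)]
struct ForkResolution {
    graph: Graph,
    /// Markers under which this graph applies; merging a duplicate fork
    /// amounts to extending this disjunction instead of storing a new graph.
    markers: Vec<String>,
}

/// Before saving a newly solved fork, look for an existing fork with an
/// identical graph; if found, keep one graph with the combined markers.
fn save_fork(forks: &mut Vec<ForkResolution>, graph: Graph, marker: String) {
    if let Some(existing) = forks.iter_mut().find(|f| f.graph == graph) {
        existing.markers.push(marker);
    } else {
        forks.push(ForkResolution { graph, markers: vec![marker] });
    }
}

fn main() {
    let graph = Graph(vec![("dill".into(), "0.3.7".into())]);
    let mut forks = Vec::new();
    save_fork(&mut forks, graph.clone(), r#"python_version >= "3.12""#.into());
    save_fork(&mut forks, graph, r#"python_version == "3.11""#.into());
    assert_eq!(forks.len(), 1); // identical graphs merged into a single fork
}
```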
```diff
@@ -2426,6 +2462,30 @@ pub(crate) struct ResolutionDependencyEdge {
 }

 impl Resolution {
     fn universal() -> Self {
```
That method gets removed upstack.
```rust
    .map(ResolverMarkers::Fork)
    .unwrap_or(ResolverMarkers::Universal);
continue 'FORK;
}
```
I buy this, but two thoughts come to mind.

Firstly, is it possible for more than one duplicate resolution to exist at any given point in time? If so, I believe this would only find one of them. But I do not think that is the case, since this runs for every resolution before it is "saved," so more than one duplicate resolution should never appear.

Secondly, this is doing an exhaustive search over all existing resolutions to find a possible duplicate, and I suspect that the `Resolution::same_graph` routine is itself not especially cheap. I think this ends up being quadratic in the number of forks (which are themselves exponential in the number of dependencies, I think? or possibly in the depth of the dependency tree). I don't have a good feel for how big an issue that is in practice. Do we have a sense of what the common case is? I would guess the common case is that there aren't any duplicates, so perhaps we can optimize for that path. (To be clear, I don't mean to suggest that be done in this PR.)
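One way to optimize for that no-duplicates path, sketched under the assumption that a resolution can be summarized as a sorted `(package, version)` list (all names here are hypothetical, not uv's code): bucket resolutions by a cheap fingerprint and only run the expensive graph comparison on fingerprint collisions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type Packages = Vec<(String, String)>; // sorted (package, version) pairs

/// Cheap summary of a resolution; equal graphs always hash equal, so a miss
/// in the index proves there is no duplicate without any full comparison.
fn fingerprint(packages: &Packages) -> u64 {
    let mut hasher = DefaultHasher::new();
    packages.hash(&mut hasher);
    hasher.finish()
}

/// Index from fingerprint to resolution indices. In the common case (no
/// duplicate), each new resolution costs one hash and one lookup instead of
/// a scan over all existing forks.
fn find_duplicate(
    index: &HashMap<u64, Vec<usize>>,
    resolutions: &[Packages],
    candidate: &Packages,
) -> Option<usize> {
    index
        .get(&fingerprint(candidate))?
        .iter()
        .copied()
        // Hash collisions are possible, so confirm with the full comparison.
        .find(|&i| &resolutions[i] == candidate)
}
```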
> Firstly, is it possible for more than one duplicate resolution to exist at any given point in time? If so, I believe this would only find one of them. But I do not think that is the case, since this runs for every resolution before it is "saved," so more than one duplicate resolution should never appear.
We fork every time we see conflicting markers, but in many of those cases the requirements themselves are not conflicting (say, `numpy >= 1.16` for one and `numpy >= 1.19` for the other). When forking, we can't yet know whether we'll find a `numpy` compatible with both. I've also seen cases where we end up rejecting the package version we forked on in both branches, removing the conflicting requirements. By copying preferences over from previous forks, we try to coerce two forks into resolving the same package version. Basically, our strategy is to fork often to avoid failing on avoidable conflicts, while still producing a solution with as few divergences as possible.
Re perf: I agree that this is potentially costly, but I think we have to do it to get a desirable resolution. We get some short-cuts from std that make this cheaper: when two forks have a different number of packages, the check is a single `usize` comparison. We also usually have a small number of forks (and the more specific a fork is, the more likely it is that we skip future fork points because we're already more specific), so I see this more as a fixed cost of maybe 10^2/2 = 50 checks. There are of course pathological cases; for those, I think we just have to be a bit slow here to avoid redundant forks in the lockfile.
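The std short-cut mentioned here is just the length check that collection equality performs first. A minimal illustration (this `same_graph` is a stand-in, not uv's `Resolution::same_graph`):

```rust
use std::collections::BTreeMap;

/// Stand-in for a graph comparison: when the two forks contain a different
/// number of packages, the derived equality bails out after comparing the
/// lengths, i.e. a single `usize` comparison, before touching any contents.
fn same_graph(a: &BTreeMap<String, String>, b: &BTreeMap<String, String>) -> bool {
    a == b // `BTreeMap::eq` compares lengths first, then entries in order
}

fn main() {
    let small: BTreeMap<_, _> = [("numpy".to_string(), "1.19".to_string())].into();
    let large: BTreeMap<_, _> = [
        ("numpy".to_string(), "1.19".to_string()),
        ("dill".to_string(), "0.3.7".to_string()),
    ]
    .into();
    assert!(!same_graph(&small, &large)); // rejected by the length check alone
}
```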
## Summary

This was added in #5405 but is now the cause of an instability in `github_wikidata_bot`. Specifically, on the initial run, we fork in `pydantic==2.8.2`, via:

```
Requires-Dist: typing-extensions>=4.12.2; python_version >= '3.13'
Requires-Dist: typing-extensions>=4.6.1; python_version < '3.13'
```

In the end, we resolve a single version of `typing-extensions` (`4.12.2`)... But we don't recognize the two resolutions as the "same graph", because we propagate the fork markers, and so the "edges" have different markers on them...

In the second run through, when we have the forks in advance, we don't split on Pydantic... We just try to solve from the root with the current forks. This is fundamentally different, and I fear it will be the cause of many instabilities. But removing this graph check fixes the proximate issue. I don't really understand why this was added, since there was no test coverage in the PR.
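To make the failure mode concrete, a sketch with simplified types (this `Edge` struct is hypothetical; uv's real `ResolutionDependencyEdge` carries more data): two forks resolve the identical single version of `typing-extensions`, but the fork markers propagated onto the edges differ, so an edge-wise equality check does not see the "same graph".

```rust
/// Hypothetical, simplified dependency edge; the marker field is what breaks
/// equality in the scenario described above.
#[derive(PartialEq, Debug)]
struct Edge {
    from: &'static str,
    to: &'static str,
    marker: &'static str, // fork marker propagated onto the edge
}

fn main() {
    // Both forks resolved typing-extensions 4.12.2...
    let fork_a = [Edge {
        from: "pydantic",
        to: "typing-extensions==4.12.2",
        marker: "python_version >= '3.13'",
    }];
    let fork_b = [Edge {
        from: "pydantic",
        to: "typing-extensions==4.12.2",
        marker: "python_version < '3.13'",
    }];
    // ...but the propagated markers differ, so the graphs compare unequal.
    assert_ne!(fork_a, fork_b);
}
```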