Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode mutually-incompatible pairs of markers #9444

Merged
merged 5 commits into from
Dec 7, 2024
Merged

Conversation

charliermarsh
Copy link
Member

Summary

This is an alternative to #9344. If accepted, I need to audit the codebase and call sites to apply it everywhere, but the basic idea is: rather than encoding mutually-incompatible pairs of markers in the representation itself, we have an additional method on MarkerTree that expands the false-y definition to take into account assumptions about which markers can be true alongside others. We then check if the the current marker implies that at least one of them is true.

So, for example, we know that sys_platform == 'win32' and platform_system == 'Darwin' are mutually exclusive. When given a marker expression like python_version >= '3.7', we test if python_version >= '3.7' and sys_platform != 'win32' or platform_system != 'Darwin' are disjoint, i.e., if the following can't be satisfied:

python_version >= '3.7' and (sys_platform != 'win32' or platform_system != 'Darwin')

Since, if this can't be satisfied, it implies that the left-hand expression requires sys_platform == 'win32' and platform_system == 'Darwin' to be true at the same time.

I think the main downsides here are:

  1. We can't simplify markers based on these implications. So we'd still write markers like sys_platform == 'win32' and platform_system != 'Darwin', even though we know the latter expression is redundant.
  2. It might be expensive? I'm not sure. I don't think we test for falseness that often though.

Closes #7760.
Closes #9275.

@charliermarsh charliermarsh added bug Something isn't working enhancement New feature or improvement to existing functionality labels Nov 26, 2024
value: "Linux".to_string(),
},
),
// os_name == 'posix' and platform_system == 'Windows'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the bright side, this is highly flexible so we can encode more known incompatibilities, like this.

@@ -494,7 +494,7 @@ impl<'d> Forker<'d> {
let mut envs = vec![];
{
let not_marker = self.marker.negate();
if !env_marker.is_disjoint(&not_marker) {
if !env_marker.is_disjoint(&not_marker) && !env_marker.is_conflicting() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to do this in more places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a pain... Right now, env_marker.is_disjoint(&not_marker) returns true if either of the expressions are false, but that doesn't include the is_conflicting definition here -- it only looks at the FALSE node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make is_conflicting into simplify_conflicting(...) -> MarkerTree that returns a FALSE node if it matched one of the impossible cases, that might simplify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the approach makes sense though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd have to try it out, but it does sound more elegant in theory

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

andrew's idea is even better than that

] {
let mut a = MarkerTree::expression(a);
let b = MarkerTree::expression(b);
a.and(b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to cache those, @ibraheemdev will know the perf characteristics

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in a OnceCell so it should be good.

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea here is sound.

My main concern is probably what you might expect: that you have to remember to call this in a bunch of places around the code. I suppose the mitigation to that is that this isn't a correctness issue but a simplification issue.

Is there any way to do this simplification via MarkerTree's smart constructors? Before @ibraheemdev's work, we had this problem of markers possibly being simplified or not, and we'd call the simplification routine in various spots. But now we have a nice guarantee that markers are always simplified, as guaranteed by the smart constructors. Maybe we can do that in our MarkerTree::and and MarkerTree::or methods?

@charliermarsh
Copy link
Member Author

Yeah I like that idea.

@ibraheemdev
Copy link
Member

ibraheemdev commented Nov 27, 2024

I wonder if you could even define MarkerTree::TRUE and MarkerTree::FALSE to be the set of known impossible states.

@charliermarsh
Copy link
Member Author

Ok, the version I pushed does this in the smart constructors...

@@ -695,12 +697,14 @@ impl MarkerTree {
#[allow(clippy::needless_pass_by_value)]
pub fn and(&mut self, tree: MarkerTree) {
self.0 = INTERNER.lock().and(self.0, tree.0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to hold the lock and re-use it in simplify, or re-lock() as I'm doing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably re-use it, these are pretty much only used by a single thread.

@charliermarsh charliermarsh force-pushed the charlie/dnf branch 2 times, most recently from b2415d6 to a1357ef Compare November 28, 2024 03:07
@charliermarsh
Copy link
Member Author

I think what I have here works, but I'm worried about the performance costs and complexity.

fn simplify(&mut self) {
if !self.0.is_false() {
let mutex = &*MUTUAL_EXCLUSIONS;
if INTERNER.lock().is_disjoint(self.0, mutex.0) {
Copy link
Member

@konstin konstin Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to understand that this is is_disjoint and not !is_disjoint because MUTUAL_EXCLUSIONS is the negated exclusions, so it's the valid space.

image

Plus when reading the code you need to consider we call this every time while building the marker, so when only B from A or B (we previously saw in a lockfile) lies in the excluded area, we still get A because B was converted to FALSE upon construction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I be changing something in response to this comment? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an inline comment like:

MUTUAL_EXCLUSIONS is the possible space, if the markers are is_disjoint with it, they are solely in the impossible space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll leave the smart constructor review to andrew/ibraheem)

@charliermarsh
Copy link
Member Author

Ok, I think this version should have a minimal performance hit. I moved the logic into the algebra itself, and we now only perform this check when merging nodes that could yield a conflict (i.e., two marker trees that contain variables that could conflict). Per @ibraheemdev suggestion, I made those variables "highest-priority", so if they aren't present at the top of the marker tree, we know they aren't present anywhere beneath it.

@charliermarsh charliermarsh force-pushed the charlie/dnf branch 4 times, most recently from 255fae9 to 7580a50 Compare November 29, 2024 19:03

// Create the output node.
guard.create_node(func, children)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need the above methods because I need versions that don't recursively call .exclusions(). If I just use and and or here, I create an infinite loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.


if let Some(exclusions) = self.state.exclusions {
return exclusions;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's fine to just use an Option here rather than a OnceCell?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Rust wouldn't let you mess this up.

return true;
}

let (x, y) = (self.shared.node(xi), self.shared.node(yi));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge fan of having this entirely separate method. But it's equivalent to is_disjoint, except it doesn't do the mutually-incompatible marker check. If it did, we'd create an infinite loop, since is_disjoint calls and in that case which then calls disjointness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I feel like a little copying here is probably the way to go. I don't see a simple way of unifying them without making it more obscure.

Might be worth moving this comment into the code, for future wranglers wondering about it.

// Determine whether the conjunction _could_ contain a conflict. As an optimization, we only
// have to perform this check at the top-level, since these variables are given higher
// priority in the tree. In other words, if they're present, they _must_ be at the top.
let conflicts = x.var.conflicts() && y.var.conflicts();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to give the "possibly-conflicting" markers highest priority, so they're always at the top of the tree.

As such, when we AND two expressions, we only have to perform the "possibly-conflicting" check if they both have a conflicting variable at the very top.

@charliermarsh charliermarsh force-pushed the charlie/dnf branch 6 times, most recently from 13c46e1 to d8f094a Compare December 1, 2024 14:25
@charliermarsh
Copy link
Member Author

Ok, I added some sorting to the DNF to minimize the amount of churn in the expressions.

@charliermarsh charliermarsh force-pushed the charlie/dnf branch 3 times, most recently from b42371b to 7b81730 Compare December 2, 2024 01:46
Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense to me. And I like that this doesn't require any sort of knowledge on the callers part to explicitly simplify the markers. Nice work. :-)

@@ -493,6 +494,19 @@ pub enum MarkerExpression {
},
}

/// The kind of a [`MarkerExpression`].
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, PartialOrd, Ord)]
pub(crate) enum MarkerExpressionKind {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only used for backcompat sorting? If so, it might be useful to add a comment indicating as such. And that the order here matters?

return true;
}

let (x, y) = (self.shared.node(xi), self.shared.node(yi));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I feel like a little copying here is probably the way to go. I don't see a simple way of unifying them without making it more obscure.

Might be worth moving this comment into the code, for future wranglers wondering about it.


// Create the output node.
guard.create_node(func, children)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.


if let Some(exclusions) = self.state.exclusions {
return exclusions;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Rust wouldn't let you mess this up.

@charliermarsh charliermarsh enabled auto-merge (squash) December 7, 2024 01:44
@charliermarsh charliermarsh merged commit 508a6bc into main Dec 7, 2024
64 checks passed
@charliermarsh charliermarsh deleted the charlie/dnf branch December 7, 2024 01:51
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Dec 12, 2024
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.5.7` -> `0.5.8` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.5.8`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#058)

[Compare Source](astral-sh/uv@0.5.7...0.5.8)

**This release does not include the `powerpc64le-unknown-linux-musl` target due to a build issue. See [#&#8203;9793](astral-sh/uv#9793) for details. If this change affects you, please file an issue with your use-case.**

##### Enhancements

-   Omit empty resolution markers in lockfile  ([#&#8203;9738](astral-sh/uv#9738))
-   Add `--install-dir` to to `uv python install` and `uninstall` commands ([#&#8203;7920](astral-sh/uv#7920))
-   Add `--show-urls` and `--only-downloads` to `uv python list` ([#&#8203;8062](astral-sh/uv#8062))
-   Add `uv python list --all-arches` ([#&#8203;9782](astral-sh/uv#9782))
-   Add `uv run --gui-script` flag for running Python scripts with `pythonw.exe` ([#&#8203;9152](astral-sh/uv#9152))
-   Allow `--gui-script` on Unix ([#&#8203;9787](astral-sh/uv#9787))
-   Allow download of Python distribution variants optimized for newer x86\_64 microarchitectures ([#&#8203;9781](astral-sh/uv#9781))
-   Allow execution of `pyw` files on Unix ([#&#8203;9759](astral-sh/uv#9759))
-   Allow users to specify URLs in `project.dependencies` and `tool.uv.sources` ([#&#8203;9718](astral-sh/uv#9718))
-   Encode mutually-incompatible pairs of markers ([#&#8203;9444](astral-sh/uv#9444))
-   Improve the error message when a Python install request is not valid ([#&#8203;9783](astral-sh/uv#9783))
-   Preserve directory-level standalone build symlinks ([#&#8203;9723](astral-sh/uv#9723))
-   Add support for `uv publish --index <name>` ([#&#8203;9694](astral-sh/uv#9694))
-   Reframe `--locked` and `--frozen` as `--check` operations for `uv lock` ([#&#8203;9662](astral-sh/uv#9662))
-   Rename Python install scratch directory from `.cache` -> `.temp` ([#&#8203;9756](astral-sh/uv#9756))
-   Enable `uv tool uninstall uv` on Windows ([#&#8203;8963](astral-sh/uv#8963))
-   Improve self-dependency hint to make shadowing clear ([#&#8203;9716](astral-sh/uv#9716))
-   Refactor unavailable metadata to shrink the resolver ([#&#8203;9769](astral-sh/uv#9769))
-   Show 'depends on itself' for proxy packages ([#&#8203;9717](astral-sh/uv#9717))
-   Show a dedicated error for missing subdirectories ([#&#8203;9761](astral-sh/uv#9761))
-   Show a dedicated hint for missing `git+` prefixes ([#&#8203;9789](astral-sh/uv#9789))

##### Performance

-   Eagerly error when parsing `pyproject.toml` requirements ([#&#8203;9704](astral-sh/uv#9704))
-   Use copy-on-write when normalizing paths ([#&#8203;9710](astral-sh/uv#9710))

##### Bug fixes

-   Avoid enforcing non-conflicts in `uv export` ([#&#8203;9751](astral-sh/uv#9751))
-   Don't drop comments between items in TOML tables ([#&#8203;9784](astral-sh/uv#9784))
-   Don't fail with `--no-build` when static metadata is available ([#&#8203;9785](astral-sh/uv#9785))
-   Don't filter non-patch registry version ([#&#8203;9736](astral-sh/uv#9736))
-   Don't read metadata from stale `.egg-info` files ([#&#8203;9760](astral-sh/uv#9760))
-   Enforce correctness of self-dependencies ([#&#8203;9705](astral-sh/uv#9705))
-   Fix projects's typo in resolver error messages ([#&#8203;9708](astral-sh/uv#9708))
-   Ignore `.` prefixed directories during managed Python installation discovery ([#&#8203;9786](astral-sh/uv#9786))
-   Improve handling of invalid virtual environments during interpreter discovery ([#&#8203;8086](astral-sh/uv#8086))
-   Normalize relative paths when `--project` is specified ([#&#8203;9709](astral-sh/uv#9709))
-   Respect self-constraints on recursive extras ([#&#8203;9714](astral-sh/uv#9714))
-   Respect user settings for tracing coloring ([#&#8203;9733](astral-sh/uv#9733))
-   Retry on tar extraction errors ([#&#8203;9753](astral-sh/uv#9753))
-   Add conflict markers to the lock file ([#&#8203;9370](astral-sh/uv#9370))
-   De-duplicate resolution markers ([#&#8203;9780](astral-sh/uv#9780))
-   Avoid 403 error hint for PyTorch URLs ([#&#8203;9750](astral-sh/uv#9750))
-   Avoid treating non-existent `--find-links` as relative URLs ([#&#8203;9720](astral-sh/uv#9720))
-   Omit Windows Store `python3.13.exe` et al ([#&#8203;9679](astral-sh/uv#9679))
-   Replace executables with broken symlinks during `uv python install` ([#&#8203;9706](astral-sh/uv#9706))

##### Documentation

-   Fix build failure links ([#&#8203;9740](astral-sh/uv#9740))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40NDAuNyIsInVwZGF0ZWRJblZlciI6IjM3LjQ0MC43IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or improvement to existing functionality
Projects
None yet
4 participants