Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lockfile-normalized marker representation #5179

Closed
konstin opened this issue Jul 18, 2024 · 2 comments
Closed

Lockfile-normalized marker representation #5179

konstin opened this issue Jul 18, 2024 · 2 comments
Assignees
Labels
enhancement New feature or improvement to existing functionality preview Experimental behavior

Comments

@konstin
Copy link
Member

konstin commented Jul 18, 2024

We need to decide on which normalization we want for markers in the lockfile, and implement the algorithm that performs this normalization. For example, #5163 introduces:

(python_full_version <= '3.10' and python_version == '3.10') or (python_full_version <= '3.10' and python_version < '3.10') or (python_full_version <= '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version <= '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version == '3.10') or (python_full_version > '3.10' and python_version < '3.10') or (python_full_version > '3.10' and python_version < '3.10' and python_version > '3.10') or (python_full_version > '3.10' and python_version > '3.10')

This marker can be (completely) collapsed, in the current form it's not readable.

Since the marker representation changes the lockfile, we need a stable representation.

@konstin konstin added enhancement New feature or improvement to existing functionality preview Experimental behavior labels Jul 18, 2024
ibraheemdev added a commit that referenced this issue Aug 9, 2024
## Summary

This PR rewrites the `MarkerTree` type to use algebraic decision
diagrams (ADD). This has many benefits:
- The diagram is canonical for a given marker function. It is impossible
to create two functionally equivalent marker trees that don't refer to
the same underlying ADD. This also means that any trivially true or
unsatisfiable markers are represented by the same constants.
- The diagram can handle complex operations (conjunction/disjunction) in
polynomial time, as well as constant-time negation.
- The diagram can be converted to a simplified DNF form for user-facing
output.

The new representation gives us a lot more confidence in our marker
operations and simplification, which is proving to be very important
(see #5733 and
#5163).

Unfortunately, it is not easy to split this PR into multiple commits
because it is a large rewrite of the `marker` module. I'd suggest
reading through the `marker/algebra.rs`, `marker/simplify.rs`, and
`marker/tree.rs` files for the new implementation, as well as the
updated snapshots to verify how the new simplification rules work in
practice. However, a few other things were changed:
- [We now use release-only comparisons for `python_full_version`, where
we previously only did for
`python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522).
I'm unsure how marker operations should work in the presence of
pre-release versions if we decide that this is incorrect.
- [Meaningless marker expressions are now
ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502).
This means that a marker such as `'x' == 'x'` will always evaluate to
`true` (as if the expression did not exist), whereas we previously
treated this as always `false`. It's negation however, remains `false`.
- [Unsatisfiable markers are written as `python_version <
'0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329).
- The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub`
crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends
on the `pubgrub` crate for the `Range` type, we probably want to move
`pubgrub::Range` into a separate crate to break this, but I don't think
that should block this PR (cc @konstin).

There is still some remaining work here that I decided to leave for now
for the sake of unblocking some of the related work on the resolver.
- We still use `Option<MarkerTree>` throughout uv, which is unnecessary
now that `MarkerTree::TRUE` is canonical.
- The `MarkerTree` type is now interned globally and can potentially
implement `Copy`. However, it's unclear if we want to add more
information to marker trees that would make it `!Copy`. For example, we
may wish to attach extra and requires-python environment information to
avoid simplifying after construction.
- We don't currently combine `python_full_version` and `python_version`
markers.
- I also have not spent too much time investigating performance and
there is probably some low-hanging fruit. Many of the test cases I did
run actually saw large performance improvements due to the markers being
simplified internally, reducing the stress on the old `normalize`
routine, especially for the extremely large markers seen in
`transformers` and other projects.

Resolves #5660,
#5179.
zanieb pushed a commit that referenced this issue Aug 9, 2024
This PR rewrites the `MarkerTree` type to use algebraic decision
diagrams (ADD). This has many benefits:
- The diagram is canonical for a given marker function. It is impossible
to create two functionally equivalent marker trees that don't refer to
the same underlying ADD. This also means that any trivially true or
unsatisfiable markers are represented by the same constants.
- The diagram can handle complex operations (conjunction/disjunction) in
polynomial time, as well as constant-time negation.
- The diagram can be converted to a simplified DNF form for user-facing
output.

The new representation gives us a lot more confidence in our marker
operations and simplification, which is proving to be very important
(see #5733 and
#5163).

Unfortunately, it is not easy to split this PR into multiple commits
because it is a large rewrite of the `marker` module. I'd suggest
reading through the `marker/algebra.rs`, `marker/simplify.rs`, and
`marker/tree.rs` files for the new implementation, as well as the
updated snapshots to verify how the new simplification rules work in
practice. However, a few other things were changed:
- [We now use release-only comparisons for `python_full_version`, where
we previously only did for
`python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522).
I'm unsure how marker operations should work in the presence of
pre-release versions if we decide that this is incorrect.
- [Meaningless marker expressions are now
ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502).
This means that a marker such as `'x' == 'x'` will always evaluate to
`true` (as if the expression did not exist), whereas we previously
treated this as always `false`. It's negation however, remains `false`.
- [Unsatisfiable markers are written as `python_version <
'0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329).
- The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub`
crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends
on the `pubgrub` crate for the `Range` type, we probably want to move
`pubgrub::Range` into a separate crate to break this, but I don't think
that should block this PR (cc @konstin).

There is still some remaining work here that I decided to leave for now
for the sake of unblocking some of the related work on the resolver.
- We still use `Option<MarkerTree>` throughout uv, which is unnecessary
now that `MarkerTree::TRUE` is canonical.
- The `MarkerTree` type is now interned globally and can potentially
implement `Copy`. However, it's unclear if we want to add more
information to marker trees that would make it `!Copy`. For example, we
may wish to attach extra and requires-python environment information to
avoid simplifying after construction.
- We don't currently combine `python_full_version` and `python_version`
markers.
- I also have not spent too much time investigating performance and
there is probably some low-hanging fruit. Many of the test cases I did
run actually saw large performance improvements due to the markers being
simplified internally, reducing the stress on the old `normalize`
routine, especially for the extremely large markers seen in
`transformers` and other projects.

Resolves #5660,
#5179.
@ibraheemdev
Copy link
Member

We now markers in a simplified DNF form. Our simplification is not perfect, but it seems to work well in practice. See #5992 for whether we should write as CNF in some cases, but for now I think we're in a good place.

@charliermarsh
Copy link
Member

Awesome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improvement to existing functionality preview Experimental behavior
Projects
No open projects
Status: Done
Development

No branches or pull requests

3 participants