Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate knowledge of disjoint markers into simplification #7760

Closed
charliermarsh opened this issue Sep 28, 2024 · 4 comments · Fixed by #9444
Closed

Incorporate knowledge of disjoint markers into simplification #7760

charliermarsh opened this issue Sep 28, 2024 · 4 comments · Fixed by #9444
Assignees
Labels
performance Potential performance improvement

Comments

@charliermarsh
Copy link
Member

There are some markers that we "know" cannot be true at the same time, despite the fact that it's not encoded in the grammar or schema at all.

For example, this is never true: platform_system == 'Linux' and sys_platform == 'darwin'.

If we encode knowledge about the markers that we "know" are disjoint, we can omit more forks, wheels, etc.

@charliermarsh charliermarsh added the performance Potential performance improvement label Sep 28, 2024
@charliermarsh
Copy link
Member Author

\cc @ibraheemdev

@charliermarsh charliermarsh self-assigned this Nov 20, 2024
@charliermarsh
Copy link
Member Author

charliermarsh commented Nov 20, 2024

I have this working, but it does mean all expressions like sys_platform == 'linux' get expanded to sys_platform == 'linux' and platform_system == 'Linux'" in the lockfile.

@charliermarsh
Copy link
Member Author

I could just replace sys_platform == 'linux' with platform_system == 'Linux', but then we wouldn't detect disjointness in cases like:

flask ; sys_platform == 'linux'
flask ; sys_platform == 'java1.8.0_51'

@mitsuhiko
Copy link
Contributor

I have this working, but it does mean all expressions like sys_platform == 'linux' get expanded to sys_platform == 'linux' and platform_system == 'Linux'" in the lockfile.

Could you not keep the knowledge around what the original marker expression was and serialize that into the lockfile instead. At read it should be rather trivial to expand it out when the lockfile is read? Performance wise this will be slightly worse but you presumably there won't be that many cases where that is necessary.

charliermarsh added a commit that referenced this issue Dec 7, 2024
## Summary

This is an alternative to #9344. If accepted, I need to audit the
codebase and call sites to apply it everywhere, but the basic idea is:
rather than encoding mutually-incompatible pairs of markers in the
representation itself, we have an additional method on `MarkerTree` that
expands the false-y definition to take into account assumptions about
which markers can be true alongside others. We then check if the the
current marker implies that at least one of them is true.

So, for example, we know that `sys_platform == 'win32'` and
`platform_system == 'Darwin'` are mutually exclusive. When given a
marker expression like `python_version >= '3.7'`, we test if
`python_version >= '3.7'` and `sys_platform != 'win32' or
platform_system != 'Darwin'` are disjoint, i.e., if the following can't
be satisfied:

```
python_version >= '3.7' and (sys_platform != 'win32' or platform_system != 'Darwin')
```

Since, if this can't be satisfied, it implies that the left-hand
expression requires `sys_platform == 'win32'` and `platform_system ==
'Darwin'` to be true at the same time.

I think the main downsides here are:

1. We can't _simplify_ markers based on these implications. So we'd
still write markers like `sys_platform == 'win32' and platform_system !=
'Darwin'`, even though we know the latter expression is redundant.
2. It might be expensive? I'm not sure. I don't think we test for
falseness _that_ often though.

Closes #7760.
Closes #9275.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
2 participants