-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[red-knot] optimize building large unions of literals #17403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
CodSpeed Performance ReportMerging #17403 will improve performances by ×41Comparing Summary
Benchmarks breakdown
|
sharkdp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks. The fact that we now show union elements grouped by literal type also seems like a UX improvement, even if we potentially disrupt the original source code order. It seems unlikely that someone wants to preserve the order of Literal[1] | Literal["a"] | Literal[2].
I guess that's pretty good... |
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary Now that we've made the large-unions benchmark fast, let's make it slow again! This adds a following operation (checking `len`) on the large union, which is slow, even though building the large union is now fast. (This is also observed in a real-world code sample.) It's slow because for every element of the union, we fetch its `__len__` method and check it for compatibility with `Sized`. We can make this fast by extending the grouped-types approach, as discussed in #17403, so that we can do this `__len__` operation (which is identical for every literal string) just once for all literal strings, instead of once per literal string type in the union. Until we do that, we can make this acceptably fast again for now by setting a lowish limit on union size, which we can increase in the future when we make it fast. This is what I'll do in the next PR. ## Test Plan `cargo bench --bench red_knot`
* main: (44 commits) [`airflow`] Extend `AIR311` rules (#17422) [red-knot] simplify union size limit handling (#17429) [`airflow`] Extract `AIR311` from `AIR301` rules (`AIR301`, `AIR311`) (#17310) [red-knot] set a size limit on unions of literals (#17419) [red-knot] make large-union benchmark slow again (#17418) [red-knot] optimize building large unions of literals (#17403) [red-knot] Fix comments in type_api.md (#17425) [red-knot] Do not assume that `x != 0` if `x` inhabits `~Literal[0]` (#17370) [red-knot] make large-union benchmark more challenging (#17416) [red-knot] Acknowledge that `T & anything` is assignable to `T` (#17413) Update Rust crate clap to v4.5.36 (#17381) Raise syntax error when `\` is at end of file (#17409) [red-knot] Add regression tests for narrowing constraints cycles (#17408) [red-knot] Add some knowledge of `__all__` to `*`-import machinery (#17373) Update taiki-e/install-action digest to be7c31b (#17379) Update Rust crate mimalloc to v0.1.46 (#17382) Update PyO3/maturin-action action to v1.49.1 (#17384) Update Rust crate anyhow to v1.0.98 (#17380) dependencies: switch from `chrono` to `jiff` Update Rust crate bstr to v1.12.0 (#17385) ...
…alsy (#17451) In #17403 I added a comment asserting that all same-kind literal types share all the same super-types. This is true, with two notable exceptions: the types `AlwaysTruthy` and `AlwaysFalsy`. These two types are super-types of some literal types within a given kind and not others: `Literal[0]`, `Literal[""]`, and `Literal[b""]` inhabit `AlwaysFalsy`, while other literals inhabit `AlwaysTruthy`. This PR updates the literal-unions optimization to handle these types correctly. Fixes #17447 Verified locally that `QUICKCHECK_TESTS=100000 cargo test -p red_knot_python_semantic -- --ignored types::property_tests::stable` now passes again.
Summary
Special-case literal types in
UnionBuilderto speed up building large unions of literals.This optimization is extremely effective at speeding up building even a very large union (it improves the large-unions benchmark by 41x!). The problem we can run into is that it is easy to then run into another operation on the very large union (for instance, narrowing may add it to an intersection, which then distributes it over the intersection) which is still slow.
I think it is possible to avoid this by extending this optimized "grouped" representation throughout not just
UnionBuilder, but all of our union and intersection representations. I have some work in this direction, but rather than spending more time on it right now, I'd rather just land this much, along with a limit on the size of these unions (to avoid building really big unions quickly and then hitting issues where they are used.)Test Plan
Existing tests and benchmarks.