Skip to content

Conversation

@carljm
Copy link
Contributor

@carljm carljm commented Apr 15, 2025

Summary

Special-case literal types in UnionBuilder to speed up building large unions of literals.

This optimization is extremely effective at speeding up building even a very large union (it improves the large-unions benchmark by 41x!). The problem we can run into is that it is easy to then run into another operation on the very large union (for instance, narrowing may add it to an intersection, which then distributes it over the intersection) which is still slow.

I think it is possible to avoid this by extending this optimized "grouped" representation throughout not just UnionBuilder, but all of our union and intersection representations. I have some work in this direction, but rather than spending more time on it right now, I'd rather just land this much, along with a limit on the size of these unions (to avoid building really big unions quickly and then hitting issues where they are used.)

Test Plan

Existing tests and benchmarks.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 15, 2025

mypy_primer results

No ecosystem changes detected ✅

@codspeed-hq
Copy link

codspeed-hq bot commented Apr 15, 2025

CodSpeed Performance Report

Merging #17403 will improve performances by ×41

Comparing cjm/bigunions3 (a99bdc3) with main (13ea4e5)

Summary

⚡ 1 improvements
✅ 32 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
red_knot_micro[many_string_assignments] 3,211.2 ms 79 ms ×41

@AlexWaygood AlexWaygood added performance Potential performance improvement ty Multi-file analysis & type inference labels Apr 15, 2025
Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks. The fact that we now show union elements grouped by literal type also seems like a UX improvement, even if we potentially disrupt the original source code order. It seems unlikely that someone wants to preserve the order of Literal[1] | Literal["a"] | Literal[2].

@dcreager
Copy link
Member

will improve performances by ×41

I guess that's pretty good...

carljm and others added 2 commits April 16, 2025 06:43
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@carljm carljm enabled auto-merge (squash) April 16, 2025 13:55
@carljm carljm merged commit a1f3619 into main Apr 16, 2025
20 checks passed
@carljm carljm deleted the cjm/bigunions3 branch April 16, 2025 13:55
carljm added a commit that referenced this pull request Apr 16, 2025
## Summary

Now that we've made the large-unions benchmark fast, let's make it slow
again!

This adds a following operation (checking `len`) on the large union,
which is slow, even though building the large union is now fast. (This
is also observed in a real-world code sample.) It's slow because for
every element of the union, we fetch its `__len__` method and check it
for compatibility with `Sized`.

We can make this fast by extending the grouped-types approach, as
discussed in #17403, so that we
can do this `__len__` operation (which is identical for every literal
string) just once for all literal strings, instead of once per literal
string type in the union.

Until we do that, we can make this acceptably fast again for now by
setting a lowish limit on union size, which we can increase in the
future when we make it fast. This is what I'll do in the next PR.

## Test Plan

`cargo bench --bench red_knot`
dcreager added a commit that referenced this pull request Apr 16, 2025
* main: (44 commits)
  [`airflow`] Extend `AIR311` rules (#17422)
  [red-knot] simplify union size limit handling (#17429)
  [`airflow`] Extract `AIR311` from `AIR301` rules (`AIR301`, `AIR311`) (#17310)
  [red-knot] set a size limit on unions of literals (#17419)
  [red-knot] make large-union benchmark slow again (#17418)
  [red-knot] optimize building large unions of literals (#17403)
  [red-knot] Fix comments in type_api.md (#17425)
  [red-knot] Do not assume that `x != 0` if `x` inhabits `~Literal[0]` (#17370)
  [red-knot] make large-union benchmark more challenging (#17416)
  [red-knot] Acknowledge that `T & anything` is assignable to `T` (#17413)
  Update Rust crate clap to v4.5.36 (#17381)
  Raise syntax error when `\` is at end of file (#17409)
  [red-knot] Add regression tests for narrowing constraints cycles (#17408)
  [red-knot] Add some knowledge of `__all__` to `*`-import machinery (#17373)
  Update taiki-e/install-action digest to be7c31b (#17379)
  Update Rust crate mimalloc to v0.1.46 (#17382)
  Update PyO3/maturin-action action to v1.49.1 (#17384)
  Update Rust crate anyhow to v1.0.98 (#17380)
  dependencies: switch from `chrono` to `jiff`
  Update Rust crate bstr to v1.12.0 (#17385)
  ...
carljm added a commit that referenced this pull request Apr 18, 2025
…alsy (#17451)

In #17403 I added a comment asserting that all same-kind literal types
share all the same super-types. This is true, with two notable
exceptions: the types `AlwaysTruthy` and `AlwaysFalsy`. These two types
are super-types of some literal types within a given kind and not
others: `Literal[0]`, `Literal[""]`, and `Literal[b""]` inhabit
`AlwaysFalsy`, while other literals inhabit `AlwaysTruthy`.

This PR updates the literal-unions optimization to handle these types
correctly.

Fixes #17447

Verified locally that `QUICKCHECK_TESTS=100000 cargo test -p
red_knot_python_semantic -- --ignored types::property_tests::stable` now
passes again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Potential performance improvement ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants