[ty] Heterogeneous unpacking support for unions #20377

AlexWaygood · 2025-09-14T16:46:21Z

Summary

This PR adds precise heterogeneous unpacking support for unions.

For a tuple such as tuple[int, str], we've long recognised that if you unpack this tuple, the first element will be an int and the second will be a str. But the same has not been true for tuple[Literal[42], str] | tuple[Literal[56], str] -- if a user unpacked this union of tuples, we would infer that both first and second elements were of type Literal[42, 56] | str. This PR fixes that: we now infer that the first element will be of type Literal[42, 56] and the second element will be of type str.
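As a rough illustration of the position-wise result described above, here is a toy Python model. It is not ty's implementation: `union_specs`, the frozenset-of-type-names representation, and the `"fixed"`/`"variable"` tags are all hypothetical, and the rule "same length means union each position" is an assumption drawn from the example in this summary.

```python
# Toy model only: a fixed-length tuple spec is a tuple of frozensets,
# one set of type names per position. None of this mirrors ty's real types.

def union_specs(a, b):
    """Combine the specs of two union members.

    Same length: keep one union per position ("fixed").
    Different lengths: fall back to a single homogeneous element type
    ("variable"), which loses the per-position information.
    """
    if len(a) == len(b):
        return ("fixed", tuple(x | y for x, y in zip(a, b)))
    return ("variable", frozenset().union(*a, *b))


left = (frozenset({"Literal[42]"}), frozenset({"str"}))
right = (frozenset({"Literal[56]"}), frozenset({"str"}))
kind, spec = union_specs(left, right)
```

In this model the first position ends up holding both literals and the second holds only `str`, which corresponds to `Literal[42, 56]` and `str` in the description above.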

This doesn't add much complexity to our iteration logic, fixes a number of false positives in the ecosystem, and (surprisingly!) leads to a nice performance boost on the colour-science benchmark.

Test Plan

Mdtests added

github-actions · 2025-09-14T16:48:20Z

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

github-actions · 2025-09-14T19:31:44Z

`mypy_primer` results

Changes were detected when running on open source projects

scrapy (https://github.com/scrapy/scrapy)
- scrapy/http/headers.py:39:21: error[not-iterable] Object of type `AnyStr@update` may not be iterable
- scrapy/utils/datatypes.py:91:66: error[not-iterable] Object of type `AnyStr@update` may not be iterable
- Found 1067 diagnostics
+ Found 1065 diagnostics

vision (https://github.com/pytorch/vision)
- references/classification/utils.py:420:5: error[invalid-assignment] Object of type `tuple[type | Unknown, ...]` is not assignable to `list[type] | None`
+ references/classification/utils.py:420:5: error[invalid-assignment] Object of type `tuple[type, ...] | tuple[Unknown, ...]` is not assignable to `list[type] | None`

xarray (https://github.com/pydata/xarray)
- xarray/tests/test_groupby.py:3059:32: error[parameter-already-assigned] Multiple values provided for parameter `freq` of function `date_range`
- Found 1617 diagnostics
+ Found 1616 diagnostics

koda-validate (https://github.com/keithasaurus/koda-validate)
- koda_validate/generic.py:236:21: error[not-iterable] Object of type `ListOrTupleOrSetAny@UniqueItems` may not be iterable
- Found 69 diagnostics
+ Found 68 diagnostics

scikit-learn (https://github.com/scikit-learn/scikit-learn)
- sklearn/utils/tests/test_multiclass.py:410:27: warning[possibly-missing-attribute] Attribute `toarray` on type `(Unknown & SparseABC) | (list[Unknown | list[Unknown | int]] & SparseABC) | (list[Unknown | list[Unknown | str]] & SparseABC) | ... omitted 11 union elements` may be missing
+ sklearn/utils/tests/test_multiclass.py:410:27: warning[possibly-missing-attribute] Attribute `toarray` on type `(Unknown & SparseABC) | (list[Unknown | list[Unknown | int]] & SparseABC) | (_NotAnArray & SparseABC) | ... omitted 11 union elements` may be missing

pandas (https://github.com/pandas-dev/pandas)
- pandas/tests/util/test_assert_extension_array_equal.py:108:41: error[invalid-argument-type] Argument to function `assert_extension_array_equal` is incorrect: Expected `bool | Literal["equiv"]`, found `@Todo | SparseArray`
- pandas/tests/util/test_assert_extension_array_equal.py:108:41: error[invalid-argument-type] Argument to function `assert_extension_array_equal` is incorrect: Expected `str`, found `@Todo | SparseArray`
- Found 3389 diagnostics
+ Found 3387 diagnostics

core (https://github.com/home-assistant/core)
- homeassistant/components/lovelace/websocket.py:58:46: error[invalid-argument-type] Argument to bound method `send_error` is incorrect: Expected `dict[str, Any] | None`, found `str`
- Found 13756 diagnostics
+ Found 13755 diagnostics

No memory usage changes detected ✅

github-actions · 2025-10-11T12:36:20Z

`ecosystem-analyzer` results

| Lint rule | Added | Removed | Changed |
|---|---|---|---|
| `invalid-argument-type` | 0 | 6 | 0 |
| `parameter-already-assigned` | 0 | 1 | 0 |
| Total | 0 | 7 | 0 |

Full report with detailed diff (timing results)

codspeed-hq · 2025-10-11T12:37:54Z

CodSpeed Performance Report

Merging #20377 will improve performances by 6.26%

Comparing alex/tuplespec-union (8dde577) with main (4b7f184)

Summary

⚡ 1 improvement
✅ 20 untouched
⏩ 30 skipped¹

Benchmarks breakdown

| | Mode | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|---|
| ⚡ | WallTime | `medium[colour-science]` | 11.2 s | 10.5 s | +6.26% |

¹ 30 benchmarks were skipped, so the baseline results were used instead.

AlexWaygood · 2025-10-11T13:48:34Z

crates/ty_python_semantic/src/types.rs

+        fn non_async_special_case<'db>(
+            db: &'db dyn Db,
+            ty: Type<'db>,
+        ) -> Option<Cow<'db, TupleSpec<'db>>> {


Other than the Type::Union branch, this is exactly the same code that used to exist in the match statement slightly lower down. It has just been extracted into a standalone function (so that it can be called recursively) and moved higher up to satisfy Clippy.

dcreager · 2025-10-14T00:51:06Z

We've supported unpacking of unions like this in assignments, but it looks like this implementation might be more general. Can we replace the older union-related code in types/unpacker.rs with calls to this new logic?

AlexWaygood · 2025-10-14T11:42:43Z

> We've supported unpacking of unions like this in assignments, but it looks like this implementation might be more general. Can we replace the older union-related code in types/unpacker.rs with calls to this new logic?

That's a great question. I've just spent a while looking at this (probably longer than I should have!), and my answer is... I don't think so, unfortunately. Obtaining a different tuple spec for each union member, like unpacker.rs is doing, is still preferable to calling try_iterate() directly on a union type -- it's still less lossy, yields better fallback results in the case of errors, and gives better error messages.

I experimented with having Type::try_iterate() return Result<IterationOutcome<'db>, IterationError<'db>>, where IterationOutcome is an abstraction that looks like this:

struct IterationOutcome<'db>(smallvec::SmallVec<[Cow<'db, TupleSpec<'db>>; 1]>);

but even that is too lossy a representation for unpacker.rs. unpacker.rs really needs to know whether each individual union element is iterable, and what kind of diagnostic it emits if it's not. So for unpacker.rs to be able to use try_iterate without manually mapping over the union elements, try_iterate would need to return IterationOutcome<'db>, where IterationOutcome looks like

struct IterationOutcome<'db>(
    smallvec::SmallVec<[Result<Cow<'db, TupleSpec<'db>>, IterationError<'db>>; 1]>
);

At that point, the abstraction becomes so complicated that I think it's probably not worth it anymore.

So the question is: what does this PR actually get us?

Firstly, it gets us much better call-binding for calls with variadic parameters: unlike in unpacker.rs, we do not map over the union elements manually in call/bind.rs. Maybe we should map over the union elements there, as unpacker.rs does, but it looks awkward to pull off:

ruff/crates/ty_python_semantic/src/types/call/bind.rs

Lines 2283 to 2291 in ac2c530

    
    /// Match a variadic argument to the remaining positional, standard or variadic parameters.
    fn match_variadic(
        &mut self,
        db: &'db dyn Db,
        argument_index: usize,
        argument: Argument<'a>,
        argument_type: Option<Type<'db>>,
    ) -> Result<(), ()> {
        let tuple = argument_type.map(|ty| ty.iterate(db));
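To make the call-binding point concrete, here is a small Python sketch of the kind of call that goes through this path: a union of fixed-length tuples splatted into a function with positional parameters. The names `Pair`, `describe` and `call_with_splat` are made up for illustration, and the exact diagnostics ty produced for this snippet before the change are not shown in this thread, so none are claimed here.

```python
from typing import Literal, Union

# Hypothetical alias: a union of two fixed-length tuple types.
Pair = Union[tuple[Literal[42], str], tuple[Literal[56], str]]


def describe(number: int, label: str) -> str:
    return f"{label}={number}"


def call_with_splat(pair: Pair) -> str:
    # `*pair` has to be matched against `number` and `label`.
    # With per-position element types (Literal[42, 56], then str), each
    # argument can be checked against its own parameter instead of both
    # being treated as `Literal[42, 56] | str`.
    return describe(*pair)
```

At runtime both tuple shapes behave identically; the difference is only in how precisely a type checker can match the splatted elements to parameters.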

Secondly, it gets us much more precise inference even if all you want is the homogeneous element type. For example:

from typing import Literal

def f(x: Literal["abc", "def"]):
    for item in x:
        # main: LiteralString
        # This PR: Literal["a", "b", "c", "d", "e", "f"]
        reveal_type(item)
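The more precise result can be thought of as iterating each member of the union separately and then unioning what comes out. The following is only a runtime analogy for that reasoning, not how ty computes the type; `element_literals` is a hypothetical helper.

```python
def element_literals(members: tuple[str, ...]) -> list[str]:
    # Iterate each union member on its own, then union the results.
    return sorted({char for member in members for char in member})


chars = element_literals(("abc", "def"))
```

For the two members `"abc"` and `"def"`, the collected characters are the same six single-character strings that appear in the `Literal[...]` type revealed above.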

dcreager

> That's a great question. I've just spent a while looking at this (probably longer than I should have!), and my answer is... I don't think so, unfortunately.

Thanks for looking at that! No argument with your findings. Can you summarize them in the code? Possibly in unpacker.rs, to explain why you can't use the union-handling logic that we use for argument splatting.

crates/ty_python_semantic/resources/mdtest/call/function.md

dcreager · 2025-10-14T19:44:14Z

crates/ty_python_semantic/src/types/call/bind.rs

+                                let return_type = if let Type::Union(union) = argument {
+                                    union.map(db, |element| {
+                                        Type::tuple(TupleType::new(db, &element.iterate(db)))
+                                    })
+                                } else {
+                                    Type::tuple(TupleType::new(db, &argument.iterate(db)))
+                                };
+                                overload.set_return_type(return_type);


Not blocking for this PR, but this pattern seems to come up enough to deserve a helper method on Type — a map_over_union that applies the function to each union element if it's a union, or to the individual type if not.
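The shape of the suggested helper can be sketched in a few lines. This is a Python toy model rather than Rust, and nothing in it is ty's API: `UnionOf`, the use of plain strings as "types", and the `as_tuple` callback are all assumptions made purely to show the "apply to each member, or to the single type" behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class UnionOf:
    """Toy stand-in for a union type: a tuple of member type names."""
    elements: tuple[str, ...]


Ty = Union[str, UnionOf]


def map_over_union(ty: Ty, f: Callable[[str], str]) -> Ty:
    """Apply `f` to every member if `ty` is a union, else to `ty` itself."""
    if isinstance(ty, UnionOf):
        return UnionOf(tuple(f(element) for element in ty.elements))
    return f(ty)


def as_tuple(element: str) -> str:
    # Hypothetical transformation, loosely analogous to wrapping each
    # member's iteration result in a tuple type.
    return f"tuple[{element}, ...]"
```

With such a helper, the `if let Type::Union(..) { .. } else { .. }` pattern in the snippet above would collapse into a single call at each use site.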

dcreager · 2025-10-14T19:45:41Z

crates/ty_python_semantic/src/types/tuple.rs

        }
    }

+    fn all_elements(&self) -> impl Iterator<Item = &Type<'db>> {


nit: Can you throw a copied onto each iterator so that this can return Iterator<Item = Type>? I find it good karma to not make the caller have to worry about that

Hmm, I feel like here it's more ergonomic to have it return Iterator<Item = &Type<'db>>, because then it matches the signature of TupleSpec::all_elements (and, indeed, the two other all_elements() methods in this module!):

ruff/crates/ty_python_semantic/src/types/tuple.rs

Lines 343 to 345 in 6f468ae

    pub(crate) fn all_elements(&self) -> impl Iterator<Item = &T> {
        self.0.iter()
    }

It seems like strictly more code for not much gain here if I make it return Iterator<Item = Type<'db>> rather than Iterator<Item = &Type<'db>>?

diff --git a/crates/ty_python_semantic/src/types/tuple.rs b/crates/ty_python_semantic/src/types/tuple.rs
index dcb3df675d..cc39e696f8 100644
--- a/crates/ty_python_semantic/src/types/tuple.rs
+++ b/crates/ty_python_semantic/src/types/tuple.rs
@@ -1583,7 +1583,7 @@ impl<'db> TupleSpecBuilder<'db> {
         }
     }
 
-    fn all_elements(&self) -> impl Iterator<Item = &Type<'db>> {
+    fn all_elements(&self) -> impl Iterator<Item = Type<'db>> {
         match self {
             TupleSpecBuilder::Fixed(elements) => Either::Left(elements.iter()),
             TupleSpecBuilder::Variable {
@@ -1592,6 +1592,7 @@ impl<'db> TupleSpecBuilder<'db> {
                 suffix,
             } => Either::Right(prefix.iter().chain(std::iter::once(variable)).chain(suffix)),
         }
+        .copied()
     }
 
     /// Return a new tuple-spec builder that reflects the union of this tuple and another tuple.
@@ -1625,8 +1626,10 @@ impl<'db> TupleSpecBuilder<'db> {
             // would actually lead to more precise inference, so it's probably not worth the
             // complexity.
             _ => {
-                let unioned =
-                    UnionType::from_elements(db, self.all_elements().chain(other.all_elements()));
+                let unioned = UnionType::from_elements(
+                    db,
+                    self.all_elements().chain(other.all_elements().copied()),
+                );
                 TupleSpecBuilder::Variable {
                     prefix: vec![],
                     variable: unioned,

crates/ty_python_semantic/src/types/tuple.rs


AlexWaygood added the ty Multi-file analysis & type inference label Sep 14, 2025

AlexWaygood closed this Sep 14, 2025

AlexWaygood reopened this Sep 14, 2025

AlexWaygood force-pushed the alex/tuplespec-union branch from 9c25d55 to d3ac1ac Compare October 11, 2025 12:25

AlexWaygood added the ecosystem-analyzer label Oct 11, 2025

AlexWaygood commented Oct 11, 2025


AlexWaygood marked this pull request as ready for review October 11, 2025 14:03

AlexWaygood requested review from carljm, dcreager and sharkdp as code owners October 11, 2025 14:03

AlexWaygood force-pushed the alex/tuplespec-union branch 2 times, most recently from 2fe357a to a377c0a Compare October 11, 2025 14:46

AlexWaygood marked this pull request as draft October 13, 2025 10:20

AlexWaygood marked this pull request as ready for review October 13, 2025 10:20

sharkdp removed their request for review October 14, 2025 07:15

dcreager approved these changes Oct 14, 2025


AlexWaygood added 8 commits October 15, 2025 18:47

[ty] Heterogeneous unpacking support for unions

cc7faa0

tests

0ac4fd4

.

a932b3c

more comments and tests

c488648

fix lint

c068c10

may as well?

8497f97

remove untested code

14c7d82

improve

e2a21e2

address review

8dde577

AlexWaygood force-pushed the alex/tuplespec-union branch from 6f468ae to 8dde577 Compare October 15, 2025 18:12

AlexWaygood merged commit fd568f0 into main Oct 15, 2025
41 checks passed

AlexWaygood deleted the alex/tuplespec-union branch October 15, 2025 18:30

AlexWaygood mentioned this pull request Oct 15, 2025

Standardize syntax error construction #20903

Merged

