
@AlexWaygood
Member

@AlexWaygood AlexWaygood commented Sep 14, 2025

Summary

This PR adds precise heterogeneous unpacking support for unions.

For a tuple such as tuple[int, str], we've long recognised that if you unpack this tuple, the first element will be an int and the second will be a str. But the same has not been true for tuple[Literal[42], str] | tuple[Literal[56], str] -- if a user unpacked this union of tuples, we would infer that both first and second elements were of type Literal[42, 56] | str. This PR fixes that: we now infer that the first element will be of type Literal[42, 56] and the second element will be of type str.
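As a concrete illustration of where this matters, here is a minimal sketch of splatting such a union into a call. The function names `describe` and `describe_pair` are hypothetical, and the types in the comments are the ones stated in the summary above, not output captured from ty.

```python
from __future__ import annotations

from typing import Literal


def describe(number: int, text: str) -> str:
    """Hypothetical callee with one `int` and one `str` parameter."""
    return f"{number}: {text}"


def describe_pair(pair: tuple[Literal[42], str] | tuple[Literal[56], str]) -> str:
    # Per the summary, the old inference treated every element of `pair` as
    # `Literal[42, 56] | str`, so neither parameter could be matched precisely.
    # With per-position inference, the first element is `Literal[42, 56]`
    # and the second is `str`, which lines up with `number` and `text`.
    return describe(*pair)


print(describe_pair((42, "spam")))  # 42: spam
```

At runtime both members of the union behave identically here; the difference is only in what a type checker can conclude about each position.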

This doesn't add much complexity to our iteration logic, fixes a number of false positives in the ecosystem, and (surprisingly!) leads to a nice performance boost on the colour-science benchmark.

Test Plan

Mdtests added

@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Sep 14, 2025
@github-actions
Contributor

github-actions bot commented Sep 14, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@AlexWaygood AlexWaygood reopened this Sep 14, 2025
@github-actions
Contributor

github-actions bot commented Sep 14, 2025

mypy_primer results

Changes were detected when running on open source projects
scrapy (https://github.com/scrapy/scrapy)
- scrapy/http/headers.py:39:21: error[not-iterable] Object of type `AnyStr@update` may not be iterable
- scrapy/utils/datatypes.py:91:66: error[not-iterable] Object of type `AnyStr@update` may not be iterable
- Found 1067 diagnostics
+ Found 1065 diagnostics

vision (https://github.com/pytorch/vision)
- references/classification/utils.py:420:5: error[invalid-assignment] Object of type `tuple[type | Unknown, ...]` is not assignable to `list[type] | None`
+ references/classification/utils.py:420:5: error[invalid-assignment] Object of type `tuple[type, ...] | tuple[Unknown, ...]` is not assignable to `list[type] | None`

xarray (https://github.com/pydata/xarray)
- xarray/tests/test_groupby.py:3059:32: error[parameter-already-assigned] Multiple values provided for parameter `freq` of function `date_range`
- Found 1617 diagnostics
+ Found 1616 diagnostics

koda-validate (https://github.com/keithasaurus/koda-validate)
- koda_validate/generic.py:236:21: error[not-iterable] Object of type `ListOrTupleOrSetAny@UniqueItems` may not be iterable
- Found 69 diagnostics
+ Found 68 diagnostics

scikit-learn (https://github.com/scikit-learn/scikit-learn)
- sklearn/utils/tests/test_multiclass.py:410:27: warning[possibly-missing-attribute] Attribute `toarray` on type `(Unknown & SparseABC) | (list[Unknown | list[Unknown | int]] & SparseABC) | (list[Unknown | list[Unknown | str]] & SparseABC) | ... omitted 11 union elements` may be missing
+ sklearn/utils/tests/test_multiclass.py:410:27: warning[possibly-missing-attribute] Attribute `toarray` on type `(Unknown & SparseABC) | (list[Unknown | list[Unknown | int]] & SparseABC) | (_NotAnArray & SparseABC) | ... omitted 11 union elements` may be missing

pandas (https://github.com/pandas-dev/pandas)
- pandas/tests/util/test_assert_extension_array_equal.py:108:41: error[invalid-argument-type] Argument to function `assert_extension_array_equal` is incorrect: Expected `bool | Literal["equiv"]`, found `@Todo | SparseArray`
- pandas/tests/util/test_assert_extension_array_equal.py:108:41: error[invalid-argument-type] Argument to function `assert_extension_array_equal` is incorrect: Expected `str`, found `@Todo | SparseArray`
- Found 3389 diagnostics
+ Found 3387 diagnostics

core (https://github.com/home-assistant/core)
- homeassistant/components/lovelace/websocket.py:58:46: error[invalid-argument-type] Argument to bound method `send_error` is incorrect: Expected `dict[str, Any] | None`, found `str`
- Found 13756 diagnostics
+ Found 13755 diagnostics
No memory usage changes detected ✅

@github-actions
Contributor

ecosystem-analyzer results

| Lint rule | Added | Removed | Changed |
| --- | --- | --- | --- |
| invalid-argument-type | 0 | 6 | 0 |
| parameter-already-assigned | 0 | 1 | 0 |
| Total | 0 | 7 | 0 |

Full report with detailed diff (timing results)

@codspeed-hq

codspeed-hq bot commented Oct 11, 2025

CodSpeed Performance Report

Merging #20377 will improve performance by 6.26%

Comparing alex/tuplespec-union (8dde577) with main (4b7f184)

Summary

⚡ 1 improvement
✅ 20 untouched
⏩ 30 skipped (see footnote 1)

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- | --- |
| WallTime | medium[colour-science] | 11.2 s | 10.5 s | +6.26% |

Footnotes

  1. 30 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Comment on lines +5280 to +5534
fn non_async_special_case<'db>(
    db: &'db dyn Db,
    ty: Type<'db>,
) -> Option<Cow<'db, TupleSpec<'db>>> {
Member Author


Other than the Type::Union branch, this is exactly the same code that used to exist in the match statement slightly lower down. It has just been extracted into a standalone function (so that it can be called recursively) and moved higher up to satisfy Clippy.

@AlexWaygood AlexWaygood marked this pull request as ready for review October 11, 2025 14:03
@AlexWaygood AlexWaygood force-pushed the alex/tuplespec-union branch 2 times, most recently from 2fe357a to a377c0a Compare October 11, 2025 14:46
@AlexWaygood AlexWaygood marked this pull request as draft October 13, 2025 10:20
@AlexWaygood AlexWaygood marked this pull request as ready for review October 13, 2025 10:20
@dcreager
Member

We've supported unpacking of unions like this in assignments, but it looks like this implementation might be more general. Can we replace the older union-related code in types/unpacker.rs with calls to this new logic?

@sharkdp sharkdp removed their request for review October 14, 2025 07:15
@AlexWaygood
Member Author

> We've supported unpacking of unions like this in assignments, but it looks like this implementation might be more general. Can we replace the older union-related code in types/unpacker.rs with calls to this new logic?

That's a great question. I've just spent a while looking at this (probably longer than I should have!), and my answer is... I don't think so, unfortunately. Obtaining a different tuple spec for each union member, like unpacker.rs is doing, is still preferable to calling try_iterate() directly on a union type -- it's still less lossy, yields better fallback results in the case of errors, and gives better error messages.

I experimented with having Type::try_iterate() return Result<IterationOutcome<'db>, IterationError<'db>>, where IterationOutcome is an abstraction that looks like this:

struct IterationOutcome<'db>(smallvec::SmallVec<[Cow<'db, TupleSpec<'db>>; 1]>);

but even that is too lossy a representation for unpacker.rs. unpacker.rs really needs to know whether each individual union element is iterable, and what kind of diagnostic it emits if it's not. So for unpacker.rs to be able to use try_iterate without manually mapping over the union elements, try_iterate would need to return IterationOutcome<'db>, where IterationOutcome looks like

struct IterationOutcome<'db>(
    smallvec::SmallVec<[Result<Cow<'db, TupleSpec<'db>>, IterationError<'db>>; 1]>
);

At that point the abstraction becomes so complicated that I think it's probably not worth it anymore.
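The trade-off between the two result shapes can be sketched with a toy model. Everything here is hypothetical (the `Ok`/`Err` classes, the lookup table, and the error wording are illustrative stand-ins, not ty's types or diagnostics): a single result for the whole union collapses to one error as soon as any member fails, whereas a per-member list keeps both the successful specs and the individual errors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ok:
    spec: tuple[str, ...]  # toy stand-in for a tuple spec


@dataclass(frozen=True)
class Err:
    message: str  # toy stand-in for an iteration error


# Hypothetical table of union members that are iterable, with their element types.
ITERABLE_MEMBERS = {
    "tuple[int, str]": ("int", "str"),
    "tuple[bytes, str]": ("bytes", "str"),
}


def iterate_member(member: str) -> Ok | Err:
    spec = ITERABLE_MEMBERS.get(member)
    if spec is None:
        return Err(f"`{member}` is not iterable")
    return Ok(spec)


def try_iterate_flat(members: list[str]) -> list[tuple[str, ...]] | Err:
    """One result for the whole union: the first failure hides everything else."""
    specs = []
    for member in members:
        outcome = iterate_member(member)
        if isinstance(outcome, Err):
            return outcome
        specs.append(outcome.spec)
    return specs


def try_iterate_per_member(members: list[str]) -> list[Ok | Err]:
    """One result per union member: nothing is lost, but callers must handle both cases."""
    return [iterate_member(member) for member in members]


members = ["tuple[int, str]", "int"]
print(try_iterate_flat(members))
print(try_iterate_per_member(members))
```

The second shape is the one described above as necessary for unpacker.rs, and the sketch shows why it pushes more case analysis onto every caller.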

So the question is: what does this PR actually get us?

Firstly, it gets us much better call-binding for calls with variadic parameters: unlike in unpacker.rs, we do not map over the union elements manually in call/bind.rs. Maybe we should map over the union elements there, like in unpacker.rs, but it looks awkward to pull off:

/// Match a variadic argument to the remaining positional, standard or variadic parameters.
fn match_variadic(
    &mut self,
    db: &'db dyn Db,
    argument_index: usize,
    argument: Argument<'a>,
    argument_type: Option<Type<'db>>,
) -> Result<(), ()> {
    let tuple = argument_type.map(|ty| ty.iterate(db));

Secondly, it gets us much more precise inference even if all you want is the homogeneous element type. For example:

from typing import Literal

def f(x: Literal["abc", "def"]):
    for item in x:
        # main: LiteralString
        # This PR: Literal["a", "b", "c", "d", "e", "f"]
        reveal_type(item)
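
The more precise result can be sanity-checked at runtime: iterating either literal string yields only its own characters, so the union of the per-member element types is exactly those six single-character strings. A small sketch (the variable names are illustrative):

```python
# The two members of Literal["abc", "def"].
literal_members = ("abc", "def")

# Collect every value that iterating over any member can produce.
yielded = sorted({char for member in literal_members for char in member})

print(yielded)  # ['a', 'b', 'c', 'd', 'e', 'f']
```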

Member

@dcreager dcreager left a comment


> That's a great question. I've just spent a while looking at this (probably longer than I should have!), and my answer is... I don't think so, unfortunately.

Thanks for looking at that! No argument with your findings. Can you summarize them in the code? Possibly in unpacker.rs, to explain why you can't use the union-handling logic that we use for argument splatting.

Comment on lines +1103 to +1116
let return_type = if let Type::Union(union) = argument {
    union.map(db, |element| {
        Type::tuple(TupleType::new(db, &element.iterate(db)))
    })
} else {
    Type::tuple(TupleType::new(db, &argument.iterate(db)))
};
overload.set_return_type(return_type);
Member


Not blocking for this PR, but this pattern seems to come up enough to deserve a helper method on Type — a map_over_union that applies the function to each union element if it's a union, or to the individual type if not.
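
The shape of the suggested helper can be sketched with a toy type model. This is only an illustration of the idea: `Named`, `UnionOf`, and `wrap_in_tuple` are hypothetical stand-ins, the sketch does not simplify the resulting union the way a real type checker presumably would, and `wrap_in_tuple` only handles non-union elements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Named:
    """Toy stand-in for a non-union type."""
    name: str


@dataclass(frozen=True)
class UnionOf:
    """Toy stand-in for a union type."""
    elements: tuple[Named, ...]


Ty = Union[Named, UnionOf]


def map_over_union(ty: Ty, transform: Callable[[Named], Named]) -> Ty:
    """Apply `transform` to each union element, or to `ty` itself if it is not a union."""
    if isinstance(ty, UnionOf):
        return UnionOf(tuple(transform(element) for element in ty.elements))
    return transform(ty)


def wrap_in_tuple(ty: Named) -> Named:
    # Hypothetical transformation, loosely mirroring the snippet under review.
    return Named(f"tuple[{ty.name}, ...]")


print(map_over_union(Named("int"), wrap_in_tuple))
print(map_over_union(UnionOf((Named("int"), Named("str"))), wrap_in_tuple))
```

With such a helper, the `if let Type::Union(..) ... else ...` branching in the snippet above would collapse into a single call.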

}
}

fn all_elements(&self) -> impl Iterator<Item = &Type<'db>> {
Member


nit: Can you throw a copied onto each iterator so that this can return Iterator<Item = Type>? I find it good karma to not make the caller have to worry about that

Member Author


Hmm, I feel like here it's more ergonomic to have it return Iterator<Item = &Type<'db>>, because then it matches the signature of TupleSpec::all_elements (and, indeed, the two other all_elements() methods in this module!):

pub(crate) fn all_elements(&self) -> impl Iterator<Item = &T> {
    self.0.iter()
}

It seems like strictly more code for not much gain here if I make it return Iterator<Item = Type<'db>> rather than Iterator<Item = &Type<'db>>:

diff --git a/crates/ty_python_semantic/src/types/tuple.rs b/crates/ty_python_semantic/src/types/tuple.rs
index dcb3df675d..cc39e696f8 100644
--- a/crates/ty_python_semantic/src/types/tuple.rs
+++ b/crates/ty_python_semantic/src/types/tuple.rs
@@ -1583,7 +1583,7 @@ impl<'db> TupleSpecBuilder<'db> {
         }
     }
 
-    fn all_elements(&self) -> impl Iterator<Item = &Type<'db>> {
+    fn all_elements(&self) -> impl Iterator<Item = Type<'db>> {
         match self {
             TupleSpecBuilder::Fixed(elements) => Either::Left(elements.iter()),
             TupleSpecBuilder::Variable {
@@ -1592,6 +1592,7 @@ impl<'db> TupleSpecBuilder<'db> {
                 suffix,
             } => Either::Right(prefix.iter().chain(std::iter::once(variable)).chain(suffix)),
         }
+        .copied()
     }
 
     /// Return a new tuple-spec builder that reflects the union of this tuple and another tuple.
@@ -1625,8 +1626,10 @@ impl<'db> TupleSpecBuilder<'db> {
             // would actually lead to more precise inference, so it's probably not worth the
             // complexity.
             _ => {
-                let unioned =
-                    UnionType::from_elements(db, self.all_elements().chain(other.all_elements()));
+                let unioned = UnionType::from_elements(
+                    db,
+                    self.all_elements().chain(other.all_elements().copied()),
+                );
                 TupleSpecBuilder::Variable {
                     prefix: vec![],
                     variable: unioned,
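
For readers following the diff, the union operation it touches can be approximated with a toy model. This is a sketch under stated assumptions, not the PR's implementation: the fallback branch (union every element into one variable-length element) mirrors the `_ =>` arm visible above, while the position-by-position branch for equal-length fixed tuples is inferred from the PR summary. `union_fixed` and the set-of-names representation are hypothetical.

```python
from __future__ import annotations

# A fixed-length tuple spec is modelled as one set of type names per position.
Fixed = tuple[frozenset[str], ...]


def union_fixed(left: Fixed, right: Fixed) -> tuple[str, object]:
    if len(left) == len(right):
        # Assumed precise case: union the element types position by position.
        return ("fixed", tuple(a | b for a, b in zip(left, right)))
    # Fallback, as in the `_ =>` arm of the diff: a homogeneous, variable-length
    # tuple whose element type is the union of every element on both sides.
    return ("variable", frozenset().union(*left, *right))


pair_42 = (frozenset({"Literal[42]"}), frozenset({"str"}))
pair_56 = (frozenset({"Literal[56]"}), frozenset({"str"}))
single = (frozenset({"bytes"}),)

print(union_fixed(pair_42, pair_56))
print(union_fixed(pair_42, single))
```

The first call keeps the two positions separate, matching the `Literal[42, 56]` and `str` result from the summary; the second loses positional information because the lengths differ.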

@AlexWaygood AlexWaygood merged commit fd568f0 into main Oct 15, 2025
41 checks passed
@AlexWaygood AlexWaygood deleted the alex/tuplespec-union branch October 15, 2025 18:30
dcreager added a commit that referenced this pull request Oct 16, 2025
…rable

* origin/main:
  Don't use codspeed or depot runners in CI jobs on forks (#20894)
  [ty] cache Type::is_redundant_with (#20477)
  Fix run-away for mutually referential instance attributes (#20645)
  [ty] Limit shown import paths to at most 5 unless ty runs with `-v` (#20912)
  [ty] Use field-specifier return type as the default type for the field (#20915)
  [ty] Do not assume that `field`s have a default value (#20914)
  [ty] Fix match pattern value narrowing to use equality semantics (#20882)
  Update setup instructions for Zed 0.208.0+ (#20902)
  Move TOML indent size config (#20905)
  [syntax-errors]: implement F702 as semantic syntax error (#20869)
  [ty] Heterogeneous unpacking support for unions (#20377)
  [ty] refactor `Place` (#20871)
  Auto-accept snapshot changes as part of typeshed-sync PRs (#20892)
  [`airflow`] Add warning to `airflow.datasets.DatasetEvent` usage (`AIR301`) (#20551)
  [`flake8-pyi`] Fix operator precedence by adding parentheses when needed (`PYI061`) (#20508)
  [`pyupgrade`] Fix false negative for `TypeVar` with default argument in `non-pep695-generic-class` (`UP046`) (#20660)
  Update parser snapshots (#20893)
  Fix syntax error false positives for escapes and quotes in f-strings (#20867)
dcreager added a commit that referenced this pull request Oct 16, 2025
…nt-sets

* dcreager/non-non-inferable:
  Don't use codspeed or depot runners in CI jobs on forks (#20894)
  [ty] cache Type::is_redundant_with (#20477)
  Fix run-away for mutually referential instance attributes (#20645)
  [ty] Limit shown import paths to at most 5 unless ty runs with `-v` (#20912)
  [ty] Use field-specifier return type as the default type for the field (#20915)
  [ty] Do not assume that `field`s have a default value (#20914)
  [ty] Fix match pattern value narrowing to use equality semantics (#20882)
  Update setup instructions for Zed 0.208.0+ (#20902)
  Move TOML indent size config (#20905)
  [syntax-errors]: implement F702 as semantic syntax error (#20869)
  [ty] Heterogeneous unpacking support for unions (#20377)
  [ty] refactor `Place` (#20871)
  Auto-accept snapshot changes as part of typeshed-sync PRs (#20892)
  [`airflow`] Add warning to `airflow.datasets.DatasetEvent` usage (`AIR301`) (#20551)
  [`flake8-pyi`] Fix operator precedence by adding parentheses when needed (`PYI061`) (#20508)
  [`pyupgrade`] Fix false negative for `TypeVar` with default argument in `non-pep695-generic-class` (`UP046`) (#20660)
  Update parser snapshots (#20893)
  Fix syntax error false positives for escapes and quotes in f-strings (#20867)

Labels

ecosystem-analyzer ty Multi-file analysis & type inference


3 participants