-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ty] implement TypedDict structural assignment
#21467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Diagnostic diff on typing conformance testsChanges were detected when running ty on typing conformance tests--- old-output.txt 2025-11-20 20:55:37.641020926 +0000
+++ new-output.txt 2025-11-20 20:55:41.165048397 +0000
@@ -954,11 +954,15 @@
typeddicts_extra_items.py:285:43: error[invalid-key] Unknown key "language" for TypedDict `ExtraMovie`: Unknown key "language"
typeddicts_extra_items.py:293:44: error[invalid-key] Unknown key "year" for TypedDict `ClosedMovie`: Unknown key "year"
typeddicts_extra_items.py:299:54: error[invalid-key] Unknown key "summary" for TypedDict `MovieExtraStr`: Unknown key "summary"
+typeddicts_extra_items.py:300:34: error[invalid-assignment] Object of type `MovieExtraStr` is not assignable to `Mapping[str, str]`
typeddicts_extra_items.py:302:54: error[invalid-key] Unknown key "year" for TypedDict `MovieExtraInt`: Unknown key "year"
+typeddicts_extra_items.py:303:34: error[invalid-assignment] Object of type `MovieExtraInt` is not assignable to `Mapping[str, int]`
+typeddicts_extra_items.py:304:44: error[invalid-assignment] Object of type `MovieExtraInt` is not assignable to `Mapping[str, int | str]`
typeddicts_extra_items.py:310:5: error[type-assertion-failure] Type `list[tuple[str, int | str]]` does not match asserted type `list[tuple[str, object]]`
typeddicts_extra_items.py:311:5: error[type-assertion-failure] Type `list[int | str]` does not match asserted type `list[object]`
-typeddicts_extra_items.py:327:5: error[unresolved-attribute] Object of type `IntDict` has no attribute `clear`
+typeddicts_extra_items.py:326:25: error[invalid-assignment] Object of type `IntDict` is not assignable to `dict[str, int]`
typeddicts_extra_items.py:329:52: error[invalid-key] Unknown key "bar" for TypedDict `IntDictWithNum` - did you mean "num"?
+typeddicts_extra_items.py:330:32: error[invalid-assignment] Object of type `IntDictWithNum` is not assignable to `dict[str, int]`
typeddicts_extra_items.py:337:1: error[unresolved-attribute] Object of type `IntDictWithNum` has no attribute `clear`
typeddicts_extra_items.py:339:1: error[type-assertion-failure] Type `tuple[str, int]` does not match asserted type `Unknown`
typeddicts_extra_items.py:339:13: error[unresolved-attribute] Object of type `IntDictWithNum` has no attribute `popitem`
@@ -980,12 +984,26 @@
typeddicts_readonly.py:51:4: error[invalid-assignment] Cannot assign to key "year" on TypedDict `Movie1`: key is marked read-only
typeddicts_readonly.py:60:4: error[invalid-assignment] Cannot assign to key "title" on TypedDict `Movie2`: key is marked read-only
typeddicts_readonly.py:61:4: error[invalid-assignment] Cannot assign to key "year" on TypedDict `Movie2`: key is marked read-only
+typeddicts_readonly_consistency.py:37:14: error[invalid-assignment] Object of type `A1` is not assignable to `B1`
+typeddicts_readonly_consistency.py:38:14: error[invalid-assignment] Object of type `C1` is not assignable to `B1`
+typeddicts_readonly_consistency.py:40:14: error[invalid-assignment] Object of type `A1` is not assignable to `C1`
+typeddicts_readonly_consistency.py:81:14: error[invalid-assignment] Object of type `A2` is not assignable to `B2`
+typeddicts_readonly_consistency.py:82:14: error[invalid-assignment] Object of type `C2` is not assignable to `B2`
+typeddicts_readonly_consistency.py:84:14: error[invalid-assignment] Object of type `A2` is not assignable to `C2`
+typeddicts_readonly_consistency.py:85:14: error[invalid-assignment] Object of type `B2` is not assignable to `C2`
typeddicts_readonly_inheritance.py:36:4: error[invalid-assignment] Cannot assign to key "name" on TypedDict `Album2`: key is marked read-only
typeddicts_readonly_inheritance.py:65:19: error[missing-typed-dict-key] Missing required key 'name' in TypedDict `RequiredName` constructor
typeddicts_readonly_inheritance.py:82:14: error[invalid-assignment] Invalid assignment to key "ident" with declared type `str` on TypedDict `User`: value of type `Literal[3]`
typeddicts_readonly_inheritance.py:83:15: error[invalid-argument-type] Invalid argument to key "ident" with declared type `str` on TypedDict `User`: value of type `Literal[3]`
typeddicts_readonly_inheritance.py:84:5: error[missing-typed-dict-key] Missing required key 'ident' in TypedDict `User` constructor
+typeddicts_type_consistency.py:21:10: error[invalid-assignment] Object of type `B1` is not assignable to `A1`
+typeddicts_type_consistency.py:38:10: error[invalid-assignment] Object of type `B2` is not assignable to `A2`
+typeddicts_type_consistency.py:65:6: error[invalid-assignment] Object of type `A3` is not assignable to `B3`
typeddicts_type_consistency.py:69:21: error[invalid-key] Unknown key "y" for TypedDict `A3`: Unknown key "y"
+typeddicts_type_consistency.py:76:22: error[invalid-assignment] Object of type `B3` is not assignable to `dict[str, int]`
+typeddicts_type_consistency.py:77:25: error[invalid-assignment] Object of type `B3` is not assignable to `dict[str, object]`
+typeddicts_type_consistency.py:78:22: error[invalid-assignment] Object of type `B3` is not assignable to `dict[Any, Any]`
+typeddicts_type_consistency.py:82:25: error[invalid-assignment] Object of type `B3` is not assignable to `Mapping[str, int]`
typeddicts_type_consistency.py:101:14: error[invalid-assignment] Object of type `Unknown | None` is not assignable to `str`
typeddicts_type_consistency.py:126:56: error[invalid-argument-type] Invalid argument to key "inner_key" with declared type `str` on TypedDict `Inner1`: value of type `Literal[1]`
typeddicts_usage.py:23:7: error[invalid-key] Unknown key "director" for TypedDict `Movie`: Unknown key "director"
@@ -993,5 +1011,5 @@
typeddicts_usage.py:28:17: error[missing-typed-dict-key] Missing required key 'name' in TypedDict `Movie` constructor
typeddicts_usage.py:28:18: error[invalid-key] Unknown key "title" for TypedDict `Movie`: Unknown key "title"
typeddicts_usage.py:40:24: error[invalid-type-form] The special form `typing.TypedDict` is not allowed in type expressions. Did you mean to use a concrete TypedDict or `collections.abc.Mapping[str, object]` instead?
-Found 995 diagnostics
+Found 1013 diagnostics
WARN A fatal error occurred while checking some files. Not all project files were analyzed. See the diagnostics list above for details.
|
|
|
||
| class Person(TypedDict): | ||
| name: str | ||
| phone_number: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test wants to look at the diagnostics we print for TypedDicts in a union, but previously Person was a subtype of Animal, so this PR made the union disappear. Adding phone_number breaks the subtyping relationship, so the union remains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if we end up going with "don't generally simplify Unions of TypedDicts", I can put this test back the way it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better for the test to pass whichever way we go on that, so this change seems good to me :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I expect if we land more comprehensive cycle-panic avoidance, we might want to go back to simplifying typed dicts in unions.
|
| if self_item_field.is_required() { | ||
| // A required field can't be assigned to a not-required, mutable field | ||
| // in the target, because `del` is allowed on the target field. | ||
| return ConstraintSet::from(false); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule was present in historical versions of the typing spec, but it's missing from the current version. It is mentioned and tested in the conformance suite. After this lands I'll open a PR to the typing docs upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
invalid-argument-type |
94 | 0 | 1 |
no-matching-overload |
23 | 0 | 0 |
unused-ignore-comment |
0 | 23 | 0 |
invalid-assignment |
7 | 6 | 1 |
invalid-return-type |
1 | 3 | 1 |
invalid-key |
0 | 4 | 0 |
type-assertion-failure |
0 | 2 | 0 |
unsupported-operator |
2 | 0 | 0 |
| Total | 127 | 38 | 3 |
|
It seems pretty common in the ecosystem report to want to pass a |
If there are cases like this showing up a lot in the ecosystem, can we just verify quick that we aren't being more strict than other type checkers? If not, then these are just true positives (at least as far as the specified type system is concerned) in the ecosystem that we are catching, great. If so, maybe other type checkers are implementing some kind of pragmatic compromise that we should at least consider. |
AlexWaygood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!! I haven't looked through the whole PR yet, but spotted one significant thing
It looks like there are a few new false positives on the typing conformance suite file I guess this is just because we don't support Other than the lines listed above, all other new diagnostics on the conformance suite look like true positives to me -- great job! |
|
Unfortunately it looks like there is a huge slowdown on the pydantic benchmark on this branch, which is causing it to timeout in CI currently. On my macbook locally, the benchmark completes in 2.161s on I suggest tackling some of the correctness issues I pointed out in #21467 (comment) and #21467 (comment) before looking at this too much. There's always a chance that fixing one/both of those will "miraculously" fix the regression 😆 |
Ah, this turned out to be exactly the issue that @AlexWaygood caught above: I was being too strict and making |
Yes I think all of the new errors-that-shouldn't-be-there are cases where |
b2d6f35 to
bc09e93
Compare
Alas, no. Digging into it. |
|
I don't have an answer yet, but an example of a specific file that's giving us trouble (which might be infecting a lot of other files, not sure) is |
|
Something you could try would be to add Salsa caching to The cost of doing this would be an increased memory usage, but that might be worth it to avoid a 30x execution time increase and benchmarks that no longer complete in CI... worth experimenting with, anyway. |
bc09e93 to
df183be
Compare
| KnownClass::Object.to_instance(db).when_assignable_to( | ||
| db, | ||
| target_item_field.declared_ty, | ||
| inferable, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly more efficient, and less verbose:
| KnownClass::Object.to_instance(db).when_assignable_to( | |
| db, | |
| target_item_field.declared_ty, | |
| inferable, | |
| ) | |
| Type::object().when_assignable_to( | |
| db, | |
| target_item_field.declared_ty, | |
| inferable, | |
| ) |
also, is really correct that we should use when_assignable_to even if relation == TypeRelation::Subtyping? Should it be this instead?
| KnownClass::Object.to_instance(db).when_assignable_to( | |
| db, | |
| target_item_field.declared_ty, | |
| inferable, | |
| ) | |
| Type::object().has_relation_to_impl( | |
| db, | |
| target_item_field.declared_ty, | |
| inferable, | |
| relation, | |
| relation_visitor, | |
| disjointness_visitor, | |
| ) |
(no tests fail if I make that change, so it looks like we might be missing some test coverage here either way ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird: committing the first suggestion resolves the whole comment, even though the second suggestion is pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's an example of something I can do in Python that demands the "subtyping" relation, as opposed to the "assignability" relation? (Besides ty_extensions.is_subtype_of I guess.) I get that they're going to be the same for fully-static types, so I know part of the answer is going to involve Any or similar, but I'm afraid I've gotten this far without actually knowing what subtyping per se is for :-D
(Edit: I went ahead and asked this in chat.)
|
The way I try to debug those issues is to add a |
I can't reproduce a panic on your branch for this snippet, but I can if I add EDIT: ah, and I was explicitly passing |
|
I tried out this change locally: Patchdiff --git a/crates/ty_python_semantic/src/types/class.rs b/crates/ty_python_semantic/src/types/class.rs
index 17b5fe7625..109a44e159 100644
--- a/crates/ty_python_semantic/src/types/class.rs
+++ b/crates/ty_python_semantic/src/types/class.rs
@@ -126,16 +126,6 @@ fn try_metaclass_cycle_initial<'db>(
})
}
-fn fields_cycle_initial<'db>(
- _db: &'db dyn Db,
- _id: salsa::Id,
- _self: ClassLiteral<'db>,
- _specialization: Option<Specialization<'db>>,
- _field_policy: CodeGeneratorKind<'db>,
-) -> FxIndexMap<Name, Field<'db>> {
- FxIndexMap::default()
-}
-
/// A category of classes with code generation capabilities (with synthesized methods).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, salsa::Update, get_size2::GetSize)]
pub(crate) enum CodeGeneratorKind<'db> {
@@ -2339,7 +2329,7 @@ impl<'db> ClassLiteral<'db> {
// Use the alias name if provided, otherwise use the field name
let parameter_name =
- Name::new(alias.map(|alias| &**alias).unwrap_or(&**field_name));
+ Name::new(alias.map(|alias| &**alias).unwrap_or(&*field_name));
let mut parameter = if is_kw_only {
Parameter::keyword_only(parameter_name)
@@ -2603,8 +2593,8 @@ impl<'db> ClassLiteral<'db> {
)))
}
(CodeGeneratorKind::TypedDict, "get") => {
- let overloads = self
- .fields(db, specialization, field_policy)
+ let fields = self.fields(db, specialization, field_policy);
+ let overloads = fields
.iter()
.flat_map(|(name, field)| {
let key_type =
@@ -2834,7 +2824,6 @@ impl<'db> ClassLiteral<'db> {
/// Returns a list of all annotated attributes defined in this class, or any of its superclasses.
///
/// See [`ClassLiteral::own_fields`] for more details.
- #[salsa::tracked(returns(ref), cycle_initial=fields_cycle_initial, heap_size=get_size2::GetSize::get_heap_size)]
pub(crate) fn fields(
self,
db: &'db dyn Db,
diff --git a/crates/ty_python_semantic/src/types/diagnostic.rs b/crates/ty_python_semantic/src/types/diagnostic.rs
index ed8b1be1ab..1542da24ac 100644
--- a/crates/ty_python_semantic/src/types/diagnostic.rs
+++ b/crates/ty_python_semantic/src/types/diagnostic.rs
@@ -35,7 +35,7 @@ use ruff_python_ast::parenthesize::parentheses_iterator;
use ruff_python_ast::{self as ast, AnyNodeRef, Identifier};
use ruff_python_trivia::CommentRanges;
use ruff_text_size::{Ranged, TextRange};
-use rustc_hash::FxHashSet;
+use rustc_hash::{FxHashMap, FxHashSet};
use std::fmt::Formatter;
/// Registers all known type check lints.
@@ -3190,7 +3190,7 @@ pub(crate) fn report_invalid_key_on_typed_dict<'db>(
typed_dict_ty: Type<'db>,
full_object_ty: Option<Type<'db>>,
key_ty: Type<'db>,
- items: &FxIndexMap<Name, Field<'db>>,
+ items: &FxHashMap<Name, Field<'db>>,
) {
let db = context.db();
if let Some(builder) = context.report_lint(&INVALID_KEY, key_node) {
diff --git a/crates/ty_python_semantic/src/types/infer/builder.rs b/crates/ty_python_semantic/src/types/infer/builder.rs
index 85c645d37a..90bbb33ea9 100644
--- a/crates/ty_python_semantic/src/types/infer/builder.rs
+++ b/crates/ty_python_semantic/src/types/infer/builder.rs
@@ -919,9 +919,9 @@ impl<'db, 'ast> TypeInferenceBuilder<'db, 'ast> {
CodeGeneratorKind::from_class(self.db(), class, None)
{
let specialization = None;
+ let fields = class.fields(self.db(), specialization, field_policy);
- let kw_only_sentinel_fields: Vec<_> = class
- .fields(self.db(), specialization, field_policy)
+ let kw_only_sentinel_fields: Vec<_> = fields
.iter()
.filter_map(|(name, field)| {
field.is_kw_only_sentinel(self.db()).then_some(name)
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index c093dbef25..8e94a4fb1f 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -4,6 +4,7 @@ use ruff_db::parsed::parsed_module;
use ruff_python_ast::Arguments;
use ruff_python_ast::{self as ast, AnyNodeRef, StmtClassDef, name::Name};
use ruff_text_size::Ranged;
+use rustc_hash::FxHashMap;
use super::class::{ClassType, CodeGeneratorKind, Field};
use super::context::InferContext;
@@ -12,10 +13,10 @@ use super::diagnostic::{
report_missing_typed_dict_key,
};
use super::{ApplyTypeMappingVisitor, Type, TypeMapping, visitor};
+use crate::Db;
use crate::types::constraints::ConstraintSet;
use crate::types::generics::InferableTypeVars;
use crate::types::{HasRelationToVisitor, IsDisjointVisitor, TypeContext, TypeRelation};
-use crate::{Db, FxIndexMap};
use ordermap::OrderSet;
@@ -56,9 +57,25 @@ impl<'db> TypedDictType<'db> {
self.defining_class
}
- pub(crate) fn items(self, db: &'db dyn Db) -> &'db FxIndexMap<Name, Field<'db>> {
- let (class_literal, specialization) = self.defining_class.class_literal(db);
- class_literal.fields(db, specialization, CodeGeneratorKind::TypedDict)
+ pub(crate) fn items(self, db: &'db dyn Db) -> &'db FxHashMap<Name, Field<'db>> {
+ #[salsa::tracked(returns(ref), cycle_initial=items_cycle_initial, heap_size=get_size2::GetSize::get_heap_size)]
+ fn items_inner<'db>(db: &'db dyn Db, class: ClassType<'db>) -> FxHashMap<Name, Field<'db>> {
+ let (class_literal, specialization) = class.class_literal(db);
+ class_literal
+ .fields(db, specialization, CodeGeneratorKind::TypedDict)
+ .into_iter()
+ .collect()
+ }
+
+ fn items_cycle_initial<'db>(
+ _db: &'db dyn Db,
+ _id: salsa::Id,
+ _class: ClassType<'db>,
+ ) -> FxHashMap<Name, Field<'db>> {
+ FxHashMap::default()
+ }
+
+ items_inner(db, self.defining_class)
}The idea of the change is to make it easier for the cycle to converge by returning an With that patch applied, we still panic on the repro from #21467 (comment) (providing you pass New query stacktraceNote that (with or without my patch above), when invoking Pydantic query stacktrace |
|
This patch gets us passing on both minimal repros so far (#21467 (comment) and #21467 (comment)), and seems probably worth doing on its own merits, since it'll make cycles rarer and avoid us having to engage in a full structual check for many Patchdiff --git a/crates/ty_python_semantic/src/types/class.rs b/crates/ty_python_semantic/src/types/class.rs
index 17b5fe7625..78beca70d1 100644
--- a/crates/ty_python_semantic/src/types/class.rs
+++ b/crates/ty_python_semantic/src/types/class.rs
@@ -566,7 +566,9 @@ impl<'db> ClassType<'db> {
},
// Protocol and Generic are not represented by a ClassType.
- ClassBase::Protocol | ClassBase::Generic => ConstraintSet::from(false),
+ ClassBase::Protocol | ClassBase::Generic | ClassBase::TypedDict => {
+ ConstraintSet::from(false)
+ }
ClassBase::Class(base) => match (base, other) {
(ClassType::NonGeneric(base), ClassType::NonGeneric(other)) => {
@@ -589,11 +591,6 @@ impl<'db> ClassType<'db> {
ConstraintSet::from(false)
}
},
-
- ClassBase::TypedDict => {
- // TODO: Implement subclassing and assignability for TypedDicts.
- ConstraintSet::from(true)
- }
}
})
}
diff --git a/crates/ty_python_semantic/src/types/infer/builder.rs b/crates/ty_python_semantic/src/types/infer/builder.rs
index 85c645d37a..449e770aff 100644
--- a/crates/ty_python_semantic/src/types/infer/builder.rs
+++ b/crates/ty_python_semantic/src/types/infer/builder.rs
@@ -8007,25 +8007,26 @@ impl<'db, 'ast> TypeInferenceBuilder<'db, 'ast> {
// the `try_call` path below.
// TODO: it should be possible to move these special cases into the `try_call_constructor`
// path instead, or even remove some entirely once we support overloads fully.
- let has_special_cased_constructor = matches!(
- class.known(self.db()),
- Some(
- KnownClass::Bool
- | KnownClass::Str
- | KnownClass::Type
- | KnownClass::Object
- | KnownClass::Property
- | KnownClass::Super
- | KnownClass::TypeAliasType
- | KnownClass::Deprecated
- )
- ) || (
- // Constructor calls to `tuple` and subclasses of `tuple` are handled in `Type::Bindings`,
- // but constructor calls to `tuple[int]`, `tuple[int, ...]`, `tuple[int, *tuple[str, ...]]` (etc.)
- // are handled by the default constructor-call logic (we synthesize a `__new__` method for them
- // in `ClassType::own_class_member()`).
- class.is_known(self.db(), KnownClass::Tuple) && !class.is_generic()
- );
+ let has_special_cased_constructor =
+ matches!(
+ class.known(self.db()),
+ Some(
+ KnownClass::Bool
+ | KnownClass::Str
+ | KnownClass::Type
+ | KnownClass::Object
+ | KnownClass::Property
+ | KnownClass::Super
+ | KnownClass::TypeAliasType
+ | KnownClass::Deprecated
+ )
+ ) || (
+ // Constructor calls to `tuple` and subclasses of `tuple` are handled in `Type::Bindings`,
+ // but constructor calls to `tuple[int]`, `tuple[int, ...]`, `tuple[int, *tuple[str, ...]]` (etc.)
+ // are handled by the default constructor-call logic (we synthesize a `__new__` method for them
+ // in `ClassType::own_class_member()`).
+ class.is_known(self.db(), KnownClass::Tuple) && !class.is_generic()
+ ) || CodeGeneratorKind::TypedDict.matches(self.db(), class.class_literal(self.db()).0, class.class_literal(self.db()).1);
// temporary special-casing for all subclasses of `enum.Enum`
// until we support the functional syntax for creating enum classes
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index c093dbef25..bafe798425 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -90,6 +90,16 @@ impl<'db> TypedDictType<'db> {
relation_visitor: &HasRelationToVisitor<'db>,
disjointness_visitor: &IsDisjointVisitor<'db>,
) -> ConstraintSet<'db> {
+ // First do a quick nominal check that (if it succeeds) means that we can avoid
+ // materializing the full `TypeDict` schema for either `self` or `target`.
+ // This should be cheaper in many cases, and also helps us avoid some cycles.
+ if self
+ .defining_class
+ .is_subclass_of(db, target.defining_class)
+ {
+ return ConstraintSet::from(true);
+ }
+
let self_items = self.items(db);
let target_items = target.items(db);
// Many rules violations short-circuit with "never", but asking whether one field isWe still panic when checking pydantic, even with that patch applied, however, so @oconnor663 will have to find a new minimal repro if he applies the patch ;) |
Yes I should've said I'm just running all of these snippets out of |
|
Here's a new minimized repro (running as a standalone script out of from typing import TypedDict
class Foo1(TypedDict):
x1: MyUnion
class Foo2(TypedDict):
x2: MyUnion
class Foo3(Foo2):
pass
class Foo4(Foo2):
x3: MyUnion
class Foo5(TypedDict):
x5: MyUnion
MyUnion = Foo1 | Foo3 | Foo4 | Foo5It feels "pretty much the same" as the last one. What I'm noticing as I minimize these, is that there are a lot of "paths" I can take on the way down. I'll get a different result if I start ripping things out from the top vs from the bottom vs in the middle. Eventually I rip out something that removes the panic, so I put that thing back and roll the dice again on where to delete more stuff. So the end result is 1) one of many local minima I could wind up at and 2) maximally sensitive to the order of elements within the union. Which means that minor tweaks in how things are evaluated are likely to "fix" one of these minimal examples by coincidence without really fixing the underlying issue. I'm going to try to follow @carljm and @MichaReiser's advice above and play around with logging this some more, to see if I can get some more intuition about what's going on. |
|
Making some progress with printouts. (Finally figured out that a "cycle recovery function" is Notable that the union keeps collapsing down to a single TypedDict type. More staring required... |
|
One thing that we could consider here that might help is to just never consider two named |
|
I'll try that. Separately, I've been refining the printout above, and now I see that sometimes my |
|
Also Alex mentioned at one point that he hasn't been able to get |
fd7ac18 to
17e3bf0
Compare
|
I think I'm seeing CI runs for later commits getting cancelled in favor of earlier commits, which doesn't make sense to me. Known problem? My fault for force-pushing something? In any case, I've manually restarted https://github.com/astral-sh/ruff/actions/runs/19516380284/job/55870895275, and I think that's the one to watch. |
|
Ok, everything is passing, except that CodSpeed reports a ~10x slowdown on the Pydantic benchmark. I'm going to try to isolate what parts are costing us the most time, but at the same time I need feedback on whether this is "obviously a performance bug" or "maybe reasonable given how convoluted their unions are"? |
|
The magnitude of the regression is surprising to me. I definitely think we should explore how we can be more efficient here, but given that no other project regresses more than 1%, it's clearly a factor of the size and ubiquity of the typed dicts in pydantic; I wouldn't necessarily be opposed to going ahead and landing this and doing more optimization as a follow-up. |
carljm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work!
AlexWaygood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!!
Since Carl already reviewed, I haven't done an exhaustive review of the semantics here vis-a-vis the spec -- just a few notes from me.
I think we should definitely look at improving perf here as a (post-beta?) followup. Aside from anything else, it's frustrating that the benchmarks CI job now takes 5 minutes longer than it used to 😆 But I do agree that we should land this now and move on; there's too much else to do before the beta.
I don't think you have any tests currently for generic typeddicts. It looks like everything works correctly on your branch, but it would be great to test it explicitly; something like this?
from typing import TypedDict
from ty_extensions import static_assert, is_assignable_to, is_subtype_of
class F[T](TypedDict):
x: T
class G[T](TypedDict):
x: T
static_assert(is_assignable_to(F, G))
def x[T](a: T) -> T:
static_assert(is_subtype_of(F[T], G[T]))
return a
static_assert(is_subtype_of(F[int], G[int]))
static_assert(not is_assignable_to(F[int], G[str]))Some tests that use the legacy syntax (multiple-inheriting from TypedDict and Generic[T]) would be great as well.
I ran the property tests on this branch and didn't see any issues. Though I don't think we include any TypedDict types in the property tests right now, so that's not a massive surprise 😆. We should probably open a followup issue for that -- again, that should probably be done post-beta.
| relation_visitor: &HasRelationToVisitor<'db>, | ||
| disjointness_visitor: &IsDisjointVisitor<'db>, | ||
| ) -> ConstraintSet<'db> { | ||
| // First do a quick nominal check that (if it succeeds) means that we can avoid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just talked with Alex about the 10x performance regression on pydantic here, caused by the large union of TypedDicts (many of which are probably not subtypes of each other). When we build that union, we need to perform O(n²) subtyping checks, so it's understandable that we see a huge regression now that we actually do nontrivial work in those checks. Other than the nominal check here (which is probably not super fast, actually), are there any other short circuit paths where we could return false early? The most common case might be something like the following where two "normal" TypedDicts (no extra_items or similar) are simply incompatible because their key names are not compatible:
class A(TypedDict):
key_a: int
class B(TypedDict):
key_b: intIs there something we can do by just looking at the names, without ever looking at the value types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried out something like this, and it passes all tests, but doesn't seem to lead to any speedup sadly:
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index bafe798425..6f638c9817 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -102,6 +102,21 @@ impl<'db> TypedDictType<'db> {
let self_items = self.items(db);
let target_items = target.items(db);
+
+ // For `self` to be a subtype of `target`,
+ // all `target` fields must be present in `self` unless
+ // the target field specifically has the form `NotRequired[ReadOnly[object]]`.
+ // We therefore do a quick check for missing keys here before doing the full
+ // (expensive) relation checks below.
+ for (name, field) in target_items {
+ if field.is_read_only() && !field.is_required() && field.declared_ty.is_object() {
+ continue;
+ }
+ if !self_items.contains_key(name) {
+ return ConstraintSet::from(false);
+ }
+ }
+
// Many rules violations short-circuit with "never", but asking whether one field is
// [relation] to/of another can produce more complicated constraints, and we collect those.
let mut constraints = ConstraintSet::from(true);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you verify that this early return actually triggers on the pydantic code? If so, is it possible that the nominal subtyping check above is expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you verify that this early return actually triggers on the pydantic code?
No, I didn't
If so, is it possible that the nominal subtyping check above is expensive?
That's certainly possible... even if it is expensive, I think it may still be worth doing to reduce the chance of pathological cycles for recursive typeddicts. But that's just speculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried out this patch, which also did not speed things up materially:
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index bafe798425..c086eb4c11 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -90,18 +90,23 @@ impl<'db> TypedDictType<'db> {
relation_visitor: &HasRelationToVisitor<'db>,
disjointness_visitor: &IsDisjointVisitor<'db>,
) -> ConstraintSet<'db> {
- // First do a quick nominal check that (if it succeeds) means that we can avoid
- // materializing the full `TypeDict` schema for either `self` or `target`.
- // This should be cheaper in many cases, and also helps us avoid some cycles.
- if self
- .defining_class
- .is_subclass_of(db, target.defining_class)
- {
- return ConstraintSet::from(true);
- }
-
let self_items = self.items(db);
let target_items = target.items(db);
+
+ // For `self` to be a subtype of `target`,
+ // all `target` fields must be present in `self` unless
+ // the target field specifically has the form `NotRequired[ReadOnly[object]]`.
+ // We therefore do a quick check for missing keys here before doing the full
+ // (expensive) relation checks below.
+ for (name, field) in target_items {
+ if field.is_read_only() && !field.is_required() && field.declared_ty.is_object() {
+ continue;
+ }
+ if !self_items.contains_key(name) {
+ return ConstraintSet::from(false);
+ }
+ }
+
// Many rules violations short-circuit with "never", but asking whether one field is
// [relation] to/of another can produce more complicated constraints, and we collect those.
let mut constraints = ConstraintSet::from(true);and adding Salsa caching to Type::is_assignable_to also didn't help. The benchmark stays stubbornly in the 19s-20s range for me locally whatever I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the expense in the profile above looks to be inside is_assignable_to (not is_subtype_of or is_redundant_with), it may be worth trying to Salsa cache is_assignable_to again?
I know we tried that on an earlier version of this PR and it didn't fix the even-more-massive blowup then -- but that was solved by caching fields. It seems like the current smaller blowup might be helped by caching is_assignable_to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the expense in the profile above looks to be inside
is_assignable_to(notis_subtype_oforis_redundant_with), it may be worth trying to Salsa cacheis_assignable_toagain?
As I said above in #21467 (comment), I tried that earlier today and it didn't yield any speedup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I missed that brief note after the big code diff...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe caching is_subclass_of directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm measuring ~no performance impact from this diff on Pydantic :(
diff --git a/crates/ty_python_semantic/src/types/class.rs b/crates/ty_python_semantic/src/types/class.rs
index fc3b9d4227..93827ae16b 100644
--- a/crates/ty_python_semantic/src/types/class.rs
+++ b/crates/ty_python_semantic/src/types/class.rs
@@ -524,6 +524,7 @@ impl<'db> ClassType<'db> {
}
/// Return `true` if `other` is present in this class's MRO.
+ #[salsa::tracked]
pub(super) fn is_subclass_of(self, db: &'db dyn Db, other: ClassType<'db>) -> bool {
self.when_subclass_of(db, other, InferableTypeVars::None)
.is_always_satisfied(db)
@@ -1758,6 +1759,7 @@ impl<'db> ClassLiteral<'db> {
}
/// Return `true` if `other` is present in this class's MRO.
+ #[salsa::tracked]
pub(super) fn is_subclass_of(
self,
db: &'db dyn Db,I think I've addressed all the non-perf comments, and CI is green except for CodSpeed (which doesn't list any other regressions besides Pydantic), so if there are no objections I'll go ahead and land this as-is and defer the rest of the perf work until after Beta.
dd9e1ad to
19933ce
Compare
| class NotRequiredReadOnlyAny(TypedDict): | ||
| x: NotRequired[ReadOnly[Any]] | ||
|
|
||
| # fmt: off |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, I sorta prefer Black's formatting 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Black's formatting looks nicer if you're scrolling past the code, but I think this version is better if you're actually trying to read it. (Or more to the point for me, if you're trying to edit it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Editing -- yes. Reading -- I disagree. But I also don't feel strongly; don't feel you need to change this just to keep me happy :-)
Could you possibly add some tests for generic typeddicts (I gave some suggestions in #21467 (review))? But otherwise, yes, I think let's get this in 👍 |
|
Woops, thanks for catching. Added structural assignability to the existing mdtest sections for both class-style and legacy generic typeddicts. |
…d-typevar * origin/main: (24 commits) [ty] Remove brittle constraint set reveal tests (#21568) [`ruff`] Catch more dummy variable uses (`RUF052`) (#19799) [ty] Use the same snapshot handling as other tests (#21564) [ty] suppress autocomplete suggestions during variable binding (#21549) Set severity for non-rule diagnostics (#21559) [ty] Add `with_type` convenience to display code (#21563) [ty] Implement docstring rendering to markdown (#21550) [ty] Reduce indentation of `TypeInferenceBuilder::infer_attribute_load` (#21560) Bump 0.14.6 (#21558) [ty] Improve debug messages when imports fail (#21555) [ty] Add support for relative import completions [ty] Refactor detection of import statements for completions [ty] Use dedicated collector for completions [ty] Attach subdiagnostics to `unresolved-import` errors for relative imports as well as absolute imports (#21554) [ty] support PEP 613 type aliases (#21394) [ty] More low-hanging fruit for inlay hint goto-definition (#21548) [ty] implement `TypedDict` structural assignment (#21467) [ty] Add more random TypeDetails and tests (#21546) [ty] Add goto for `Unknown` when it appears in an inlay hint (#21545) [ty] Add type definitions for `Type::SpecialForm`s (#21544) ...
Closes astral-sh/ty#1387.