Skip to content

Conversation

@AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Nov 18, 2025

Summary

Locally, applying this patch to @oconnor663's branch for #21467 speeds us up >120x on the pydantic benchmark (we go from taking a whole minute to complete the benchmark to taking <0.5s).

But I think this might provide a performance boost on main, which is why I'm filing it as a standalone PR.

Test Plan

All existing tests pass

@AlexWaygood AlexWaygood added performance Potential performance improvement ty Multi-file analysis & type inference labels Nov 18, 2025
@astral-sh-bot
Copy link

astral-sh-bot bot commented Nov 18, 2025

Diagnostic diff on typing conformance tests

Changes were detected when running ty on typing conformance tests
--- old-output.txt	2025-11-18 17:45:25.535995063 +0000
+++ new-output.txt	2025-11-18 17:45:28.972019207 +0000
@@ -1,4 +1,4 @@
-fatal[panic] Panicked at /home/runner/.cargo/git/checkouts/salsa-e6f3bb7c2a062968/a885bb4/src/function/execute.rs:465:17 when checking `/home/runner/work/ruff/ruff/typing/conformance/tests/aliases_typealiastype.py`: `infer_definition_types(Id(19aa3)): execute: too many cycle iterations`
+fatal[panic] Panicked at /home/runner/.cargo/git/checkouts/salsa-e6f3bb7c2a062968/a885bb4/src/function/execute.rs:465:17 when checking `/home/runner/work/ruff/ruff/typing/conformance/tests/aliases_typealiastype.py`: `infer_definition_types(Id(19ea3)): execute: too many cycle iterations`
 _directives_deprecated_library.py:15:31: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `int`
 _directives_deprecated_library.py:30:26: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `str`
 _directives_deprecated_library.py:36:41: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `Self@__add__`

@astral-sh-bot
Copy link

astral-sh-bot bot commented Nov 18, 2025

mypy_primer results

Changes were detected when running on open source projects
scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/build/wheel.py:98:20: error[�]8;;https://ty.dev/rules#no-matching-overload�\no-matching-overload�]8;;�\] No overload of bound method `__init__` matches arguments
- Found 44 diagnostics
+ Found 43 diagnostics

No memory usage changes detected ✅

/// Returns a list of all annotated attributes defined in this class, or any of its superclasses.
///
/// See [`ClassLiteral::own_fields`] for more details.
#[salsa::tracked(returns(ref))]
Copy link
Member

@MichaReiser MichaReiser Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to cache this. Can you add the heap_size attribute so that we can get a sense of how expensive this is in terms of memory consumption?

How much does it change performance if we were to cache own_fields only? I'm asking because that would require slightly less memory

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the heap_size attribute so that we can get a sense of how expensive this is in terms of memory consumption?

Yes, I was already working on this before you posted your comment 😆

How much does it change performance if we were to cache own_fields only? I'm asking because that would require slightly less memory

Not sure. Happy to try it, but I'll wait for Codspeed to finish on this version of the PR first so we can compare the two using the codspeed runners

@astral-sh-bot
Copy link

astral-sh-bot bot commented Nov 18, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@AlexWaygood
Copy link
Member Author

1-3% improvements on six benchmarks, including a 2% improvement on pydantic, offset by a 1% regression on 9 benchmarks. No memory usage change reported by mypy_primer.

@MichaReiser
Copy link
Member

Should we rebase this on top of Jack's PR to see the real performance benefits? It seems more questionable to merge with the now very limited performance improvement

@carljm
Copy link
Contributor

carljm commented Nov 18, 2025

I think the options are "land this first" or "merge it with Jack's PR" -- I don't think we can land the TypedDict PR separately without this, because it makes benchmarks time out on GitHub.

I would be inclined to land this separately just because it's easier to review as two separate PRs. But we can rebase Jack's PR on top of this first to verify that this fixes the performance issues there.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Nov 18, 2025

Should we rebase this on top of Jack's PR to see the real performance benefits? It seems more questionable to merge with the now very limited performance improvement

Hmm, I'm not totally sure I understand the standard being applied here -- I feel like we've merged lots of performance improvements in the past that showed <3% improvements on our benchmarks, especially if they didn't have any measurable memory-usage regressions :-)

As I say, I've also verified locally that this yields a huge speedup when I cherry pick it onto #21467, which currently can't be merged due to the performance regression on that branch.

But sure, I'm happy if @oconnor663 wants to just cherry-pick f407537 onto #21467.

How much does it change performance if we were to cache own_fields only? I'm asking because that would require slightly less memory

I tried this out locally (applied to Jack's PR). It has very competitive performance to this version. But I'm inclined to go with this version, since there's no memory regression reported on this PR, and since caching ClassLiteral::own_fields rather than ClassLiteral::fields makes our APIs a fair bit less ergonomic IMO. You end up always having to assign the result of class.fields(db) before iterating over it, or the borrow checker complains in several places that temporary values are being dropped while borrowed, because the return type of ClassLiteral::fields() needs to be FxIndexMap<&'db Name, &'db Field<'db>> rather than &'db FxIndexMap<Name, Field<'db>>

diff --git a/crates/ty_python_semantic/src/types/class.rs b/crates/ty_python_semantic/src/types/class.rs
index abc54f5626..54cfd75f9c 100644
--- a/crates/ty_python_semantic/src/types/class.rs
+++ b/crates/ty_python_semantic/src/types/class.rs
@@ -2593,8 +2593,9 @@ impl<'db> ClassLiteral<'db> {
                 )))
             }
             (CodeGeneratorKind::TypedDict, "get") => {
-                let overloads = self
-                    .fields(db, specialization, field_policy)
+                let fields = self.fields(db, specialization, field_policy);
+
+                let overloads = fields
                     .iter()
                     .flat_map(|(name, field)| {
                         let key_type =
@@ -2824,17 +2825,19 @@ impl<'db> ClassLiteral<'db> {
     /// Returns a list of all annotated attributes defined in this class, or any of its superclasses.
     ///
     /// See [`ClassLiteral::own_fields`] for more details.
-    #[salsa::tracked(returns(ref))]
     pub(crate) fn fields(
         self,
         db: &'db dyn Db,
         specialization: Option<Specialization<'db>>,
         field_policy: CodeGeneratorKind<'db>,
-    ) -> FxIndexMap<Name, Field<'db>> {
+    ) -> FxIndexMap<&'db Name, &'db Field<'db>> {
         if field_policy == CodeGeneratorKind::NamedTuple {
             // NamedTuples do not allow multiple inheritance, so it is sufficient to enumerate the
             // fields of this class only.
-            return self.own_fields(db, specialization, field_policy);
+            return self
+                .own_fields(db, specialization, field_policy)
+                .iter()
+                .collect();
         }
 
         let matching_classes_in_mro: Vec<_> = self
@@ -2873,11 +2876,12 @@ impl<'db> ClassLiteral<'db> {
     ///     y: str = "a"
     /// ```
     /// we return a map `{"x": (int, None), "y": (str, Some(Literal["a"]))}`.
+    #[salsa::tracked(returns(ref))]
     pub(super) fn own_fields(
         self,
         db: &'db dyn Db,
         specialization: Option<Specialization<'db>>,
-        field_policy: CodeGeneratorKind,
+        field_policy: CodeGeneratorKind<'db>,
     ) -> FxIndexMap<Name, Field<'db>> {
         let mut attributes = FxIndexMap::default();
 
diff --git a/crates/ty_python_semantic/src/types/diagnostic.rs b/crates/ty_python_semantic/src/types/diagnostic.rs
index 66316f1537..5b3cc2b62e 100644
--- a/crates/ty_python_semantic/src/types/diagnostic.rs
+++ b/crates/ty_python_semantic/src/types/diagnostic.rs
@@ -3135,7 +3135,7 @@ pub(crate) fn report_invalid_key_on_typed_dict<'db>(
     typed_dict_ty: Type<'db>,
     full_object_ty: Option<Type<'db>>,
     key_ty: Type<'db>,
-    items: &FxIndexMap<Name, Field<'db>>,
+    items: &FxIndexMap<&'db Name, &'db Field<'db>>,
 ) {
     let db = context.db();
     if let Some(builder) = context.report_lint(&INVALID_KEY, key_node) {
@@ -3197,8 +3197,8 @@ pub(crate) fn report_invalid_key_on_typed_dict<'db>(
 pub(super) fn report_namedtuple_field_without_default_after_field_with_default<'db>(
     context: &InferContext<'db, '_>,
     class: ClassLiteral<'db>,
-    (field, field_def): &(Name, Option<Definition<'db>>),
-    (field_with_default, field_with_default_def): &(Name, Option<Definition<'db>>),
+    (field, field_def): (&Name, Option<Definition<'db>>),
+    (field_with_default, field_with_default_def): (&Name, Option<Definition<'db>>),
 ) {
     let db = context.db();
     let module = context.module();
diff --git a/crates/ty_python_semantic/src/types/infer/builder.rs b/crates/ty_python_semantic/src/types/infer/builder.rs
index caa925ad74..778fe849b2 100644
--- a/crates/ty_python_semantic/src/types/infer/builder.rs
+++ b/crates/ty_python_semantic/src/types/infer/builder.rs
@@ -4,6 +4,7 @@ use itertools::{Either, Itertools};
 use ruff_db::diagnostic::{Annotation, DiagnosticId, Severity};
 use ruff_db::files::File;
 use ruff_db::parsed::ParsedModuleRef;
+use ruff_python_ast::name::Name;
 use ruff_python_ast::visitor::{Visitor, walk_expr};
 use ruff_python_ast::{
     self as ast, AnyNodeRef, ExprContext, HasNodeIndex, NodeIndex, PythonVersion,
@@ -613,12 +614,11 @@ impl<'db, 'ast> TypeInferenceBuilder<'db, 'ast> {
                     ) {
                         field_with_default_encountered =
                             Some((field_name, field.single_declaration));
-                    } else if let Some(field_with_default) = field_with_default_encountered.as_ref()
-                    {
+                    } else if let Some(field_with_default) = field_with_default_encountered {
                         report_namedtuple_field_without_default_after_field_with_default(
                             &self.context,
                             class,
-                            &(field_name, field.single_declaration),
+                            (field_name, field.single_declaration),
                             field_with_default,
                         );
                     }
@@ -919,9 +919,9 @@ impl<'db, 'ast> TypeInferenceBuilder<'db, 'ast> {
                 CodeGeneratorKind::from_class(self.db(), class, None)
             {
                 let specialization = None;
+                let fields = class.fields(self.db(), specialization, field_policy);
 
-                let kw_only_sentinel_fields: Vec<_> = class
-                    .fields(self.db(), specialization, field_policy)
+                let kw_only_sentinel_fields: Vec<_> = fields
                     .iter()
                     .filter_map(|(name, field)| {
                         field.is_kw_only_sentinel(self.db()).then_some(name)
@@ -7269,7 +7269,7 @@ impl<'db, 'ast> TypeInferenceBuilder<'db, 'ast> {
             }
 
             let value_ty = if let Some(Type::StringLiteral(key)) = key_ty
-                && let Some(field) = typed_dict_items.get(key.value(self.db()))
+                && let Some(field) = typed_dict_items.get(&Name::from(key.value(self.db())))
             {
                 self.infer_expression(&item.value, TypeContext::new(Some(field.declared_ty)))
             } else {
@@ -7954,7 +7954,7 @@ impl<'db, 'ast> TypeInferenceBuilder<'db, 'ast> {
                                     Type::TypedDict(typed_dict_ty),
                                     None,
                                     key_ty,
-                                    items,
+                                    &items,
                                 );
                                 // Return `Unknown` to prevent the overload system from generating its own error
                                 return Type::unknown();
@@ -11285,7 +11285,7 @@ impl<'db, 'ast> TypeInferenceBuilder<'db, 'ast> {
                                 value_ty,
                                 None,
                                 slice_ty,
-                                typed_dict.items(db),
+                                &typed_dict.items(db),
                             );
                         } else {
                             if let Some(builder) =
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index 1b33d7111f..521329f2c9 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -58,7 +58,7 @@ impl<'db> TypedDictType<'db> {
         self.defining_class
     }
 
-    pub(crate) fn items(self, db: &'db dyn Db) -> &'db FxIndexMap<Name, Field<'db>> {
+    pub(crate) fn items(self, db: &'db dyn Db) -> FxIndexMap<&'db Name, &'db Field<'db>> {
         let (class_literal, specialization) = self.defining_class.class_literal(db);
         class_literal.fields(db, specialization, CodeGeneratorKind::TypedDict)
     }
@@ -318,7 +318,7 @@ pub(super) fn validate_typed_dict_key_assignment<'db, 'ast>(
     let items = typed_dict.items(db);
 
     // Check if key exists in `TypedDict`
-    let Some((_, item)) = items.iter().find(|(name, _)| *name == key) else {
+    let Some((_, item)) = items.iter().find(|(name, _)| **name == key) else {
         if emit_diagnostic {
             report_invalid_key_on_typed_dict(
                 context,
@@ -327,7 +327,7 @@ pub(super) fn validate_typed_dict_key_assignment<'db, 'ast>(
                 Type::TypedDict(typed_dict),
                 full_object_ty,
                 Type::string_literal(db, key),
-                items,
+                &items,
             );
         }

@AlexWaygood AlexWaygood marked this pull request as ready for review November 18, 2025 16:53
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@AlexWaygood AlexWaygood enabled auto-merge (squash) November 18, 2025 17:46
@AlexWaygood AlexWaygood merged commit 8dad289 into main Nov 18, 2025
39 of 40 checks passed
@AlexWaygood AlexWaygood deleted the alex/optimize-fields branch November 18, 2025 17:48
@oconnor663
Copy link
Contributor

Thank you @AlexWaygood (and reviewers)!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Potential performance improvement ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants