@sharkdp
Contributor

@sharkdp sharkdp commented Apr 16, 2025

Summary

This changeset allows us to generate the signature of synthesized `__init__` functions in dataclasses by analyzing the fields on the class (and its superclasses). There are certain features that I have not yet attempted to model in this PR, such as `kw_only`, `dataclasses.KW_ONLY`, or the functionality around `dataclasses.field`.

depends on #17406

ticket: #16651
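For reference, this is the runtime behavior the synthesized signature should mirror. The `dataclasses` module collects fields by walking the dataclass bases in reverse MRO order and then adding the class's own annotated fields; a redeclared field replaces the inherited one but keeps its original position. `Base` and `Child` are hypothetical classes, and the sketch only shows what `inspect.signature` reports at runtime.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Base:
    x: int
    y: str = "a"


@dataclass
class Child(Base):
    z: float = 0.0
    # Redeclaring an inherited field replaces it but keeps its original position.
    x: int = 5


# Parameters of the generated `__init__`: inherited fields first, then new ones.
params = inspect.signature(Child.__init__).parameters
```

Here `params` lists `self, x, y, z`, and `x` now defaults to `5`.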

Ecosystem analysis

These two seem to depend on missing features in generics (see relevant code here):

+ error[lint:unknown-argument] /tmp/mypy_primer/projects/dacite/tests/core/test_generics.py:54:24: Argument `x` does not match any known parameter
+ error[lint:unknown-argument] /tmp/mypy_primer/projects/dacite/tests/core/test_generics.py:54:38: Argument `y` does not match any known parameter

These two are true positives 🥳. See relevant code here.

+ error[lint:invalid-argument-type] /tmp/mypy_primer/projects/dacite/tests/core/test_config.py:161:24: Argument to this function is incorrect: Expected `int`, found `Literal["test"]`
+ error[lint:invalid-argument-type] /tmp/mypy_primer/projects/dacite/tests/core/test_config.py:172:24: Argument to this function is incorrect: Expected `int | float`, found `Literal["test"]`

This one depends on `**` unpacking of dictionaries, which we don't support yet:

+ error[lint:missing-argument] /tmp/mypy_primer/projects/mypy_primer/mypy_primer/globals.py:218:11: No arguments provided for required parameters `new`, `old`, `repo`, `type_checker`, `mypyc_compile_level`, `custom_typeshed_repo`, `new_typeshed`, `old_typeshed`, `new_prepend_path`, `old_prepend_path`, `additional_flags`, `project_selector`, `known_dependency_selector`, `local_project`, `expected_success`, `project_date`, `shard_index`, `num_shards`, `output`, `old_success`, `coverage`, `bisect`, `bisect_output`, `validate_expected_success`, `measure_project_runtimes`, `concurrency`, `base_dir`, `debug`, `clear`
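A hypothetical reduction of the pattern (we have not reproduced the actual code at `globals.py:218`): the dataclass is constructed from a dictionary of keyword arguments. `Options` and `raw` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Options:
    # Hypothetical stand-in for a large options dataclass with required fields.
    new: str
    old: str
    debug: bool = False


# Keyword arguments collected in a dict, e.g. from a parsed command line.
raw = {"new": "v2", "old": "v1", "debug": True}

# Valid at runtime: `**raw` supplies `new`, `old` and `debug` by name.
# A checker that cannot see inside `**raw` reports the required parameters as missing.
opts = Options(**raw)
```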

Test Plan

New Markdown tests.

@sharkdp sharkdp added the ty Multi-file analysis & type inference label Apr 16, 2025
@github-actions
Contributor

github-actions bot commented Apr 16, 2025

mypy_primer results

Changes were detected when running on open source projects
dacite (https://github.com/konradhalas/dacite)
+ error[lint:unknown-argument] /tmp/mypy_primer/projects/dacite/tests/core/test_generics.py:54:24: Argument `x` does not match any known parameter
+ error[lint:unknown-argument] /tmp/mypy_primer/projects/dacite/tests/core/test_generics.py:54:38: Argument `y` does not match any known parameter
+ error[lint:invalid-argument-type] /tmp/mypy_primer/projects/dacite/tests/core/test_config.py:161:24: Argument to this function is incorrect: Expected `int`, found `Literal["test"]`
+ error[lint:invalid-argument-type] /tmp/mypy_primer/projects/dacite/tests/core/test_config.py:172:24: Argument to this function is incorrect: Expected `int | float`, found `Literal["test"]`
- Found 154 diagnostics
+ Found 158 diagnostics

mypy_primer (https://github.com/hauntsaninja/mypy_primer)
+ error[lint:missing-argument] /tmp/mypy_primer/projects/mypy_primer/mypy_primer/globals.py:218:11: No arguments provided for required parameters `new`, `old`, `repo`, `type_checker`, `mypyc_compile_level`, `custom_typeshed_repo`, `new_typeshed`, `old_typeshed`, `new_prepend_path`, `old_prepend_path`, `additional_flags`, `project_selector`, `known_dependency_selector`, `local_project`, `expected_success`, `project_date`, `shard_index`, `num_shards`, `output`, `old_success`, `coverage`, `bisect`, `bisect_output`, `validate_expected_success`, `measure_project_runtimes`, `concurrency`, `base_dir`, `debug`, `clear`
- Found 9 diagnostics
+ Found 10 diagnostics

@sharkdp sharkdp force-pushed the david/dataclasses-pt3 branch from 2c6c1df to ef6e875 Compare April 16, 2025 13:12
@sharkdp sharkdp force-pushed the david/dataclasses-pt2 branch from 182d0a1 to 2a31576 Compare April 16, 2025 13:14
@sharkdp sharkdp force-pushed the david/dataclasses-pt3 branch 4 times, most recently from bdd3212 to c589560 Compare April 16, 2025 18:34
@sharkdp sharkdp closed this Apr 16, 2025
@sharkdp sharkdp reopened this Apr 16, 2025
@sharkdp sharkdp force-pushed the david/dataclasses-pt3 branch from c589560 to 0370df5 Compare April 16, 2025 20:07
Comment on lines +965 to +1027
if !declarations
    .clone()
    .all(|DeclarationWithConstraint { declaration, .. }| {
        declaration.is_some_and(|declaration| {
            matches!(
                declaration.kind(db),
                DefinitionKind::AnnotatedAssignment(..)
            )
        })
    })
{
    continue;
}
Contributor Author

This is a bit of a hack to avoid things like function definitions and nested class definitions showing up as dataclass fields. The filtering by AnnotatedAssignment is correct, I think, but we do not correctly handle weird things like

class C:
    if flag():
        def attr(): ...
    else:
        attr: int = 1

Another option to solve this might be to pass down some kind of definition-kind-filter to symbol_from_declarations? Thoughts?
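To make the edge case concrete, here is a runnable variant of the snippet above. The factory `make_class` is hypothetical, and its `flag` parameter stands in for the opaque `flag()` call. At runtime, whichever branch executes decides whether `attr` is a field at all, so a static checker would have to consider two different `__init__` signatures.

```python
import inspect
from dataclasses import dataclass, fields


def make_class(flag: bool) -> type:
    # Hypothetical factory: `flag` replaces the opaque `flag()` call.
    @dataclass
    class C:
        if flag:
            # A method: not annotated, so not a dataclass field.
            def attr(self) -> int:
                return 0
        else:
            # An annotated assignment: becomes a field with default 1.
            attr: int = 1

    return C


with_method = make_class(True)
with_field = make_class(False)
```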

Contributor

A filter for symbol_from_declarations seems reasonable -- or it could even be a filter in the use-def map fetching?

The even trickier part about supporting something like this is that I think it would mean we'd have to generate a union of __init__ methods with different signatures? It seems closely-related to the possibly-unbound handling you mention below, in that sense.

I definitely think this can be a TODO for now; I don't think any other type checker supports it.

Contributor Author

I added a comment for now.

continue;
}

if let Some(attr_ty) = attr.symbol.ignore_possibly_unbound() {
Contributor Author

In this PR, I did not attempt to model any sort of possibly-unbound handling. It's also possible that there are edge cases with unions of dataclasses or dataclasses with unions of attributes that we don't handle correctly yet. I would like to postpone that to a post-alpha follow-up, if that sounds okay. It's certainly not a problem on any of the ecosystem projects, because it would have shown up in new diagnostics otherwise.

Contributor

Yes I think it's fine for this to be post-alpha, even post-beta probably.


## Signature of `__init__`

TODO: All of the following tests are missing the `self` argument in the `__init__` signature.
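For comparison, the runtime signature of a dataclass's unbound `__init__` does start with `self`; binding it to an instance (or asking for the signature of the class itself) drops it. `Point` is a hypothetical example class.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int = 0


# Unbound function on the class: `self` is the first parameter.
init_params = list(inspect.signature(Point.__init__).parameters)
# Bound method on an instance: `self` is already supplied.
bound_params = list(inspect.signature(Point(1).__init__).parameters)
# Signature of the class (the constructor call) also omits `self`.
class_params = list(inspect.signature(Point).parameters)
```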
Contributor Author

This is still not solved. Turning FunctionType into an enum sounds painful 🙃, but I'll look into it eventually.

Contributor

I think it won't be too bad? It's really more like putting a new wrapper enum around the existing FunctionType (though the new wrapper enum should probably get the name FunctionType). Some APIs of FunctionType will be easy to proxy (e.g. signature), and some are easy because we can just give up (there is no way to provide a definition location to support goto-type-definition for a synthetic function). Not sure if there are some that may be tricky.
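A language-agnostic sketch of the shape being described, written in Python as a tagged union. The actual change would be a Rust enum in red-knot; every name below (`DefinedFunction`, `SynthesizedFunction`, `signature_of`, `definition_of`, and the simplified `Signature`) is hypothetical. It shows one API that proxies trivially and one that simply gives up for synthesized functions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Signature:
    # Hypothetical, simplified signature: parameter names only.
    parameters: tuple[str, ...]


@dataclass(frozen=True)
class DefinedFunction:
    # Wraps a function that exists in source, so it has a definition location.
    name: str
    signature: Signature
    definition_location: str


@dataclass(frozen=True)
class SynthesizedFunction:
    # A function generated by the checker, e.g. a dataclass `__init__`.
    name: str
    signature: Signature


# The wrapper "enum": either variant can appear wherever a function type is expected.
FunctionType = Union[DefinedFunction, SynthesizedFunction]


def signature_of(func: FunctionType) -> Signature:
    # Easy to proxy: both variants carry a signature.
    return func.signature


def definition_of(func: FunctionType) -> Optional[str]:
    # Synthesized functions have no source location, so we give up and return None.
    if isinstance(func, DefinedFunction):
        return func.definition_location
    return None
```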

Contributor Author

Thanks. I think I'll postpone this until after @dhruvmanila's overload work has been merged, as it looks to me like that would introduce conflicts that we can easily avoid by just waiting a bit.

@sharkdp sharkdp marked this pull request as ready for review April 16, 2025 20:22
@sharkdp sharkdp force-pushed the david/dataclasses-pt2 branch from 2a31576 to eedb81a Compare April 16, 2025 20:32
@sharkdp sharkdp force-pushed the david/dataclasses-pt3 branch from 0370df5 to bb5c049 Compare April 16, 2025 20:39
@carljm
Contributor

carljm commented Apr 16, 2025

These two seem to depend on missing features in generics

I don't think so? It looks to me like they depend on missing support for dataclass inheritance. The dataclass B should inherit the fields x and y from A, and they should be part of its synthesized __init__ method. I think this will result in false positives if we don't support it, so we may not want to land __init__ synthesis for super long without supporting this inheritance feature.

EDIT: sorry, ignore this! Should have read the PR first. Clearly it does support dataclass fields inheritance, so I think you're right that the issue here is support for inheriting from a generic class. I think maybe #17434 will fix this?
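A hypothetical reduction of the kind of code involved (not the actual dacite test): a dataclass inheriting from a specialized generic dataclass. At runtime the inherited field `x` is part of the subclass's `__init__`, so passing `x` and `y` by keyword is valid, which is why the `unknown-argument` diagnostics look like false positives caused by missing generic-base support.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class A(Generic[T]):
    x: T


@dataclass
class B(A[int]):
    # Inherits `x` from the specialized base `A[int]`.
    y: str


b = B(x=1, y="a")
```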

Contributor

@carljm carljm left a comment

Impeccable work as usual.

Comment on lines +886 to +919
let dunder_set = attr_ty.class_member(db, "__set__".into());
if let Some(dunder_set) = dunder_set.symbol.ignore_possibly_unbound() {
    // The type of this attribute is a data descriptor. Instead of overwriting the
    // descriptor attribute, dataclasses will (implicitly) call the `__set__` method
    // of the descriptor. This means that the synthesized `__init__` parameter for
    // this attribute is determined by the possible types of the `value` parameter
    // with which the `__set__` method can be called. We build a union of all
    // possible options to account for possible overloads.
    let mut value_types = UnionBuilder::new(db);
    for signature in &dunder_set.signatures(db) {
        for overload in signature {
            if let Some(value_param) = overload.parameters().get_positional(2) {
                value_types = value_types.add(
                    value_param.annotated_type().unwrap_or_else(Type::unknown),
                );
            } else if overload.parameters().is_gradual() {
                value_types = value_types.add(Type::unknown());
            }
        }
    }
    attr_ty = value_types.build();

    // The default value of the attribute is *not* determined by the right-hand side
    // of the class-body assignment. Instead, the runtime invokes `__get__` on the
    // descriptor, as if it had been called on the class itself, i.e. it passes `None`
    // for the `instance` argument.

    if let Some(ref mut default_ty) = default_ty {
        *default_ty = default_ty
            .try_call_dunder_get(db, Type::none(db), Type::ClassLiteral(self))
            .map(|(return_ty, _)| return_ty)
            .unwrap_or_else(Type::unknown);
    }
}
Contributor

Amazing.

Base automatically changed from david/dataclasses-pt2 to main April 17, 2025 06:58
@sharkdp sharkdp force-pushed the david/dataclasses-pt3 branch from bb5c049 to a3abde4 Compare April 17, 2025 07:23
@sharkdp sharkdp merged commit b32407b into main Apr 17, 2025
23 checks passed
@sharkdp sharkdp deleted the david/dataclasses-pt3 branch April 17, 2025 07:31
dcreager added a commit that referenced this pull request Apr 17, 2025
* main:
  [red-knot] Detect version-related syntax errors (#16379)
  [`pyflakes`] Add fix safety section (`F841`) (#17410)
  [red-knot] Add `KnownFunction` variants for `is_protocol`, `get_protocol_members` and `runtime_checkable` (#17450)
  Bump 0.11.6 (#17449)
  Auto generate `visit_source_order` (#17180)
  [red-knot] Initial tests for protocols (#17436)
  [red-knot] Dataclasses: synthesize `__init__` with proper signature (#17428)
  [red-knot] Dataclasses: support `order=True` (#17406)
dcreager added a commit that referenced this pull request Apr 18, 2025
* main: (123 commits)
  [red-knot] Handle explicit class specialization in type expressions (#17434)
  [red-knot] allow assignment expression in call compare narrowing (#17461)
  [red-knot] fix building unions with literals and AlwaysTruthy/AlwaysFalsy (#17451)
  [red-knot] Type narrowing for assertions (take 2) (#17345)
  [red-knot] class bases are not affected by __future__.annotations (#17456)
  [red-knot] Add support for overloaded functions (#17366)
  [`pyupgrade`] Add fix safety section to docs (`UP036`) (#17444)
  [red-knot] more type-narrowing in match statements (#17302)
  [red-knot] Add some narrowing for assignment expressions (#17448)
  [red-knot] Understand `typing.Protocol` and `typing_extensions.Protocol` as equivalent (#17446)
  Server: Use `min` instead of `max` to limit the number of threads (#17421)
  [red-knot] Detect version-related syntax errors (#16379)
  [`pyflakes`] Add fix safety section (`F841`) (#17410)
  [red-knot] Add `KnownFunction` variants for `is_protocol`, `get_protocol_members` and `runtime_checkable` (#17450)
  Bump 0.11.6 (#17449)
  Auto generate `visit_source_order` (#17180)
  [red-knot] Initial tests for protocols (#17436)
  [red-knot] Dataclasses: synthesize `__init__` with proper signature (#17428)
  [red-knot] Dataclasses: support `order=True` (#17406)
  [red-knot] Super-basic generic inference at call sites (#17301)
  ...

Labels

ty Multi-file analysis & type inference

3 participants