Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Extend instance-attribute tests #15808

Merged
merged 2 commits into from
Jan 29, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
198 changes: 135 additions & 63 deletions crates/red_knot_python_semantic/resources/mdtest/attributes.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,97 +13,115 @@ accessed on the class itself.

```py
class C:
def __init__(self, value2: int, flag: bool = False) -> None:
# bound but not declared
self.pure_instance_variable1 = "value set in __init__"

# bound but not declared - with type inferred from parameter
self.pure_instance_variable2 = value2

# declared but not bound
self.pure_instance_variable3: bytes

# declared and bound
self.pure_instance_variable4: bool = True

# possibly undeclared/unbound
def __init__(self, param: int | None, flag: bool = False) -> None:
value = 1 if flag else "a"
self.inferred_from_value = value
self.inferred_from_other_attribute = self.inferred_from_value
self.inferred_from_param = param
self.declared_only: bytes
self.declared_and_bound: bool = True
if flag:
self.pure_instance_variable5: str = "possibly set in __init__"
self.possibly_undeclared_unbound: str = "possibly set in __init__"

c_instance = C(1)

# TODO: should be `Literal["value set in __init__"]`, or `Unknown | Literal[…]` to allow
# assignments to this unannotated attribute from other scopes.
reveal_type(c_instance.pure_instance_variable1) # revealed: @Todo(implicit instance attribute)
# TODO: Mypy/pyright infer `int | str` here. We want this to be `Unknown | Literal[1, "a"]`
reveal_type(c_instance.inferred_from_value) # revealed: @Todo(implicit instance attribute)

# TODO: should be `int`
reveal_type(c_instance.pure_instance_variable2) # revealed: @Todo(implicit instance attribute)
# TODO: Same here. This should be `Unknown | Literal[1, "a"]`
reveal_type(c_instance.inferred_from_other_attribute) # revealed: @Todo(implicit instance attribute)

# TODO: should be `int | None`
reveal_type(c_instance.inferred_from_param) # revealed: @Todo(implicit instance attribute)

# TODO: should be `bytes`
reveal_type(c_instance.pure_instance_variable3) # revealed: @Todo(implicit instance attribute)
reveal_type(c_instance.declared_only) # revealed: @Todo(implicit instance attribute)

# TODO: should be `bool`
reveal_type(c_instance.pure_instance_variable4) # revealed: @Todo(implicit instance attribute)
reveal_type(c_instance.declared_and_bound) # revealed: @Todo(implicit instance attribute)

# TODO: should be `str`
# We probably don't want to emit a diagnostic for this being possibly undeclared/unbound.
# mypy and pyright do not show an error here.
reveal_type(c_instance.pure_instance_variable5) # revealed: @Todo(implicit instance attribute)
reveal_type(c_instance.possibly_undeclared_unbound) # revealed: @Todo(implicit instance attribute)

# This assignment is fine, as we infer `Unknown | Literal[1, "a"]` for `inferred_from_value`.
c_instance.inferred_from_value = "value set on instance"

# TODO: If we choose to infer a precise `Literal[…]` type for the instance attribute (see
# above), this should be an error: incompatible types in assignment. If we choose to infer
# a gradual `Unknown | Literal[…]` type, this assignment is fine.
c_instance.pure_instance_variable1 = "value set on instance"
# This assignment is also fine:
c_instance.inferred_from_param = None

# TODO: this should be an error (incompatible types in assignment)
c_instance.pure_instance_variable2 = "incompatible"
c_instance.inferred_from_param = "incompatible"

# TODO: we already show an error here but the message might be improved?
# mypy shows no error here, but pyright raises "reportAttributeAccessIssue"
# error: [unresolved-attribute] "Type `Literal[C]` has no attribute `pure_instance_variable1`"
reveal_type(C.pure_instance_variable1) # revealed: Unknown
# error: [unresolved-attribute] "Type `Literal[C]` has no attribute `inferred_from_value`"
reveal_type(C.inferred_from_value) # revealed: Unknown

# TODO: this should be an error (pure instance variables cannot be accessed on the class)
# mypy shows no error here, but pyright raises "reportAttributeAccessIssue"
C.pure_instance_variable1 = "overwritten on class"
C.inferred_from_value = "overwritten on class"

c_instance.pure_instance_variable4 = False
# This assignment is fine:
c_instance.declared_and_bound = False

# TODO: After this assignment to the attribute within this scope, we may eventually want to narrow
# the `bool` type (see above) for this instance variable to `Literal[False]` here. This is unsound
# in general (we don't know what else happened to `c_instance` between the assignment and the use
# here), but mypy and pyright support this. In conclusion, this could be `bool` but should probably
# be `Literal[False]`.
reveal_type(c_instance.pure_instance_variable4) # revealed: @Todo(implicit instance attribute)
reveal_type(c_instance.declared_and_bound) # revealed: @Todo(implicit instance attribute)
```

#### Variable declared in class body and declared/bound in `__init__`
#### Variable declared in class body and possibly bound in `__init__`

The same rule applies even if the variable is *declared* (not bound!) in the class body: it is still
a pure instance variable.

```py
class C:
pure_instance_variable: str
declared_and_bound: str | None

def __init__(self) -> None:
self.pure_instance_variable = "value set in __init__"
self.declared_and_bound = "value set in __init__"

c_instance = C()

reveal_type(c_instance.pure_instance_variable) # revealed: str
reveal_type(c_instance.declared_and_bound) # revealed: str | None

# TODO: we currently plan to emit a diagnostic here. Note that both mypy
# and pyright show no error in this case! So we may reconsider this in
# the future, if it turns out to produce too many false positives.
reveal_type(C.pure_instance_variable) # revealed: str
reveal_type(C.declared_and_bound) # revealed: str | None

# TODO: same as above. We plan to emit a diagnostic here, even if both mypy
# and pyright allow this.
C.pure_instance_variable = "overwritten on class"
C.declared_and_bound = "overwritten on class"

# error: [invalid-assignment] "Object of type `Literal[1]` is not assignable to attribute `declared_and_bound` of type `str | None`"
c_instance.declared_and_bound = 1
```

#### Variable declared in class body and not bound anywhere
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section has only been moved.


If a variable is declared in the class body but not bound anywhere, we still consider it a pure
instance variable and allow access to it via instances.

```py
class C:
only_declared: str

c_instance = C()

reveal_type(c_instance.only_declared) # revealed: str

# TODO: mypy and pyright do not show an error here, but we plan to emit a diagnostic.
# The type could be changed to 'Unknown' if we decide to emit an error?
reveal_type(C.only_declared) # revealed: str

# error: [invalid-assignment] "Object of type `Literal[1]` is not assignable to attribute `pure_instance_variable` of type `str`"
c_instance.pure_instance_variable = 1
# TODO: mypy and pyright do not show an error here, but we plan to emit one.
C.only_declared = "overwritten on class"
```

#### Variable only defined in unrelated method
Expand All @@ -112,45 +130,66 @@ We also recognize pure instance variables if they are defined in a method that i

```py
class C:
def set_instance_variable(self) -> None:
self.pure_instance_variable = "value set in method"
def __init__(self, param: int | None, flag: bool = False) -> None:
self.initialize(param, flag)

c_instance = C()
def initialize(self, param: int | None, flag: bool) -> None:
value = 1 if flag else "a"
self.inferred_from_value = value
self.inferred_from_other_attribute = self.inferred_from_value
self.inferred_from_param = param
self.declared_only: bytes
self.declared_and_bound: bool = True

c_instance = C(1)

# TODO: Should be `Unknown | Literal[1, "a"]`
reveal_type(c_instance.inferred_from_value) # revealed: @Todo(implicit instance attribute)

# TODO: Should be `Unknown | Literal[1, "a"]`
reveal_type(c_instance.inferred_from_other_attribute) # revealed: @Todo(implicit instance attribute)

# Not that we would use this in static analysis, but for a more realistic example, let's actually
# call the method, so that the attribute is bound if this example is actually run.
c_instance.set_instance_variable()
# TODO: Should be `int | None`
reveal_type(c_instance.inferred_from_param) # revealed: @Todo(implicit instance attribute)

# TODO: should be `Literal["value set in method"]` or `Unknown | Literal[…]` (see above).
reveal_type(c_instance.pure_instance_variable) # revealed: @Todo(implicit instance attribute)
# TODO: Should be `bytes`
reveal_type(c_instance.declared_only) # revealed: @Todo(implicit instance attribute)

# TODO: Should be `bool`
reveal_type(c_instance.declared_and_bound) # revealed: @Todo(implicit instance attribute)

# TODO: We already show an error here, but the message might be improved?
# error: [unresolved-attribute]
reveal_type(C.pure_instance_variable) # revealed: Unknown
reveal_type(C.inferred_from_value) # revealed: Unknown

# TODO: this should be an error
C.pure_instance_variable = "overwritten on class"
C.inferred_from_value = "overwritten on class"
```

#### Variable declared in class body and not bound anywhere

If a variable is declared in the class body but not bound anywhere, we still consider it a pure
instance variable and allow access to it via instances.
#### Methods that does not use `self` as a first parameter

```py
class C:
pure_instance_variable: str
# This might trigger a stylistic lint like `invalid-first-argument-name-for-method`, but
# it should be supported in general:
def __init__(this) -> None:
this.declared_and_bound: str | None = "a"

c_instance = C()
# TODO: should be `str | None`
reveal_type(C().declared_and_bound) # revealed: @Todo(implicit instance attribute)
```

reveal_type(c_instance.pure_instance_variable) # revealed: str
#### Aliased `self` parameter

# TODO: mypy and pyright do not show an error here, but we plan to emit a diagnostic.
# The type could be changed to 'Unknown' if we decide to emit an error?
reveal_type(C.pure_instance_variable) # revealed: str
```py
class C:
def __init__(self) -> None:
this = self
this.declared_and_bound: str | None = "a"

# TODO: mypy and pyright do not show an error here, but we plan to emit one.
C.pure_instance_variable = "overwritten on class"
# TODO: This would ideally be `str | None`, but mypy/pyright don't support this either,
# so `Unknown` + a diagnostic is also fine.
reveal_type(C().declared_and_bound) # revealed: @Todo(implicit instance attribute)
Comment on lines +190 to +192
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, interesting. Yeah, I can see how this could be hard. Simply checking that the this variable in __init__ has the same type as self would not be sufficient; you'd need to know that it refers to exactly the same object as self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, pyright does actually seem to have a special treatment of self / cls:

When a type annotation for a method’s self or cls parameter is omitted, pyright will infer its type based on the class that contains the method. The inferred type is internally represented as a type variable that is bound to the class.

The type of self is represented as Self@ClassName where ClassName is the class that contains the method. Likewise, the cls parameter in a class method will have the type Type[Self@ClassName].

class C:
    def __init__(self) -> None:
        reveal_type(self)  # pyright:  Self@C

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by "special treatment" you mean "special treatment based on the name of the parameter," I don't see evidence for that in the linked docs or in the playground? Pyright does implement a lint rule telling you to change the name to self/cls, but using a different name doesn't seem to otherwise impact its type inference. The phrase "a method's self or cls parameter" in those docs does not mean "a parameter named self or cls", it means "the first parameter of a method or classmethod."

If by "special treatment" you mean assigning a type to an un-annotated first parameter that is a typevar bound to the type of the containing class, we will also need to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply checking that the this variable in __init__ has the same type as self would not be sufficient; you'd need to know that it refers to exactly the same object as self.

Once we implement correct Self typing, then checking that it has the same type as self will actually be sufficient, because the Self type can't apply to anything other than self.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we implement correct Self typing, then checking that it has the same type as self will actually be sufficient, because the Self type can't apply to anything other than self.

Is that true?

from typing import Self

class Foo:
    def __init__(self, other: Self | None = None):
        if other is not None:
            other.bar = 42

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. I think it would also be pretty reasonable to consider other.bar = 42 the same way we would consider self.bar = 42; both are cases of a method of the very same class assigning to an attribute on its self type. The only real distinction would be if we were trying to validate full initialization of self, but I don't think we're aiming to do that? But it would also be OK not to, whatever is easier I think, since I doubt it happens often.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, that makes sense!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by "special treatment" you mean assigning a type to an un-annotated first parameter that is a typevar bound to the type of the containing class, we will also need to do that.

Sorry, I should have been more clear. This is what I meant, yes. It's not yet clear to me why pyright sometimes infers C, sometimes *C (for "partially unknown" types?), and sometimes Self@C.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. The type of self should generally be Self@C, which is a type variable with upper bound C; this is different from C in that it accounts for the possibility of a subclass, which has implications for methods that return self (when called on a subclass, they will return an instance of the subclass, not C), and for valid-Liskov-override checking of method signatures (normally parameter types are contravariant, so a subclass method parameter could not accept a subtype of the corresponding superclass method parameter, but this is exactly what needs to happen with self). Not sure in what cases pyright will flatten that to just C, or what C* represents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of the inconsistency in pyright's behaviour here may be because of what PEP484 specified before we added a Self type to the typing system: https://peps.python.org/pep-0484/#annotating-instance-and-class-methods

Pyright and other type checkers had to change various behaviours when Self was added

```

### Pure class variables (`ClassVar`)
Expand Down Expand Up @@ -277,6 +316,39 @@ reveal_type(C.variable_with_class_default1) # revealed: str
reveal_type(c_instance.variable_with_class_default1) # revealed: str
```

### Inheritance of class/instance attributes

#### Instance variable defined in a base class

```py
class Base:
declared_in_body: int | None = 1

can_not_be_redeclared: str | None = None

def __init__(self) -> None:
self.defined_in_init: str | None = "value in base"

class Intermediate(Base):
# TODO: Mypy does not report an error here, but pyright does:
# "… overrides symbol of same name in class "Base". Variable is mutable so its type is invariant"
# We should introduce a diagnostic for this. Whether or not that should be enabled by default can
# still be discussed.
can_not_be_redeclared: str = "a"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two possible rules here: "subclass can not re-declare, period", or "subclass can only re-declare with an equivalent type, because mutability implies invariance". The name can_not_be_redeclared here suggests the former, but pyright implements the latter. There's no unsoundness to redeclare with the same type on a subclass, and pyright allows it without error. Perhaps we could be a little clearer about that here? Even show an example of both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. #15826


def __init__(self) -> None:
super().__init__()

class Derived(Intermediate): ...

reveal_type(Derived.declared_in_body) # revealed: int | None

reveal_type(Derived().declared_in_body) # revealed: int | None

# TODO: Should be `str | None`
reveal_type(Derived().defined_in_init) # revealed: @Todo(implicit instance attribute)
```

## Union of attributes

```py
Expand Down
Loading