Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve UX for mocking dataclasses that are problematic to instantiate #124176

Open
ncoghlan opened this issue Sep 17, 2024 · 6 comments
Open

Improve UX for mocking dataclasses that are problematic to instantiate #124176

ncoghlan opened this issue Sep 17, 2024 · 6 comments
Assignees
Labels
topic-dataclasses type-feature A feature request or enhancement

Comments

@ncoghlan
Copy link
Contributor

ncoghlan commented Sep 17, 2024

Feature or enhancement

Proposal:

Passing instance=True to create_autospec misses fields without default values, even when the given spec is a dataclass object:

>>> from dataclasses import dataclass
>>> @dataclass
... class Example:
...     a: int
...     b: int = 0
...     def sum_fields(self):
...         return self.a + self.b
...
>>> from unittest.mock import create_autospec
>>> mock_instance = create_autospec(Example(1))
>>> mock_instance_from_class = create_autospec(Example, instance=True)
>>> set(dir(mock_instance)) - set(dir(mock_instance_from_class))
{'a'}

This is despite dataclass definitions making their field information readily available for introspection on the class object (without requiring instantiation):

>>> from dataclasses import fields
>>> [(field.name, field.type) for field in fields(Example)]
[('a', <class 'int'>), ('b', <class 'int'>)]

A similar problem occurs if the dataclass uses __post_init__ to set attributes that are not otherwise set:

>>> from dataclasses import dataclass, field
>>> @dataclass
... class PostInitExample:
...     a: int = field(init=False)
...     b: int = field(init=False)
...     def __post_init__(self):
...         self.a = self.b = 1
...
>>> mock_post_init_class = create_autospec(Example)
>>> mock_post_init_class = create_autospec(PostInitExample)
>>> mock_post_init_instance = create_autospec(PostInitExample())
>>> set(dir(mock_post_init_instance)) - set(dir(mock_post_init_class()))
{'b', 'a'}

While for most dataclasses it is straightforward to instantiate a specific instance and derive the mock autospec from that, this may not be desirable (or even feasible) if the dataclass requires references to real external resources to create a real instance.

There are various potential workarounds available for this functional gap, but they're all relatively clumsy, and come with their own problems (like not being able to use spec_set=True if the missing fields are added manually, or not handling defined methods properly if setting an explicit list of fields instead of using autospec, or still not adding the fields only defined in __post_init__ if instantiating a class mock).

By contrast, if create_autospec were to be made explicitly aware of data classes, it could do a pass over dataclasses.fields(spec) and use the type information to fill in any missing fields that don't have class level default values set.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

Previously filed here, but closed on the basis of create_autospec covering the use case: #80761

This is only true if the dataclass can be readily instantiated, hence this feature request.

There is also some previous discussion (and assorted attempted workarounds with various flaws) on this Stack Overflow question: https://stackoverflow.com/questions/51640505/how-to-use-spec-when-mocking-data-classes-in-python

Linked PRs

@ncoghlan ncoghlan added the type-feature A feature request or enhancement label Sep 17, 2024
@sobolevn
Copy link
Member

Great idea! @ncoghlan do you want to work on this? Or can I take this over? :)

@ncoghlan
Copy link
Contributor Author

I already worked around it for my use case (leaving spec_set as False and adding in the missing attributes manually), so please feel free to take it on!

I think the basic cases where the field types are just a simple type will be straightforward (use create_autospec(..., instance=True) recursively), and the some_type|None/Optional[type] case can ignore the None branch and do the same. ClassVar and InitVar will already be filtered out by the fields(...) call.

Checking the result of dir(str|int) it might be OK to simply not do anything special for non-trival unions and other more complex type definitions like generic containers. The mock may end up with some additional methods it wouldn't otherwise have, but that's probably an acceptable limitation. For example:

>>> dir(str|int)
['__args__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__or__', '__parameters__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

>>> set(dir(tuple[str, int, str])) - set(dir(tuple))
{'__deepcopy__', '__unpacked__', '__typing_unpacked_tuple_args__', '__parameters__', '__copy__', '__args__', '__mro_entries__', '__origin__'}

@sobolevn
Copy link
Member

This will land in 3.14! 🎉

@ncoghlan
Copy link
Contributor Author

ncoghlan commented Sep 27, 2024

@sobolevn , it belatedly occurred to me to ask: should this feature do something special for Annotated types? dataclasses doesn't do anything special with those, so the fully annotated type is reported in the fields output:

>>> @dataclass
... class Example:
...     a: Annotated[int, "Annotation"]
...
>>> from dataclasses import fields
>>> [(f.name, f.type) for f in fields(Example)]
[('a', typing.Annotated[int, 'Annotation'])]

Annotated types make the underlying type available in __origin__, so this could be handled via:

>>> @dataclass
... class Example:
...     a: int
...     b: Annotated[int, "Want to ignore this"]
...
>>> [(f.name, t.__origin__ if get_origin((t := f.type)) is Annotated else t) for f in fields(Example)]
[('a', <class 'int'>), ('b', <class 'int'>)]

I also realised we want to avoid a behaviour change for dataclasses that do define a default value of a specific type within a more general type category:

>>> @dataclass
... class Example:
...     narrow_default: int|None = field(default=30)
...
>>> spec_mock = create_autospec(Example, instance=True)
>>> spec_mock.narrow_default
<NonCallableMagicMock name='mock.narrow_default' spec='int' id='140646074973744'>
>>> [(f.name, f.type) for f in fields(Example)]
[('narrow_default', int | None)]

Since the initial implementation processes every fields entry after processing the dir entries, the mock in this example would switch from mocking int to mocking int|None.

While I think that would be a reasonable design if this native dataclass support had been added when dataclasses first entered the standard library, at this point we don't want to change the mocked types of attributes that were already being picked up by the previous dir-only implementation.

For the attributes that do appear in dir, we want to keep the existing attribute look up based processing, and only use fields for the attributes that would otherwise be missing.

@ncoghlan ncoghlan reopened this Sep 27, 2024
@sobolevn
Copy link
Member

I agree with the second part 100%, this is a regression in my new change. PR is on its way.
I propose to discuss the first part separately. Because, we might want to cover other corner cases.

@ncoghlan
Copy link
Contributor Author

ncoghlan commented Oct 7, 2024

I agree, the first part would be a new feature request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-dataclasses type-feature A feature request or enhancement
Projects
Status: In Progress
Development

No branches or pull requests

3 participants