Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: @dataclasses.dataclass(slots=True) doesn't support methods using zero-arg super() #90562

Closed
Tinche mannequin opened this issue Jan 16, 2022 · 23 comments
Closed

RFE: @dataclasses.dataclass(slots=True) doesn't support methods using zero-arg super() #90562

Tinche mannequin opened this issue Jan 16, 2022 · 23 comments
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-feature A feature request or enhancement

Comments

@Tinche
Copy link
Mannequin

Tinche mannequin commented Jan 16, 2022

BPO 46404
Nosy @vstinner, @ericvsmith, @encukou, @ericsnowcurrently, @hynek, @Tinche, @frenzymadness, @sweeneyde

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-01-16.22:48:06.738>
labels = ['interpreter-core', 'type-feature', '3.11']
title = "RFE: @dataclasses.dataclass(slots=True) doesn't support methods using closures"
updated_at = <Date 2022-03-28.15:16:37.392>
user = 'https://github.com/Tinche'

bugs.python.org fields:

activity = <Date 2022-03-28.15:16:37.392>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2022-01-16.22:48:06.738>
creator = 'tinchester'
dependencies = []
files = []
hgrepos = []
issue_num = 46404
keywords = []
message_count = 8.0
messages = ['410730', '410745', '411462', '415291', '415300', '415303', '416169', '416174']
nosy_count = 8.0
nosy_names = ['vstinner', 'eric.smith', 'petr.viktorin', 'eric.snow', 'hynek', 'tinchester', 'frenzy', 'Dennis Sweeney']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue46404'
versions = ['Python 3.11']

Linked PRs

@Tinche
Copy link
Mannequin Author

Tinche mannequin commented Jan 16, 2022

We've received a report over at the attrs issue tracker about our test suite failing on Python 3.11. Here's the link: python-attrs/attrs#907

It turns out to be an issue with the no-arg super() calls in slotted classes. Here's a minimal reproducer example:

from attrs import define


@define
class A:
    pass


@define
class B(A):
    def test(self):
        super()


B().test()
Traceback (most recent call last):
  File "/Users/tintvrtkovic/pg/attrs/a01.py", line 15, in <module>
    B().test()
    ^^^^^^^^^^
  File "/Users/tintvrtkovic/pg/attrs/a01.py", line 12, in test
    super()
    ^^^^^^^
TypeError: super(type, obj): obj must be an instance or subtype of type

This is a known issue for which we have implemented workarounds. The workarounds aren't effective for 3.11 though. I have implemented a fix in attrs (python-attrs/attrs#910), but I still thought I'd post this here to maybe get the core devs opinion.

Dataclasses exhibit the exact same issue when used with slots=True, both in 3.10 when slots was added and in 3.11. I guess no one reported it or tried fixing it.

A comprehensive description of the issue follows: since it's impossible to add slotness (i.e. set __slots__) to a class after it has been created, when creating a slotted class the class decorators in attrs and dataclasses actually replace the class they are applied to with a copy of it, with slots added. This works, except in the case of the no-arg super() being used in any of the class methods (and maybe another edge case that I can't remember). When the compiler encounters the no-arg super() form, it adds some state to the function __closure__ cells. This state causes the exception shown above, since it's incorrect when the class gets replaced.

So these closure cells need to be rewritten when the class is replaced. In Python versions prior to 3.11, the closure cells were immutable so extra effort was needed to rewrite them. The functions are here: https://github.com/python-attrs/attrs/blob/9727008fd1e40bc55cdc6aee71e0f61553f33127/src/attr/_compat.py#L145.

In 3.11, our old closure cell rewriting doesn't work any more, but closure cells don't appear to be immutable either, so the fix in my attr PR linked above is simple. Still, it's another branch in the code to support a specific version.

I don't know if there's anything actionable here for Python, apart from confirming or denying if this behavior is expected.

@Tinche Tinche mannequin added type-bug An unexpected behavior, bug, or error 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jan 16, 2022
@sweeneyde
Copy link
Member

bisected to here:

631f993 is the first new commit
commit 631f993
Author: Eric Snow <ericsnowcurrently@gmail.com>
Date: Mon Jun 7 16:52:00 2021 -0600

bpo-43693: Add the MAKE_CELL opcode and interleave fast locals offsets. (gh-26396)

This moves logic out of the frame initialization code and into the compiler and eval loop.  Doing so simplifies the runtime code and allows us to optimize it better.

https://bugs.python.org/issue43693

@encukou
Copy link
Member

encukou commented Jan 24, 2022

I guess at least there should be a warning about this in dataclasses docs?
The reproducer with dataclasses (which exhibits the same error on 3.10 and 3.11):

import dataclasses

@dataclasses.dataclass(slots=True)
class A:
    pass


@dataclasses.dataclass(slots=True)
class B(A):
    def test(self):
        super()

B().test()

@frenzymadness
Copy link
Mannequin

frenzymadness mannequin commented Mar 15, 2022

In my opinion, we should keep it simple for attrs and dataclasses to fix closure cells when a class is replaced and therefore it seems to be correct to have it mutable as it currently is in 3.11.

My plan is to implement the fix for dataclasses and some tests for these use cases so the behavior should not change in the future. For attrs it means one more branch in the code now but much simpler code when the support for older releases gets dropped.

Any other opinions?

@ericvsmith
Copy link
Member

@frenzy: I'm not sure what your fix would do. You could either describe it in rough terms (if you'd like a pre-PR opinion on the approach), or I'm happy to wait to see your PR.

@frenzymadness
Copy link
Mannequin

frenzymadness mannequin commented Mar 15, 2022

We have the same problem reported in attrs here in dataclasses and because it's not tested the way to manipulate __closure__ cells changes frequently.

My plan is to implement something similar to this into dataclasses: https://github.com/python-attrs/attrs/blob/5c040f30e3e4b3c9c0f27c8ac6ff13d604c1818c/src/attr/_make.py#L895-L916

Basically, when a new dataclass is created (with slots=True), look for references to the original class and fix them.

This fixes the problem reported to attrs in dataclasses and when we fix it and add some tests for it, the future behavior should be more stable.

What do you think?

@vstinner
Copy link
Member

I changed the issue title to focus this issue on enhance dataclasses to support @dataclasses.dataclass(slots=True) on methods using closures: it would be a new Python 3.11 feature.

I created https://bugs.python.org/issue47143 "Add functools.copy_class() which updates closures".

@vstinner vstinner changed the title 3.11a4: a small attrs regression RFE: @dataclasses.dataclass(slots=True) doesn't support methods using closures Mar 28, 2022
@vstinner vstinner added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 28, 2022
@vstinner
Copy link
Member

See also bpo-45520: "Frozen dataclass deep copy doesn't work with __slots__" which is related but a different issue.

@wookie184
Copy link
Contributor

wookie184 commented May 1, 2023

I had a go at implementing the fix @frenzymadness mentioned, @ericvsmith @vstinner if you're able to take a look: #104038

Apologies if you were still planning on implementing this @frenzymadness, as you may have been still waiting for a go-ahead. Since this had stalled for a bit I figured a PR was the easiest way to try and push this forwards.

@arhadthedev arhadthedev added 3.12 bugs and security fixes and removed 3.11 only security fixes labels May 1, 2023
@carljm carljm changed the title RFE: @dataclasses.dataclass(slots=True) doesn't support methods using closures RFE: @dataclasses.dataclass(slots=True) doesn't support methods using zero-arg super() May 1, 2023
@carljm
Copy link
Member

carljm commented May 1, 2023

I updated the issue title to clarify that the problem here is specifically about the __class__ cell, used by zero-arg super(), not about closures in general.

One observation about the proposed fix is that (because it modifies the contents of the cell in place) it "breaks" the original class, in case anyone had a reference to it, e.g. something like this:

class Orig:
    def method(self):
        return super().method()

SlotifiedDataclass = dataclass(slots=True)(Orig)

With the fix proposed above, SlotifiedDataclass.method() will now work, but Orig.method() will now be broken (it will have the wrong class in its __class__ cell.)

Perhaps this is acceptable collateral damage, breaking a rare case in order to fix the more common case? But it could break currently-working code. (Edited to add: I think it's quite unlikely in practice that anyone would have an example like this in current code, where they expect Orig.method() to work but don't care that SlotifiedDataclass.method() is broken.)

Even putting a new cell into the function's closure (rather than updating the cell content in place) wouldn't fix this, since the closure (and the function object itself) are also shared between the original and the new class. The only way to avoid it would be to make a new function object, using the same code object as the original one. But this is quite tricky given that the function object could be wrapped by arbitrary decorators etc.

So perhaps we just have to accept that in order to make the new slotified class fully functional, we have to break the old class.

If so, it still seems unfortunate that we have to go scrabbling through all the methods, hoping that at least one of them is not wrapped in a decorator we don't know how to introspect, so we can update the single cell that we actually need to update, just because Python hides it from us. I wonder if there's any reason the class cell shouldn't be left exposed as __classcell__ in a class namespace, rather than deleted from the class namespace? Then this fix would be trivial: just get OldClass.__classcell__ and update its contents.

All in all, this issue mostly makes me think that slots support via class-copying was a mistake. FWIW, there would be another option, also ugly in its own way: you can actually add slots to a class post-creation (in C code), but it's only safe to do so if you know that there aren't any instances of the class yet. This is proxy-detectable via the reference count, since every instance holds a reference to its class. So (via a C helper) we could add slots to the existing class, and error out if you try to make a slotified dataclass out of a class you've already somehow created instances of. This limitation seems arguably better than the more complex limitations arising from this issue.

@ericvsmith
Copy link
Member

At the core sprint last year we discussed the "add slots if no instances have been created" proposal. I think it would be the preferred way to go, but I'm concerned about the impact on other python implementations.

@carljm
Copy link
Member

carljm commented May 1, 2023

At the core sprint last year we discussed the "add slots if no instances have been created" proposal. I think it would be the preferred way to go, but I'm concerned about the impact on other python implementations.

I think the more general concept of "mark a class when its first instance is created" is reasonable to expect other implementations to be able to provide in some way? Even in CPython, if we found the refcount-based approach to be too subtle, we could pretty easily do a more explicit marker via type flag (though we may not prefer to spend a type flag on this, or do one extra check in instance creation.)

@wookie184
Copy link
Contributor

One observation about the proposed fix is that (because it modifies the contents of the cell in place) it "breaks" the original class

Since dataclasses normally work by modifying the class, and the new class generation is an (albeit leaked and documented) implementation detail, I think it's expected that using the old class isn't really supported. Either way, it's no more broken than the actual dataclass is currently, which I think is a fair trade in brokenness!

If so, it still seems unfortunate that we have to go scrabbling through all the methods, hoping that at least one of them is not wrapped in a decorator we don't know how to introspect, so we can update the single cell that we actually need to update, just because Python hides it from us.

Yeah, it's a pretty ugly solution. I think it's an improvement on the status quo, given the cases where it doesn't work should be fairly uncommon, and avoidable, but it's certainly not perfect so better options are definitely worth considering.

I wonder if there's any reason the class cell shouldn't be left exposed as classcell in a class namespace, rather than deleted from the class namespace? Then this fix would be trivial: just get OldClass.classcell and update its contents.

I'd also wondered this, although thought it would be a bit of a shame to have to add a new attribute just for this. Maybe it could be useful for other things though.

FWIW, there would be another option, also ugly in its own way: you can actually add slots to a class post-creation (in C code), but it's only safe to do so if you know that there aren't any instances of the class yet. This is proxy-detectable via the reference count, since every instance holds a reference to its class. So (via a C helper) we could add slots to the existing class, and error out if you try to make a slotified dataclass out of a class you've already somehow created instances of. This limitation seems arguably better than the more complex limitations arising from this issue.

The no-instances constraint doesn't sound like an issue. I can't comment on impact on other implementations, but this does sound like the neatest solution from a user perspective.

hugovk added a commit that referenced this issue May 21, 2024
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 21, 2024
)

(cherry picked from commit e3ed574)

Co-authored-by: Josh Cannon <joshdcannon@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 21, 2024
)

(cherry picked from commit e3ed574)

Co-authored-by: Josh Cannon <joshdcannon@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
hugovk added a commit that referenced this issue May 23, 2024
…119350)

Co-authored-by: Josh Cannon <joshdcannon@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
hugovk added a commit that referenced this issue May 23, 2024
…119351)

Co-authored-by: Josh Cannon <joshdcannon@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@ericvsmith
Copy link
Member

I have a code fix that I think addresses this. Not sure I can get it into 3.13, but I'll try.

@blhsing
Copy link
Contributor

blhsing commented Jul 9, 2024

I have a code fix that I think addresses this. Not sure I can get it into 3.13, but I'll try.

Thanks @ericvsmith. Can you let us know if you have started working on this, and which approach you plan on taking or have taken?

@carljm has helpfully reasoned through several options. Among them, adding slots to a class post-creation sounded awesome, until I realized that it's technically impossible to do because memory for member descriptors must be allocated at the time of type creation with the type itself as one block of memory to allow offset-based access:

type = (PyTypeObject *)metatype->tp_alloc(metatype, ctx->nslot);

I therefore think exposing __classcell__ is the easiest and most viable option. Don't think it can cause any harm either since it's documented that dunders are reserved so there should be no existing code base that uses __classcell__. If you haven't started working on this I can give this approach a shot.

@ericvsmith
Copy link
Member

I have a mostly working version of a fix. I'll try to find it and post it here.

estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@picnixz picnixz added topic-dataclasses stdlib Python modules in the Lib dir and removed 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Aug 16, 2024
@loic-simon
Copy link

An issue I think is related: dataclasses cannot be generic (pre- or post-695) when using both slots=True and frozen=True:

>>> @dataclass(frozen=True, slots=True)
... class C[T]:
...     a: T
... 
>>> C[str]
__main__.C[str]
>>> C[str]("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/3.12.1/lib/python3.12/typing.py", line 1142, in __call__
    result.__orig_class__ = self
    ^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in __setattr__
TypeError: super(type, obj): obj must be an instance or subtype of type

If the dataclass is only frozen or only slotted, no issue:

>>> @dataclass(frozen=True)
... class C[T]:
...     a: T
... 
>>> C[str]("foo")
C(a='foo')
>>> @dataclass(slots=True)
... class C[T]:
...     a: T
... 
>>> C[str]("foo")
C(a='foo')

Would this case be covered by your fix @ericvsmith? 🙏

@ericvsmith
Copy link
Member

I'm not sure, but I'll check. I'm sprinting on it this week and will check this use case.

@ericvsmith
Copy link
Member

@loic-simon : I can't duplicate this in an unmodified 3.12.3:

Python 3.12.3 (main, Apr  9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> @dataclass(frozen=True, slots=True)
... class C[T]:
...  a:T
... 
>>> C[str]
__main__.C[str]
>>> C[str]("foo")
C(a='foo')

@loic-simon
Copy link

Oh, indeed! I just checked in 3.12.2 and 3.12.3, this has apparently been fixed in 3.12.3, possibly by GH-115165?

Sorry for the noise anyway!

@ericvsmith
Copy link
Member

No problem! I'll remove my test case for it.

@ericvsmith
Copy link
Member

Thanks @ericvsmith. Can you let us know if you have started working on this, and which approach you plan on taking or have taken?

And to actually answer this question: I'm basically going with the approach in #104038 of fixing up the closure cells. I'm still working through tests and corner cases, and looking at @carljm's comments in #90562 (comment).

I talked to @markshannon at this year's PyCon, and he dissuaded me from the "can modify __slots__ until an instance is created" approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests