gh-103000: Optimise dataclasses asdict/astuple for common types #103005

DavidCEllis · 2023-03-24T15:11:57Z

PR for issue #103000

I'm not attached to _ATOMIC_TYPES as the name for the set but 'atomic' matched how they're referred to in copy and I didn't have anything more suitable.

Issue: Dataclasses - Improve the performance of asdict/astuple for common types and default values #103000

Lib/dataclasses.py

carljm

One idea for a test that could be added here: we could add a test that dataclasses' _ATOMIC_TYPES is a subset of deepcopy's. This would protect against a future change to either of those lists breaking the semantic consistency that everything is deep-copied in asdict/astuple.

Lib/dataclasses.py

carljm · 2023-03-24T17:06:50Z

I think this PR should include a news entry for the performance improvement.

carljm

Requesting the mentioned inline changes, the suggested test, the news entry, and the change to the simpler version that tests against _ATOMIC_TYPES just once at the top of the functions.

Thanks for finding and working on this optimization! Quite an impressive perf win either way.

bedevere-bot · 2023-03-24T17:24:23Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Improve comment describing _ATOMIC_TYPES Co-authored-by: Carl Meyer <carl@oddbird.net>

Remove comment referring to weakref Co-authored-by: Carl Meyer <carl@oddbird.net>

…by the python stdlib json module).

DavidCEllis · 2023-03-24T18:59:29Z

A few unresolved things:

dict_factory=dict optimisation will now be done in a separate follow-up PR.
If I make a test for _ATOMIC_TYPES is there a specific spot it whould go?
In the discussion thread Eric posted about getting copy to handle/provide the list/set of types
- I think that's out of scope for this change and conceals why these types matter, but maybe worth looking into if it would be useful elsewhere
- This could always be changed to use such a set if it was provided and if that was the right choice

With regard to the 'simpler' version, while it is a cleaner change I'm not sure it's the right decision to give up on the performance improvement that we can get now for the potential that changes to the interpreter might get some of it back in the future?

However I recognise I'm also not the one who has to maintain this and I do understand that it is a bit messier so I'll make the change if that's still requested. (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

AlexWaygood · 2023-03-24T19:09:20Z

If I make a test for _ATOMIC_TYPES is there a specific spot it whould go?

I'm sort-of -0 on importing anything private from copy.py, whether it's for dataclasses itself or a dataclasses test. I don't like the idea of coupling the logic of the two modules like that. But I'm not a dataclasses maintainer, so you should probably listen to Carl and Eric over me on this.

Anyway, the test, if you do add it, should go somewhere in Lib/test/test_dataclasses.py.

AlexWaygood · 2023-03-24T19:18:58Z

With regard to the 'simpler' version, while it is a cleaner change I'm not sure it's the right decision to give up on the performance improvement that we can get now for the potential that changes to the interpreter might get some of it back in the future?

However I recognise I'm also not the one who has to maintain this and I do understand that it is a bit messier so I'll make the change if that's still requested. (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

Another way of looking at that is: should we make the code significantly less readable now for an optimisation based on an implementation detail of the current interpreter, that could change at any time? Once the code is checked in, it will be hard to justify changing it in the future for readability reasons, as we highly value code stability in the CPython repo -- we generally only make changes if they fix user-visible bugs or are a meaningful performance improvement. The readability loss could be permanent.

I would still favour the simpler code. I'll let @carljm explain the meaning of his thumbs-up ;)

carljm · 2023-03-24T20:12:52Z

(The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

Sorry, my mistake! I've removed the thumbs-up to clarify :) Somehow the first time around I missed the second sentence of that comment, and only saw the first part saying that you had already done the benchmark. I have no explanation for how I managed to miss the second sentence, considering I was actually a bit confused why you were mentioning that you'd already done the benchmark without offering any further conclusions!

Another way of looking at that is: should we make the code significantly less readable now for an optimisation based on an implementation detail of the current interpreter, that could change at any time? Once the code is checked in, it will be hard to justify changing it in the future for readability reasons, as we highly value code stability in the CPython repo -- we generally only make changes if they fix user-visible bugs or are a meaningful performance improvement. The readability loss could be permanent.

I would still favour the simpler code.

I agree with this. There is always a balance between readability and performance, and we don't always prefer any detectable improvement in performance at any cost in readability and maintainability. We can get most of the wins here with much nicer code.

I'm sort-of -0 on importing anything private from copy.py, whether it's for dataclasses itself or a dataclasses test. I don't like the idea of coupling the logic of the two modules like that.

Logically, the coupling is introduced by the optimization itself; the test is just verifying the assumptions of the optimization aren't broken. That said, I think it's very unlikely that deepcopy would ever reduce the set of objects that it treats as atomic, and it's more likely they might rearrange the details of how that dispatch happens, so probably the test would be much more likely to fail for spurious and annoying reasons than for real ones. So I've changed my mind: forget the test :)

AlexWaygood

Looks very good to me now, but still needs a news entry!

DavidCEllis · 2023-03-24T21:02:50Z

I think that's all the comments covered?

I'm not sure I'll do a PR for the dict_factory special case, I think that was less significant than skipping the function call overhead? Logically now you'd probably write it as a comprehension (with the inline checks this was much more awkward) and I don't remember that getting as much improvement (perhaps when PEP709 lands).

DavidCEllis · 2023-03-24T21:20:52Z

I have made the requested changes; please review again

bedevere-bot · 2023-03-24T21:20:57Z

Thanks for making the requested changes!

@carljm: please review the changes made to this pull request.

ericvsmith · 2023-03-24T21:48:51Z

I'm basically okay with this, although as I mentioned on discuss.python.org I'm concerned about maintaining the list of types here and not in copy. This seems to be a generally useful optimization, why enable it only here? Should every module that wants to do something similar maintain it's own copy of _ATOMIC_TYPES?

DavidCEllis · 2023-03-24T22:24:36Z

The way I look at it now is that it's not really about deepcopy. deepcopy ignoring these objects permits us to special case and return them, but if they weren't useful types it would be irrelevant.

The overlap with the types the stdlib json can handle and the other potentially common types (complex, bytes) are what made this seem worthwhile to me. The remaining types are mostly there because I didn't have a compelling reason not to include them? Including everything at least fit the pattern of 'ignored by deepcopy' rather than being an arbitrary subset that I had chosen (and I definitely didn't think I should be the one choosing).

carljm · 2023-03-27T22:31:54Z

I'm concerned about maintaining the list of types here and not in copy. This seems to be a generally useful optimization, why enable it only here? Should every module that wants to do something similar maintain it's own copy of _ATOMIC_TYPES?

I think this is mostly a dataclasses-specific optimization, where one component of it is avoiding an unnecessary call to deepcopy. A chunk of the win comes from short-circuiting the other type checks done by _as*_inner before it even tries deepcopy. Dataclasses, like deepcopy, is implementing a recursive algorithm that should descend only into "non-atomic" types, which is why this optimization can do "double duty" for dataclasses. I doubt it would make sense for other callers of deepcopy who aren't in a similar situation to bother with a pre-check for atomic types.

Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change in copy to expose it.

carljm

Code changes LGTM here. Will defer to @ericvsmith on whether he wants copy to expose the atomic-types list instead.

carljm · 2023-04-05T00:20:39Z

@ericvsmith considering my comment above, do you still feel that this should be updated to expose a list of atomic types as an attribute of the copy module?

Misc/NEWS.d/next/Library/2023-03-24-20-49-48.gh-issue-103000.6eVNZI.rst

AlexWaygood

This LGTM, other than my small nit about the NEWS entry.

Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change in copy to expose it.

I agree with @carljm here. I think a large part of this PR is about avoiding the inner loop in _asdict_inner and _astuple_inner, rather than any overhead in the copy module. It's true that exposing the _ATOMIC_TYPES list in the copy module might mean that other code could then add similar optimisations more easily. But I think that can be considered separately -- I'd advocate merging this now, and then considering whether to enhance the copy module API in a followup issue.

ericvsmith · 2023-04-10T21:14:22Z

@carljm : I'm okay with just leaving it in dataclasses, at least for starters. We can always move it in the future if needed. Feel free to merge this if you have the time and inclination.

…eVNZI.rst

AlexWaygood · 2023-04-10T21:50:21Z

The docs failures are a known issue, and are unrelated to this PR.

Thanks @DavidCEllis, this is a great speedup!

…python#103005) Co-authored-by: Carl Meyer <carl@oddbird.net> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

DavidCEllis added 6 commits March 20, 2023 17:41

Special case 'atomic' types that are not deepcopied

0af5e1d

Special case dict_factory=dict

2c76ef8

Add comment explaining _ATOMIC_TYPES

c58017c

Merge branch 'python:main' into faster_dataclasses_serialize

8fdbdb2

Merge branch 'python:main' into faster_dataclasses_serialize

4730a13

Reorder atomic types and clarify intention

3a650a1

DavidCEllis requested a review from ericvsmith as a code owner March 24, 2023 15:11

bedevere-bot mentioned this pull request Mar 24, 2023

Dataclasses - Improve the performance of asdict/astuple for common types and default values #103000

Closed

bedevere-bot added the awaiting review label Mar 24, 2023

AlexWaygood requested a review from carljm March 24, 2023 16:15

AlexWaygood added type-feature A feature request or enhancement performance Performance or resource usage stdlib Python modules in the Lib dir 3.12 bugs and security fixes labels Mar 24, 2023

AlexWaygood reviewed Mar 24, 2023

View reviewed changes

Lib/dataclasses.py Outdated Show resolved Hide resolved

Lib/dataclasses.py Outdated Show resolved Hide resolved

Lib/dataclasses.py Outdated Show resolved Hide resolved

carljm reviewed Mar 24, 2023

View reviewed changes

Lib/dataclasses.py Outdated Show resolved Hide resolved

Lib/dataclasses.py Outdated Show resolved Hide resolved

Lib/dataclasses.py Outdated Show resolved Hide resolved

carljm requested changes Mar 24, 2023

View reviewed changes

bedevere-bot added awaiting changes and removed awaiting review labels Mar 24, 2023

DavidCEllis and others added 5 commits March 24, 2023 18:09

Update Lib/dataclasses.py

b6d8a6f

Improve comment describing _ATOMIC_TYPES Co-authored-by: Carl Meyer <carl@oddbird.net>

Update Lib/dataclasses.py

bf320be

Remove comment referring to weakref Co-authored-by: Carl Meyer <carl@oddbird.net>

Convert _ATOMIC_TYPES to frozenset

6280f2e

Correction - complex and bytes are not JSON serializable (by default …

7343fe6

…by the python stdlib json module).

Remove dict special case for now

b77c17e

DavidCEllis added 2 commits March 24, 2023 20:25

Revert function call skip for _asdict_inner

0209f3c

Revert function call skip for _astuple_inner

986371c

AlexWaygood reviewed Mar 24, 2023

View reviewed changes

DavidCEllis changed the title ~~gh-103000: Optimise dataclasses asdict/astuple for common types and the default dict_factory~~ gh-103000: Optimise dataclasses asdict/astuple for common types Mar 24, 2023

📜🤖 Added by blurb_it.

801af82

bedevere-bot added awaiting change review and removed awaiting changes labels Mar 24, 2023

bedevere-bot requested a review from carljm March 24, 2023 21:20

carljm approved these changes Mar 27, 2023

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting change review labels Mar 27, 2023

AlexWaygood reviewed Apr 5, 2023

View reviewed changes

Misc/NEWS.d/next/Library/2023-03-24-20-49-48.gh-issue-103000.6eVNZI.rst Outdated Show resolved Hide resolved

AlexWaygood approved these changes Apr 5, 2023

View reviewed changes

Update Misc/NEWS.d/next/Library/2023-03-24-20-49-48.gh-issue-103000.6…

903b2fb

…eVNZI.rst

AlexWaygood merged commit d034590 into python:main Apr 10, 2023

bedevere-bot removed the awaiting merge label Apr 10, 2023

DavidCEllis deleted the faster_dataclasses_serialize branch April 10, 2023 22:07

warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023

pythongh-103000: Optimise dataclasses asdict/astuple for common types (…

dd85475

…python#103005) Co-authored-by: Carl Meyer <carl@oddbird.net> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

aisk pushed a commit to aisk/cpython that referenced this pull request Apr 18, 2023

pythongh-103000: Optimise dataclasses asdict/astuple for common types (…

96fdaa7

…python#103005) Co-authored-by: Carl Meyer <carl@oddbird.net> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

eendebakpt mentioned this pull request Jan 18, 2024

gh-114264: Optimize performance of copy.deepcopy by adding a fast path for atomic types #114266

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-103000: Optimise dataclasses asdict/astuple for common types #103005

gh-103000: Optimise dataclasses asdict/astuple for common types #103005

DavidCEllis commented Mar 24, 2023 •

edited by bedevere-bot

Loading

carljm left a comment

carljm commented Mar 24, 2023

carljm left a comment

bedevere-bot commented Mar 24, 2023

DavidCEllis commented Mar 24, 2023

AlexWaygood commented Mar 24, 2023 •

edited

Loading

AlexWaygood commented Mar 24, 2023 •

edited

Loading

carljm commented Mar 24, 2023

AlexWaygood left a comment

DavidCEllis commented Mar 24, 2023

DavidCEllis commented Mar 24, 2023

bedevere-bot commented Mar 24, 2023

ericvsmith commented Mar 24, 2023

DavidCEllis commented Mar 24, 2023

carljm commented Mar 27, 2023

carljm left a comment

carljm commented Apr 5, 2023

AlexWaygood left a comment •

edited

Loading

ericvsmith commented Apr 10, 2023

AlexWaygood commented Apr 10, 2023

gh-103000: Optimise dataclasses asdict/astuple for common types #103005

gh-103000: Optimise dataclasses asdict/astuple for common types #103005

Conversation

DavidCEllis commented Mar 24, 2023 • edited by bedevere-bot Loading

carljm left a comment

Choose a reason for hiding this comment

carljm commented Mar 24, 2023

carljm left a comment

Choose a reason for hiding this comment

bedevere-bot commented Mar 24, 2023

DavidCEllis commented Mar 24, 2023

AlexWaygood commented Mar 24, 2023 • edited Loading

AlexWaygood commented Mar 24, 2023 • edited Loading

carljm commented Mar 24, 2023

AlexWaygood left a comment

Choose a reason for hiding this comment

DavidCEllis commented Mar 24, 2023

DavidCEllis commented Mar 24, 2023

bedevere-bot commented Mar 24, 2023

ericvsmith commented Mar 24, 2023

DavidCEllis commented Mar 24, 2023

carljm commented Mar 27, 2023

carljm left a comment

Choose a reason for hiding this comment

carljm commented Apr 5, 2023

AlexWaygood left a comment • edited Loading

Choose a reason for hiding this comment

ericvsmith commented Apr 10, 2023

AlexWaygood commented Apr 10, 2023

DavidCEllis commented Mar 24, 2023 •

edited by bedevere-bot

Loading

AlexWaygood commented Mar 24, 2023 •

edited

Loading

AlexWaygood commented Mar 24, 2023 •

edited

Loading

AlexWaygood left a comment •

edited

Loading