Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-98169 dataclasses.astuple support DefaultDict #98170

Merged
merged 5 commits into from
Mar 13, 2023
Merged

Conversation

kwsp
Copy link
Contributor

@kwsp kwsp commented Oct 11, 2022

Basically applying the same fix for asdict from #32056 to astuple. These two should've been the same PR, but I didn't know about astuple previously.

Also modified the handling of DefaultDict in asdict a little so that we only call isinstance(obj, dict) once. This results in a ~3% speed up on my machine on this example:

# benchmark
from dataclasses import dataclass, asdict
from collections import defaultdict
from typing import DefaultDict, Dict, List, NamedTuple
import time


class NT(NamedTuple):
    x: int
    y: float

@dataclass
class C:
    dd: DefaultDict[str, List]
    nt: NT
    d1: Dict[str, List]
    d2: Dict[str, int]

def get_instance():
    instance = C(dd=defaultdict(list), nt=NT(1, 1.), d1={}, d2={})
    instance.dd["x"].append(12)
    instance.dd["x"].append(13)
    instance.d1["y"] = [1, 2, 3]
    instance.d2["z"] = 1
    return instance

instance = get_instance()

def timeit(callable, n_runs = 100_000, name=""):
    t1 = time.perf_counter_ns()
    for _ in range(n_runs):
        callable()

    elapsed_ms = round(1e-6 * (time.perf_counter_ns() - t1), 2)
    print(f"{name}\t{n_runs} runs: {elapsed_ms} ms")

timeit(lambda: asdict(instance), name="asdict")
timeit(lambda: astuple(instance), name="astuple")

Lib/dataclasses.py Show resolved Hide resolved
Lib/dataclasses.py Outdated Show resolved Hide resolved
@eendebakpt
Copy link
Contributor

@kwsp Can you address the review comments?

cache the output of `type(obj)` to avoid computing it 3 times.

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
@kwsp
Copy link
Contributor Author

kwsp commented Mar 13, 2023

@eendebakpt sorry I missed the notification. Addressed your change.

Could you please comment on my response to if instanceof(obj, defaultdict)?

@eendebakpt
Copy link
Contributor

@eendebakpt sorry I missed the notification. Addressed your change.

Could you please comment on my response to if instanceof(obj, defaultdict)?

@kwsp I cannot see the comment

Lib/dataclasses.py Outdated Show resolved Hide resolved
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
@kwsp
Copy link
Contributor Author

kwsp commented Mar 13, 2023

#98170 (comment)

I responded:

Good suggestion. I did this originally to be consistent with the existing code that checks for named tuples, and to avoid importing the colelctions module.

elif isinstance(obj, tuple) and hasattr(obj, '_fields'):

I originally assumed that for the namedtuple case, they could've done a is instance(obj, namedtuple) but didn't to avoid an import from from collections import namedtuple, but I just realised is instance(obj, namedtuple) wouldn't work because namedtuple is a factory function...

So now the question is should we import collections in this module? I assume it's imported elsewhere during Python startup so adding the import here should be free right?

@eendebakpt
Copy link
Contributor

I do not now the origins of the hasattr check, but I agree the import time of collections should not be an issue. But let's keep the code consistent for now and keep the hasattr just as in the asdict code.

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it would probably be better to just do isinstance(obj, defaultdict) in both of these cases, but I also think that's a bit out of scope for this PR. The code here LGTM, seems like a straightforward extension of the feature that was already merged for asdict.

@carljm carljm merged commit 71e37d9 into python:main Mar 13, 2023
carljm added a commit to carljm/cpython that referenced this pull request Mar 14, 2023
* main: (50 commits)
  pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685)
  pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661)
  pythongh-102354: change python3 to python in docs examples (python#102696)
  pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506)
  pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684)
  pythongh-100315: clarification to `__slots__` docs. (python#102621)
  pythonGH-100227: cleanup initialization of global interned dict (python#102682)
  doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677)
  pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014)
  pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649)
  pythongh-102627: Replace address pointing toward malicious web page (python#102630)
  pythongh-98831: Use DECREF_INPUTS() more (python#102409)
  pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659)
  pythongh-101524: Fix the ChannelID tp_name (pythongh-102655)
  pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075)
  pythongh-98169 dataclasses.astuple support DefaultDict (python#98170)
  pythongh-102650: Remove duplicate include directives from multiple source files (python#102651)
  pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)
  pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631)
  ...
Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this pull request Mar 27, 2023
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants