PERF: groupby(...).__len__ #57595

Merged
2 commits merged on Feb 24, 2024

rhshadrach (Member) commented:

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Enabled by #55738. Prior to that, ngroups was not reliable.

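import numpy as np
import pandas as pd

# Two integer key columns with up to 100 distinct values each.
size = 100_000
df = pd.DataFrame(
    {
        "a": np.random.randint(0, 100, size),
        "b": np.random.randint(0, 100, size),
    }
)
# IPython magic: time how long it takes to count the groups.
%timeit len(df.groupby(["a", "b"]))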

# 161 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)    <-- main
# 4.96 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  <-- PR
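
The following is a minimal sketch of the equivalence the speedup appears to rely on: for plain (non-categorical) keys, `len` of a groupby object should match its public `ngroups` attribute and the number of distinct key combinations. The seed, the smaller `size`, and the variable names are assumptions made for illustration; the PR's actual implementation is not shown in this conversation.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption; the
# benchmark above uses unseeded np.random.randint).
rng = np.random.default_rng(0)
size = 1_000
df = pd.DataFrame(
    {
        "a": rng.integers(0, 10, size),
        "b": rng.integers(0, 10, size),
    }
)

gb = df.groupby(["a", "b"])

# Three ways to count the observed groups.
n_from_len = len(gb)                                    # what the PR speeds up
n_from_ngroups = gb.ngroups                             # public attribute on the groupby object
n_from_unique = len(df.drop_duplicates(["a", "b"]))     # independent cross-check

print(n_from_len, n_from_ngroups, n_from_unique)
```

If the three counts ever disagree, `ngroups` would not be a safe shortcut for `len`, which is consistent with the note that this change was only possible after #55738.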

rhshadrach added the Groupby and Performance (Memory or execution speed performance) labels on Feb 23, 2024
rhshadrach added this to the 3.0 milestone on Feb 23, 2024
- @pytest.fixture(params=[True, False, None])
+ @pytest.fixture(params=[True, False])
rhshadrach (Member, Author) commented:

Stumbled into this because of the added test. This should have been removed by #57330.

mroeschke merged commit 1d70500 into pandas-dev:main on Feb 24, 2024
46 of 47 checks passed
mroeschke (Member) commented:

Thanks @rhshadrach

rhshadrach deleted the perf_groupby_len branch on February 24, 2024 00:24
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* PERF: groupby(...).__len__

* GH#
2 participants