PERF: optimize Block.getitem_block #34978


Merged 1 commit on Jun 25, 2020

jbrockmendel
Member

The performance comparison is based on the asv benchmark groupby.Apply.time_scalar_function_single_col, which is the one where disabling the libreduction path has the biggest impact.

import pandas as pd
import numpy as np

N = 10 ** 4
labels = np.random.randint(0, 2000, size=N)
labels2 = np.random.randint(0, 3, size=N)
df = pd.DataFrame(
    {
        "key": labels,
        "key2": labels2,
        "value1": np.random.randn(N),
        "value2": ["foo", "bar", "baz", "qux"] * (N // 4),
    }
)

In [4]: %prun -s cumtime df.groupby("key").apply(lambda x: 1) 

master-but-with-fast_apply-disabled:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.081    0.081 groupby.py:822(apply)
        1    0.000    0.000    0.081    0.081 groupby.py:871(_python_apply_general)
        1    0.006    0.006    0.080    0.080 ops.py:157(apply)
     1989    0.002    0.000    0.069    0.000 ops.py:933(__iter__)
     1988    0.003    0.000    0.066    0.000 ops.py:966(_chop)
     1988    0.004    0.000    0.059    0.000 managers.py:724(get_slice)
     1988    0.002    0.000    0.035    0.000 managers.py:730(<listcomp>)
     5964    0.009    0.000    0.033    0.000 blocks.py:283(getitem_block)
     5971    0.005    0.000    0.021    0.000 blocks.py:247(make_block_same_class)
     5974    0.007    0.000    0.013    0.000 blocks.py:115(__init__)
     1988    0.003    0.000    0.011    0.000 base.py:4064(__getitem__)
     1990    0.003    0.000    0.008    0.000 managers.py:120(__init__)
     1988    0.002    0.000    0.007    0.000 numeric.py:105(_shallow_copy)
     1992    0.002    0.000    0.007    0.000 blocks.py:2379(__init__)
     1988    0.002    0.000    0.005    0.000 base.py:485(_shallow_copy)
     1990    0.002    0.000    0.004    0.000 frame.py:432(__init__)
     1992    0.002    0.000    0.003    0.000 base.py:450(_simple_new)

PR-but-with-fast_apply-disabled:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.061    0.061 groupby.py:822(apply)
        1    0.000    0.000    0.061    0.061 groupby.py:871(_python_apply_general)
        1    0.005    0.005    0.058    0.058 ops.py:157(apply)
     1991    0.002    0.000    0.048    0.000 ops.py:933(__iter__)
     1990    0.003    0.000    0.046    0.000 ops.py:966(_chop)
     1990    0.004    0.000    0.039    0.000 managers.py:724(get_slice)
     1990    0.002    0.000    0.017    0.000 managers.py:730(<listcomp>)
     5970    0.009    0.000    0.015    0.000 blocks.py:297(getitem_block)
     1990    0.002    0.000    0.011    0.000 base.py:4064(__getitem__)
     1992    0.003    0.000    0.007    0.000 managers.py:120(__init__)
     1990    0.002    0.000    0.007    0.000 numeric.py:105(_shallow_copy)
     1990    0.002    0.000    0.005    0.000 base.py:485(_shallow_copy)
     1992    0.002    0.000    0.004    0.000 frame.py:432(__init__)
     5970    0.002    0.000    0.003    0.000 blocks.py:116(_simple_new)
     1994    0.002    0.000    0.003    0.000 base.py:450(_simple_new)
     1992    0.001    0.000    0.002    0.000 managers.py:126(<listcomp>)
    22438    0.002    0.000    0.002    0.000 {built-in method builtins.isinstance}
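
Comparing the two profiles above, the per-slice calls to make_block_same_class and Block.__init__ no longer appear under getitem_block, and a much cheaper blocks.py:116(_simple_new) shows up instead. The sketch below illustrates that general pattern (a non-validating alternate constructor used when the inputs are already known to be valid). ToyBlock and its attributes are hypothetical stand-ins chosen for illustration; this is not pandas' actual Block implementation.

```python
class ToyBlock:
    """Hypothetical block-like container; NOT pandas' Block class."""

    __slots__ = ("values", "placement", "ndim")

    def __init__(self, values, placement, ndim):
        # Validating constructor: appropriate for untrusted input,
        # but its checks add overhead when called thousands of times.
        if ndim not in (1, 2):
            raise ValueError("ndim must be 1 or 2")
        if len(placement) != len(values):
            raise ValueError("placement and values must have the same length")
        self.values = values
        self.placement = placement
        self.ndim = ndim

    @classmethod
    def _simple_new(cls, values, placement, ndim):
        # Fast path: bypass __init__ (and its validation) entirely.
        # Only safe when the caller guarantees the inputs are consistent.
        obj = object.__new__(cls)
        obj.values = values
        obj.placement = placement
        obj.ndim = ndim
        return obj

    def getitem_block(self, slicer):
        # Slicing values and placement with the same slice keeps them
        # consistent, so re-validating the result is unnecessary.
        return type(self)._simple_new(
            self.values[slicer], self.placement[slicer], self.ndim
        )


block = ToyBlock([[1, 2], [3, 4], [5, 6]], [0, 1, 2], ndim=2)
sub = block.getitem_block(slice(0, 2))
print(sub.values, sub.placement)
```

The trade-off is that the fast constructor trusts its caller: passing inconsistent inputs to _simple_new would silently produce an invalid object, so it is best kept as an internal helper.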

master:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.009    0.009 groupby.py:822(apply)
        1    0.000    0.000    0.009    0.009 groupby.py:871(_python_apply_general)
        1    0.000    0.000    0.008    0.008 ops.py:157(apply)
        1    0.000    0.000    0.005    0.005 ops.py:961(fast_apply)
        1    0.003    0.003    0.005    0.005 {pandas._libs.reduction.apply_frame_axis0}
     1994    0.001    0.000    0.002    0.000 base.py:4064(__getitem__)
        1    0.000    0.000    0.001    0.001 ops.py:135(_get_splitter)
        1    0.000    0.000    0.001    0.001 ops.py:268(group_info)
        1    0.000    0.000    0.001    0.001 generic.py:1206(_wrap_applied_output)
        1    0.000    0.000    0.001    0.001 ops.py:285(_get_compressed_codes)
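
To put the three profiles side by side, the following snippet computes relative changes from the cumtime columns above. The numbers are copied from single %prun runs, so they are noisy and should be read as rough indications only; the helper name pct_reduction is introduced here purely for illustration.

```python
# cumtime values (seconds) copied from the %prun outputs above
master_disabled = {"apply": 0.081, "get_slice": 0.059, "getitem_block": 0.033}
pr_disabled = {"apply": 0.061, "get_slice": 0.039, "getitem_block": 0.015}
master_fast_apply_total = 0.009  # groupby.py:822(apply) with fast_apply enabled


def pct_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


for name in master_disabled:
    change = pct_reduction(master_disabled[name], pr_disabled[name])
    print(f"{name}: {change:.1f}% less cumulative time with the PR")

# How far the pure-Python path still is from the fast_apply path
for label, total in (("master", master_disabled["apply"]), ("PR", pr_disabled["apply"])):
    ratio = total / master_fast_apply_total
    print(f"{label} without fast_apply: {ratio:.1f}x the fast_apply time")
```

On these numbers the PR cuts roughly a quarter of the total apply time and a bit more than half of the time attributed to getitem_block, while the pure-Python path remains several times slower than fast_apply.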

@jreback added the Internals (related to non-user accessible pandas implementation) and Performance (memory or execution speed performance) labels Jun 24, 2020
@jreback added this to the 1.1 milestone Jun 25, 2020
@@ -16,6 +16,7 @@ cnp.import_array()
from pandas._libs.algos import ensure_int64


@cython.final
Contributor

Do these actually make a difference?

Member Author

I haven't measured this independently, so I'm not sure how much it matters in this case (I tried it for some Timestamp methods and got a decent boost). It allows Cython to do some inlining, at the cost of disallowing subclassing.

@jreback merged commit 4ffd1f1 into pandas-dev:master Jun 25, 2020
@jbrockmendel deleted the perf-getitem_block branch June 25, 2020 15:25
fangchenli pushed a commit to fangchenli/pandas that referenced this pull request Jun 27, 2020
2 participants