ASV: reduce overall run time for GroupByMethods benchmarks #44604

jorisvandenbossche · 2021-11-24T14:01:24Z

This reduces 1) the data size for describe, mad and skew (all of the slowest benchmarks, those taking more than a second, are for one of these three methods), and 2) the parametrization for ncols. That parameter was added in #42841 to also benchmark the case of multiple columns (block-wise), but I think that by keeping two options (single column / multiple columns) this aspect is still adequately captured.
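For context on why trimming `ncols` helps: ASV runs a parametrized benchmark once for every combination of the values in `params`, so the number of timed cases is the product of the list lengths. The following is a minimal, hypothetical sketch, not the actual `GroupByMethods` class. The class name `GroupByMethodsSketch`, the trimmed `dtype`/`method`/`application` lists, the `time_method` name and the data layout are all assumptions. It keeps one single-column and one multi-column case and needs only NumPy and pandas, so it can be exercised without ASV.

```python
import itertools

import numpy as np
import pandas as pd


class GroupByMethodsSketch:
    # Hypothetical, trimmed-down ASV-style benchmark class (not the real
    # GroupByMethods). ASV times every combination of these lists.
    params = [
        ["float64"],                  # dtype (trimmed for the sketch)
        ["sum", "skew", "describe"],  # method (trimmed for the sketch)
        ["direct"],                   # application (trimmed for the sketch)
        [1, 5],                       # ncols: single column vs. block-wise
    ]
    param_names = ["dtype", "method", "application", "ncols"]

    def setup(self, dtype, method, application, ncols):
        ngroups = 1000
        size = ngroups * 2
        rng = np.random.default_rng(42)
        cols = [f"values{i}" for i in range(ncols)]
        data = rng.standard_normal((size, ncols)).astype(dtype)
        df = pd.DataFrame(data, columns=cols)
        df["key"] = rng.integers(0, ngroups, size=size)
        # One column selects a SeriesGroupBy; several select a
        # DataFrameGroupBy, which exercises the multi-column (block-wise) path.
        selection = cols[0] if ncols == 1 else cols
        self.as_group_method = getattr(df.groupby("key")[selection], method)

    def time_method(self, dtype, method, application, ncols):
        self.as_group_method()


def n_cases(params):
    # Number of parameter combinations ASV would time for this class.
    return len(list(itertools.product(*params)))


print(n_cases(GroupByMethodsSketch.params))  # 6 (3 methods x 2 ncols)
```

Each extra `ncols` value multiplies the number of cases for every method, which is why keeping only a single-column and a multi-column option shrinks the whole class.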

This reduces the total runtime for this single class of benchmarks from 18min to 3min, and the --quick runtime (as used on CI) from 4min to 10s. Note that this is only the sum of the actual (repeated) timings, so it does not include the setup and other overhead of running the benchmarks.

jbrockmendel · 2021-11-24T21:59:10Z

asv_bench/benchmarks/groupby.py

@@ -464,7 +465,12 @@ def setup(self, dtype, method, application, ncols):
            # DataFrameGroupBy doesn't have these methods
            raise NotImplementedError

-        ngroups = 1000
+        if method == "describe" and ncols == 5:


Could we do this without the ncols check?

I'm imagining looking at the results, seeing the ncols==5 case being faster than the ncols==1 case (or just much less than 5x slower), and being confused until I remember this special case.

jreback · 2021-11-25

I agree; alternatively, add a comment here.

jorisvandenbossche · 2021-11-25

Yes, certainly. (I initially used ngroups = 100 for each of the three methods and for both ncols parameters, but then only describe with ncols=5 was still on the slow side.)
Using the smaller ngroups for both cases of describe will make the ncols=1 case even faster, but I suppose that can't hurt.

ASV: reduce overall run time for GroupByMethods benchmarks

48c6ce6

jorisvandenbossche added the Benchmark Performance (ASV) benchmarks label Nov 24, 2021

jorisvandenbossche added this to the 1.4 milestone Nov 24, 2021

jorisvandenbossche requested a review from jbrockmendel November 24, 2021 14:01

jbrockmendel reviewed Nov 24, 2021

ngroups=20 for all describe

41c810d
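The review settled on choosing the data size from the method alone, independent of `ncols`, so that results remain comparable across the `ncols` parameter. A minimal sketch of that selection logic follows. The value 20 for `describe` comes from the commit message above and the default of 1000 from the removed line in the diff. Using 100 for `mad` and `skew` is an assumption based on the earlier iteration described in the thread, `size = ngroups * 2` is an assumed data layout, and the helper names `choose_ngroups` and `make_data` are hypothetical.

```python
import numpy as np


def choose_ngroups(method: str) -> int:
    # Hypothetical helper: pick the number of groups from the method alone,
    # deliberately ignoring ncols so ncols=1 and ncols=5 use the same size.
    if method == "describe":
        return 20  # from the "ngroups=20 for all describe" commit
    if method in ("mad", "skew"):
        return 100  # assumption: the earlier per-method reduction
    return 1000  # default size from the original benchmark


def make_data(method: str, ncols: int, seed: int = 0):
    # Build group keys and an (size, ncols) value array; size = ngroups * 2
    # is an assumption of this sketch (about two rows per group).
    ngroups = choose_ngroups(method)
    size = ngroups * 2
    rng = np.random.default_rng(seed)
    keys = rng.integers(0, ngroups, size=size)
    values = rng.standard_normal((size, ncols))
    return keys, values
```

Because `choose_ngroups` never looks at `ncols`, `describe` uses the same number of groups for ncols=1 and ncols=5, so the multi-column case cannot look misleadingly fast next to the single-column one.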

jreback merged commit c8a0804 into pandas-dev:master Nov 25, 2021

jorisvandenbossche deleted the asv-reduce-groupby branch November 26, 2021 06:28
