REF: remove libreduction.apply_frame_axis0 #42992

Merged: 3 commits, Aug 12, 2021


jbrockmendel (Member) commented Aug 11, 2021

Discussed on call today. Just started an asv run.

jreback added this to the 1.4 milestone Aug 12, 2021

jreback (Contributor) commented Aug 12, 2021

Wow, can you also add a whatsnew note on the change (I guess in the perf section)?

jbrockmendel (Member, Author) commented Aug 12, 2021

time asv continuous -E virtualenv -f 1.01 --record-samples --append-samples master HEAD -b groupby
[...]
       before           after         ratio
     [860ff03e]       [83dacf98]
     <master>       <ref-libreduction>
+      5.27±0.4ms         12.1±1ms     2.29  groupby.Apply.time_scalar_function_single_col(4)
+      16.1±0.8ms         34.8±2ms     2.16  groupby.Apply.time_scalar_function_multi_col(4)
+         141±5μs          146±5μs     1.03  groupby.GroupByMethods.time_dtype_as_group('float', 'head', 'transformation')
+         129±7μs          133±8μs     1.03  groupby.GroupByMethods.time_dtype_as_group('float', 'cumcount', 'transformation')
+         130±5μs          133±6μs     1.03  groupby.GroupByMethods.time_dtype_as_group('float', 'cumcount', 'direct')
+         159±7μs          163±6μs     1.03  groupby.GroupByMethods.time_dtype_as_group('uint', 'cumprod', 'transformation')
+         142±6μs          146±4μs     1.03  groupby.GroupByMethods.time_dtype_as_group('float', 'head', 'direct')
+        61.4±3μs         62.8±4μs     1.02  groupby.GroupByMethods.time_dtype_as_group('float', 'cummax', 'transformation')
+        53.7±3μs         54.9±4μs     1.02  groupby.GroupByMethods.time_dtype_as_field('uint', 'any', 'direct')
+        74.2±7μs         75.9±2μs     1.02  groupby.GroupByMethods.time_dtype_as_group('uint', 'count', 'transformation')
+         195±6μs          199±5μs     1.02  groupby.GroupByMethods.time_dtype_as_group('float', 'sum', 'transformation')
+        94.2±4μs         96.1±3μs     1.02  groupby.GroupByMethods.time_dtype_as_group('uint', 'last', 'transformation')
+        54.8±2μs         55.8±1μs     1.02  groupby.GroupByMethods.time_dtype_as_group('uint', 'any', 'direct')
+        94.0±3μs         95.6±2μs     1.02  groupby.GroupByMethods.time_dtype_as_group('uint', 'last', 'direct')
+         308±9μs         312±20μs     1.01  groupby.GroupByMethods.time_dtype_as_group('uint', 'sem', 'direct')
+         156±5μs         158±10μs     1.01  groupby.GroupByMethods.time_dtype_as_group('float', 'bfill', 'transformation')
-         281±6ms         277±20ms     0.99  groupby.AggEngine.time_dataframe_numba(False)
-        511±10ms         504±20ms     0.99  groupby.AggEngine.time_series_numba(True)
-        511±20ms         503±20ms     0.99  groupby.AggEngine.time_dataframe_numba(True)
-      1.58±0.2ms      1.55±0.03ms     0.98  rolling.ExpandingMethods.time_expanding_groupby('Series', 'float', 'median')
-      1.96±0.2ms      1.92±0.06ms     0.98  groupby.CountMultiInt.time_multi_int_count
-        60.4±1ms       59.1±0.9ms     0.98  groupby.GroupByMethods.time_dtype_as_field('uint', 'skew', 'direct')
-        122±10ms          119±3ms     0.98  groupby.MultiColumn.time_lambda_sum
-        282±10ms         276±10ms     0.98  groupby.AggEngine.time_series_numba(False)
-        20.2±2ms       19.4±0.7ms     0.96  groupby.AggEngine.time_dataframe_cython(False)
-        39.7±2ms         37.8±4ms     0.95  groupby.Apply.time_copy_function_multi_col(5)
-     1.23±0.09ms      1.06±0.04ms     0.86  groupby.FillNA.time_df_bfill
-      1.23±0.1ms      1.03±0.09ms     0.84  groupby.FillNA.time_df_ffill
-        397±10ms          223±7ms     0.56  groupby.Apply.time_copy_overhead_single_col(4)
-      1.20±0.03s         604±30ms     0.50  groupby.Apply.time_copy_function_multi_col(4)

Updated with a larger sample size; the results look more reasonable.
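
For readers unfamiliar with the table layout, here is a minimal sketch of how the `ratio` column and the `+`/`-` markers can be read. It assumes that the ratio is simply after/before and that `-f 1.01` flags anything above 1.01 or below 1/1.01. The `classify` helper is hypothetical and is not part of asv; asv works on unrounded timings and may also account for measurement noise, so the last digit can differ from the table.

```python
def classify(before, after, factor=1.01):
    """Return (ratio, marker) for one benchmark row.

    Hypothetical helper for illustration only, not an asv API.
    Assumes ratio = after / before, with "+" meaning slower and
    "-" meaning faster than the given factor threshold.
    """
    ratio = after / before
    if ratio > factor:
        marker = "+"
    elif ratio < 1 / factor:
        marker = "-"
    else:
        marker = " "
    return ratio, marker


# Timings in seconds, copied (rounded) from the four largest changes above.
rows = {
    "groupby.Apply.time_scalar_function_single_col(4)": (5.27e-3, 12.1e-3),
    "groupby.Apply.time_scalar_function_multi_col(4)": (16.1e-3, 34.8e-3),
    "groupby.Apply.time_copy_overhead_single_col(4)": (397e-3, 223e-3),
    "groupby.Apply.time_copy_function_multi_col(4)": (1.20, 604e-3),
}

for name, (before, after) in rows.items():
    ratio, marker = classify(before, after)
    print(f"{marker} {ratio:5.2f}  {name}")
```

Read this way, the two scalar-function benchmarks are a bit more than 2x slower on the branch, while the two copy benchmarks take roughly half the time they did on master.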

jreback merged commit d037ff6 into pandas-dev:master Aug 12, 2021

jreback (Contributor) commented Aug 12, 2021

great thanks!

Successfully merging this pull request may close these issues.

BUG: groupby fast_apply vs python apply handles same-indexed result differently