While profiling for #51722 I found a number of cases where operating group-by-group performed better than our cython implementation. Group-by-group iteration is itself expensive, which suggests that the non-iteration portion of that path must be fast. That portion goes through DataFrame.quantile, which in turn goes through np.percentile (in core.array_algos.quantile). So the np.percentile implementation may be doing something that we should try to port to group_quantile.
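For anyone wanting to reproduce the comparison, here is a rough sketch of the two paths. It is not the benchmark used for #51722: the helper `groupwise_percentile`, the column names, and the data sizes are all made up for illustration, and it calls np.percentile directly on each group rather than going through DataFrame.quantile.

```python
import numpy as np
import pandas as pd


def groupwise_percentile(df, key, q):
    """Hypothetical helper: per-group quantile computed with np.percentile."""
    value_cols = [c for c in df.columns if c != key]
    rows = {}
    # Plain Python loop over groups; this is the "expensive iteration" part.
    for name, sub in df.groupby(key, sort=True):
        # np.percentile takes q on a 0-100 scale, reduced along the row axis.
        rows[name] = np.percentile(sub[value_cols].to_numpy(), q * 100, axis=0)
    out = pd.DataFrame.from_dict(rows, orient="index", columns=value_cols)
    out.index.name = key
    return out.sort_index()


# Assumed toy data: 10 groups, no missing values.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame(
    {
        "key": rng.integers(0, 10, size=n),
        "a": rng.standard_normal(n),
        "b": rng.standard_normal(n),
    }
)

cython_path = df.groupby("key").quantile(0.5)    # goes through group_quantile
loop_path = groupwise_percentile(df, "key", 0.5)  # group-by-group np.percentile

# Both use linear interpolation by default, so the results should agree.
results_match = np.allclose(cython_path.to_numpy(), loop_path.to_numpy())
```

Timing the two calls with `timeit` across different group counts and group sizes should show where each path wins.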
Copy/pasting from my notes-to-self at the time:
- Investigate numpy's percentile code
- Our nanmedian does casting and type inference in a way I think is unnecessary
- Profiling groupby.quantile (xref https://github.com/pandas-dev/pandas/pull/51722) suggests that numpy's percentile may just be much more performant than what we have
- https://github.com/numpy/numpy/blob/v1.24.0/numpy/lib/function_base.py#L3920-L4206
- https://github.com/numpy/numpy/blob/v1.24.0/numpy/lib/function_base.py#L3774-L3857
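One thing the linked NumPy code appears to do, on my reading, is select the needed order statistics with a partition instead of fully sorting the data. Whether that explains the performance gap is untested here. The sketch below only illustrates the idea for a 1-D array with linear interpolation; `quantile_via_partition` is a hypothetical name, and it does not handle NaNs or the other interpolation methods.

```python
import numpy as np


def quantile_via_partition(values, q):
    """Hypothetical sketch: linear-interpolated quantile via np.partition."""
    values = np.asarray(values, dtype=np.float64)
    n = values.size
    if n == 0:
        return np.nan
    # Fractional position of the quantile among the sorted values.
    pos = q * (n - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, n - 1)
    # Partition places only the lo-th and hi-th order statistics at their
    # sorted positions, which is cheaper than sorting the whole array.
    part = np.partition(values, np.unique([lo, hi]))
    frac = pos - lo
    return part[lo] + frac * (part[hi] - part[lo])


rng = np.random.default_rng(1)
sample = rng.standard_normal(1001)
approx = quantile_via_partition(sample, 0.3)
reference = np.percentile(sample, 30)
```

Porting something like this to group_quantile would mean doing the selection per group, which is the part that would need profiling.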