PERF: investigate numpy's percentile implementation #55535

Open
jbrockmendel opened this issue Oct 15, 2023 · 0 comments
Labels
Enhancement, Groupby, Performance (memory or execution speed performance)

Comments

@jbrockmendel
Member

jbrockmendel commented Oct 15, 2023

While profiling for #51722 I found a number of cases where operating group-by-group performed better than our Cython implementation. The group-by-group iteration itself is expensive, which suggests that the non-iteration portion of that call must be fast enough to more than make up for it. That portion goes through DataFrame.quantile, which in turn goes through np.percentile (in core.array_algos.quantile). So the np.percentile implementation may be doing something that we should try to port to group_quantile.
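For readers who want to see the two paths side by side, here is a minimal sketch. It assumes synthetic data without NaNs and the default linear interpolation. The column names `key` and `val` and the sizes are made up for illustration. It is not the benchmark from #51722. It only checks that calling np.percentile group-by-group gives the same numbers as the built-in groupby reduction.

```python
import numpy as np
import pandas as pd

# Synthetic data; the column names and sizes are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "key": rng.integers(0, 50, size=10_000),
        "val": rng.standard_normal(10_000),
    }
)

q = 0.25

# Path 1: the built-in groupby reduction.
builtin_result = df.groupby("key")["val"].quantile(q)

# Path 2: iterate group-by-group and call np.percentile on each group's values.
# np.percentile takes a percentage in [0, 100], hence q * 100.
per_group = {
    key: np.percentile(group.to_numpy(), q * 100)
    for key, group in df.groupby("key")["val"]
}
numpy_result = pd.Series(per_group, name="val").sort_index()

# Both use linear interpolation by default, so the values should agree.
same = np.allclose(builtin_result.to_numpy(), numpy_result.to_numpy())
```

With NaNs present the two paths are not directly comparable, since np.percentile propagates NaN while the groupby reduction skips missing values.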

Copy/pasting from my notes-to-self at the time:

- Investigate numpy's percentile code
	- Our nanmedian does casting and type inference in a way I think is unnecessary
	- Profiling groupby.quantile (xref https://github.com/pandas-dev/pandas/pull/51722) suggests that numpy's percentile may just be much more performant than what we have
	- https://github.com/numpy/numpy/blob/v1.24.0/numpy/lib/function_base.py#L3920-L4206
	- https://github.com/numpy/numpy/blob/v1.24.0/numpy/lib/function_base.py#L3774-L3857
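The profiling comparison mentioned in these notes could be reproduced roughly as follows. This is only a sketch under assumptions: the helper names (`make_frame`, `groupby_quantile`, `per_group_percentile`, `compare`) are hypothetical, the data is synthetic, and the sizes are arbitrary. Which path wins will depend on the number of rows, the number of groups, and the pandas and numpy versions, so no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, n_groups, seed=0):
    """Build a synthetic frame (hypothetical helper for this sketch)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "key": rng.integers(0, n_groups, size=n_rows),
            "val": rng.standard_normal(n_rows),
        }
    )


def groupby_quantile(df, q):
    """Built-in groupby reduction."""
    return df.groupby("key")["val"].quantile(q)


def per_group_percentile(df, q):
    """Group-by-group iteration that calls np.percentile on each group."""
    return {
        key: np.percentile(group.to_numpy(), q * 100)
        for key, group in df.groupby("key")["val"]
    }


def compare(n_rows, n_groups, q=0.5, repeat=3):
    """Return the best-of-`repeat` wall time in seconds for each path."""
    df = make_frame(n_rows, n_groups)
    t_builtin = min(
        timeit.repeat(lambda: groupby_quantile(df, q), number=1, repeat=repeat)
    )
    t_numpy = min(
        timeit.repeat(lambda: per_group_percentile(df, q), number=1, repeat=repeat)
    )
    return {"groupby_quantile": t_builtin, "per_group_percentile": t_numpy}


# Few large groups keep the Python-level iteration overhead small.
timings = compare(n_rows=100_000, n_groups=10)
```

Varying `n_groups` while holding `n_rows` fixed is one way to see where the per-group iteration cost starts to dominate.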

jbrockmendel added the Bug and Needs Triage labels Oct 15, 2023
rhshadrach added the Enhancement, Groupby, and Performance labels and removed the Bug and Needs Triage labels Oct 15, 2023