ENH: allow multiple rolling quantiles in one pass #12093

Closed
StephenKappel opened this issue Jan 19, 2016 · 9 comments
Labels
Enhancement Performance Memory or execution speed performance quantile quantile method Window rolling, ewma, expanding

Comments

@StephenKappel
Contributor

Currently, rolling_quantile() accepts only a single float for the quantile argument. I find myself wanting to compute multiple quantiles over the same data. Instead of doing three calls to rolling_quantile(), I'd like to be able to call rolling_quantile() once with a sequence of floats as the quantile argument and get back a list of results. This has benefits both in terms of code conciseness and efficiency.

This suggested behavior would be analogous to how np.percentile works. http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html
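For reference, a minimal sketch of the np.percentile behavior being used as the analogy, reusing the 1..9 data from this issue. Note that np.percentile takes percentiles on a 0-100 scale, whereas the quantile argument discussed here is on a 0-1 scale.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# A scalar percentile returns a single value ...
median = np.percentile(data, 50)

# ... while a sequence of percentiles returns one value per entry,
# all computed from the same input in a single call.
several = np.percentile(data, [25, 50, 100])

print(median)
print(several)
```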

Currently:

>>> import numpy as np
>>> import pandas as pd
>>> ser = pd.Series(np.array([1,2,3,4,5,6,7,8,9]))
>>> ser.rolling(window=3).quantile(quantile=0.5)
0   NaN
1   NaN
2     2
3     3
4     4
5     5
6     6
7     7
8     8
dtype: float64

Desired enhancement:

>>> ser.rolling(window=3).quantile(quantile=[0.25,0.5,1])
[0   NaN
1   NaN
2     1
3     2
4     3
5     4
6     5
7     6
8     7
dtype: float64, 0   NaN
1   NaN
2     2
3     3
4     4
5     5
6     6
7     7
8     8
dtype: float64, 0   NaN
1   NaN
2     3
3     4
4     5
5     6
6     7
7     8
8     9
dtype: float64]
@jreback
Contributor

jreback commented Jan 19, 2016

DataFrame/Series.quantile has this behavior, so it's not w/o precedent

@jreback
Contributor

jreback commented Jan 19, 2016

would return a DataFrame, not a list of Series
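A rough sketch of the precedent and of the result shape suggested above. Series.quantile already accepts a list of quantiles. The rolling frame below is only an illustration of "one column per quantile": it is built with one rolling call per quantile, because rolling().quantile() takes a single float, and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(1, 10))
quantiles = [0.25, 0.5, 1.0]

# Existing precedent: a list of quantiles returns a Series indexed by quantile.
plain = ser.quantile(quantiles)

# Illustrative shape for a multi-quantile rolling result: a DataFrame with
# one column per quantile. This still makes one pass per quantile.
roll = ser.rolling(window=3)
frame = pd.DataFrame({q: roll.quantile(q) for q in quantiles})

print(plain)
print(frame)
```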

@max-sixty
Contributor

My humble view is that unless this is a vectorized routine that is materially faster than a Python list comprehension, it muddies the API without that much benefit.

A list comprehension really isn't that verbose:
pd.DataFrame({q: ser.rolling(window=3).quantile(quantile=q) for q in [0.25, 0.5, 1]})

@kawochen
Contributor

yes this would only need one pass of the data, so much faster than list comp

@max-sixty
Contributor

> yes this would only need one pass of the data, so much faster than list comp

Then 👍. That's not what quantile does now FWIW: https://github.com/pydata/pandas/blob/723a147a4a197960d4a9da01dd5f107d59aafc46/pandas/core/series.py#L1313

@kawochen
Contributor

Yes in the rolling version the skip list construction can be shared. I will probably work on switching to the C implementation next, and see if I can squeeze this change in as well.
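To make the "shared construction" point concrete, here is a pure-Python sketch, not the pandas implementation. It assumes a plain sorted list maintained with bisect as a stand-in for the skip list, linear interpolation between order statistics, and input without NaNs. The function name rolling_quantiles is hypothetical.

```python
from bisect import bisect_left, insort
import math


def rolling_quantiles(values, window, quantiles):
    """Return one list of results per quantile, sharing one sorted window."""
    sorted_window = []  # stand-in for the skip list: always kept sorted
    out = [[] for _ in quantiles]
    for i, x in enumerate(values):
        insort(sorted_window, x)
        if i >= window:
            # Remove the value that just slid out of the window.
            del sorted_window[bisect_left(sorted_window, values[i - window])]
        for k, q in enumerate(quantiles):
            if len(sorted_window) < window:
                out[k].append(math.nan)  # window not yet full
                continue
            # Linear interpolation between the two nearest order statistics.
            pos = q * (window - 1)
            lo = math.floor(pos)
            hi = min(lo + 1, window - 1)
            frac = pos - lo
            low_val, high_val = sorted_window[lo], sorted_window[hi]
            out[k].append(low_val + (high_val - low_val) * frac)
    return out


res = rolling_quantiles(list(range(1, 10)), window=3, quantiles=[0.25, 0.5, 1.0])
print(res)
```

The window is updated once per element no matter how many quantiles are requested; each extra quantile only adds an index lookup. A real skip list would make the insert and delete steps O(log window) rather than the O(window) list operations used here.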

@jreback
Contributor

jreback commented Jan 20, 2016

@kawochen ok, I'll mark this as a perf enhancement; implementing it directly in Cython would be best.

@jreback jreback added Performance Memory or execution speed performance Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Difficulty Advanced labels Jan 20, 2016
@jreback jreback added this to the Next Major Release milestone Jan 20, 2016
@jbrockmendel jbrockmendel added quantile quantile method and removed Effort Medium labels Oct 21, 2019
@mroeschke mroeschke added Window rolling, ewma, expanding Enhancement and removed Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Apr 5, 2020
@cottrell
Contributor

I had a hack on this one as I was doing something similar. It was a little trickier getting the rolling args through, and I've basically just hacked things until it ran for now. I'm sure someone will know a better way. I haven't tested this for correctness or anything yet ... just wanted to see if this was worth pursuing before putting any real time into it.

Timings are not a huge speedup so far: roughly 3x to 4x in the runs below. Not sure if that's worth the complexity or not.

Branch is here: https://github.com/cottrell/pandas/tree/roll_quantile

In [20]: s = pd.Series(np.random.randn(100000))

In [21]: n = 10

In [22]: q = np.linspace(1./n, 1 - 1./n, n - 1)

In [23]: r = s.rolling(window=100)

In [24]: %time a = pd.DataFrame({qq: r.quantile(qq) for qq in q})
CPU times: user 333 ms, sys: 13.9 ms, total: 347 ms
Wall time: 347 ms

In [25]: %time b = r.quantile(q)
CPU times: user 97.9 ms, sys: 7.96 ms, total: 106 ms
Wall time: 106 ms

In [26]: r = s.rolling(window=1000)

In [27]: %time a = pd.DataFrame({qq: r.quantile(qq) for qq in q})
CPU times: user 477 ms, sys: 9.32 ms, total: 486 ms
Wall time: 486 ms

In [28]: %time b = r.quantile(q)
CPU times: user 148 ms, sys: 0 ns, total: 148 ms
Wall time: 156 ms

In [29]: n = 100

In [30]: q = np.linspace(1./n, 1 - 1./n, n - 1)

In [31]: %time a = pd.DataFrame({qq: r.quantile(qq) for qq in q})
CPU times: user 5.33 s, sys: 41.7 ms, total: 5.38 s
Wall time: 5.38 s

In [32]: %time b = r.quantile(q)
CPU times: user 1.25 s, sys: 25.6 ms, total: 1.28 s
Wall time: 1.28 s

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@mroeschke
Member

mroeschke commented Aug 25, 2024

Looks like this never got much traction, so closing. Happy to reopen if there is renewed interest.
