ENH: allow multiple rolling quantiles in one pass #12093
Comments
DataFrame/Series.quantile already has this behavior, so this is not without precedent |
It would return a DataFrame, not a list of Series |
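For reference, a short sketch of the precedent mentioned here: the non-rolling `quantile` methods accept either a scalar or a list. The data and variable names are made up for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A scalar q returns a scalar.
median = s.quantile(0.5)

# A list of q values returns a Series indexed by those quantiles.
several = s.quantile([0.25, 0.5, 0.75])

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "b": [10.0, 20.0, 30.0, 40.0, 50.0]})

# For a DataFrame, a list of q values returns a DataFrame with
# one row per quantile and one column per original column.
table = df.quantile([0.25, 0.75])
```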
My humble view is that unless this is a vectorized routine that is materially faster than a Python list comprehension, it muddies the API without much benefit. A list comprehension really isn't that verbose: |
Yes, this would need only one pass over the data, so it would be much faster than a list comprehension |
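A minimal sketch of the kind of comprehension meant here, assuming the current `Series.rolling(...).quantile(q)` method rather than the older `rolling_quantile()` function. The data, window size, and quantile levels are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, dtype=float))  # illustrative data
qs = [0.25, 0.5, 0.75]                     # illustrative quantile levels

# One full rolling pass per quantile; each call returns a Series.
results = [s.rolling(window=4).quantile(q) for q in qs]
```

Each element of `results` lines up with the corresponding entry of `qs`, and the first `window - 1` values of each Series are NaN.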
Then 👍. That's not what |
Yes, in the rolling version the skip list construction can be shared. I will probably work on switching to the C implementation next, and see if I can squeeze this change in as well. |
@kawochen ok, I'll mark this as a perf enhancement; implementing it directly in Cython would be best. |
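To illustrate the sharing idea, here is a rough pure-Python sketch, not the pandas implementation. It keeps one sorted copy of the current window and reads every requested quantile from it. A plain list maintained with `bisect` stands in for the skip list, so inserts and deletes cost O(window) rather than O(log window). The function name `rolling_quantiles`, the linear interpolation rule, and the use of `None` for incomplete windows are all assumptions made for the example, and NaN handling is left out.

```python
from bisect import bisect_left, insort


def rolling_quantiles(values, window, qs):
    """Return {q: list of rolling quantiles}, sharing one sorted window.

    Hypothetical helper for illustration only.
    """
    out = {q: [] for q in qs}
    sorted_win = []  # stand-in for the skip list: the window in sorted order
    for i, x in enumerate(values):
        insort(sorted_win, x)  # add the incoming value
        if i >= window:
            # drop the value that just left the window
            del sorted_win[bisect_left(sorted_win, values[i - window])]
        for q in qs:
            if len(sorted_win) < window:
                out[q].append(None)  # window not yet full
                continue
            pos = q * (window - 1)        # fractional rank of the quantile
            lo = int(pos)
            hi = min(lo + 1, window - 1)
            frac = pos - lo
            # linear interpolation between the two neighbouring order statistics
            out[q].append(sorted_win[lo] + (sorted_win[hi] - sorted_win[lo]) * frac)
    return out


res = rolling_quantiles([5.0, 1.0, 4.0, 2.0, 3.0], window=3, qs=[0.0, 0.5, 1.0])
```

The window is updated once per element regardless of how many quantiles are requested; only the cheap lookup step is repeated per quantile.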
I had a hack at this one since I was doing something similar. Getting the rolling args through was a little trickier, and for now I've basically just hacked things until it ran. I'm sure someone will know a better way. I haven't tested this for correctness or anything yet; I just wanted to see if this was worth pursuing before putting any real time into it. Timings are not a huge speedup so far. Maybe 5x. Not sure if that's worth the complexity or not. Branch is here: https://github.com/cottrell/pandas/tree/roll_quantile
In [20]: s = pd.Series(np.random.randn(100000))
In [21]: n = 10
In [22]: q = np.linspace(1./n, 1 - 1./n, n - 1)
In [23]: r = s.rolling(window=100)
In [24]: %time a = pd.DataFrame({qq: r.quantile(qq) for qq in q})
CPU times: user 333 ms, sys: 13.9 ms, total: 347 ms
Wall time: 347 ms
In [25]: %time b = r.quantile(q)
CPU times: user 97.9 ms, sys: 7.96 ms, total: 106 ms
Wall time: 106 ms
In [26]: r = s.rolling(window=1000)
In [27]: %time a = pd.DataFrame({qq: r.quantile(qq) for qq in q})
CPU times: user 477 ms, sys: 9.32 ms, total: 486 ms
Wall time: 486 ms
In [28]: %time b = r.quantile(q)
CPU times: user 148 ms, sys: 0 ns, total: 148 ms
Wall time: 156 ms
In [29]: n = 100
In [30]: q = np.linspace(1./n, 1 - 1./n, n - 1)
In [31]: %time a = pd.DataFrame({qq: r.quantile(qq) for qq in q})
CPU times: user 5.33 s, sys: 41.7 ms, total: 5.38 s
Wall time: 5.38 s
In [32]: %time b = r.quantile(q)
CPU times: user 1.25 s, sys: 25.6 ms, total: 1.28 s
Wall time: 1.28 s |
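For reference, the speedups implied by the wall times in this session can be worked out directly. The labels are descriptive names added for the example; the numbers are copied from the session (n = 10 gives 9 quantiles, n = 100 gives 99).

```python
# Wall times in seconds, copied from the session:
# (per-quantile loop, single multi-quantile call)
timings = {
    "window=100, 9 quantiles": (0.347, 0.106),
    "window=1000, 9 quantiles": (0.486, 0.156),
    "window=1000, 99 quantiles": (5.38, 1.28),
}

speedups = {label: loop / single for label, (loop, single) in timings.items()}
for label, ratio in speedups.items():
    print(f"{label}: {ratio:.1f}x")
```

On these three runs the ratio comes out between roughly 3x and 4x.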
Looks like this never got much traction, so closing. Happy to reopen if there is renewed interest |
Currently, rolling_quantile() accepts only a single float for the quantile argument. I find myself wanting to compute multiple quantiles over the same data. Instead of doing three calls to rolling_quantile(), I'd like to be able to call rolling_quantile() once with a sequence of floats as the quantile argument and get back a list of results. This has benefits both in terms of code conciseness and efficiency.
This suggested behavior would be analogous to how np.percentile works. http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html
Currently:
Desired enhancement:
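A minimal sketch of the contrast, assuming the modern `Series.rolling(...).quantile(q)` method in place of the legacy `rolling_quantile()` function. `rolling_quantiles` is a hypothetical wrapper, not a pandas function: it only emulates the shape of the requested call and still makes one pass per quantile. The `np.percentile` call shows the behavior the proposal is modeled on. Data and window size are made up.

```python
import numpy as np
import pandas as pd

data = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # illustrative data

# The model: np.percentile accepts a sequence and returns one result per entry.
p25, p50, p75 = np.percentile(data.to_numpy(), [25, 50, 75])

# Currently: one call per quantile.
q25 = data.rolling(window=4).quantile(0.25)
q50 = data.rolling(window=4).quantile(0.50)
q75 = data.rolling(window=4).quantile(0.75)


# Desired: one call with a sequence of quantiles, returning a list of results.
def rolling_quantiles(series, window, quantiles):
    """Hypothetical wrapper; emulates the requested interface only."""
    roller = series.rolling(window=window)
    return [roller.quantile(q) for q in quantiles]


r25, r50, r75 = rolling_quantiles(data, 4, [0.25, 0.50, 0.75])
```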