API/ENH: master issue for pd.rolling_apply #8659

Closed
8 of 14 tasks
leeong05 opened this issue Oct 28, 2014 · 6 comments
Labels: API Design; Apply (Apply, Aggregate, Transform, Map); Enhancement; Master Tracker (High level tracker for similar issues); Window (rolling, ewma, expanding)

Comments

@leeong05

leeong05 commented Oct 28, 2014

Catchall for rolling_* issues:


Hi all,

I want to apply a function that, for each day, ranks the columns by their means over the previous n days of data. The natural way to do this seems to be pd.rolling_apply. A toy example:

In [93]: df = pd.DataFrame(np.random.randint(10, size=20).reshape(4, 5))

In [94]: df
Out[94]: 
   0  1  2  3  4
0  2  0  0  2  0
1  9  5  5  6  1
2  2  3  6  8  8
3  5  1  2  9  0

In [95]: import bottleneck as bn

In [96]: bn.nanrankdata(df.mean())
Out[96]: array([ 4. ,  1.5,  3. ,  5. ,  1.5])

So far, so good. Then:

In [97]: pd.rolling_apply(df, 2, lambda x: bn.nanrankdata(bn.nanmean(x, axis=0)))
Out[97]: 
    0   1   2   3   4
0 NaN NaN NaN NaN NaN
1   1   1   1   1   1
2   1   1   1   1   1
3   1   1   1   1   1

This is clearly wrong: every rank is 1. Is this a bug?

@jreback
Contributor

jreback commented Oct 28, 2014

import pandas as pd
import numpy as np
import bottleneck as bn

df = pd.DataFrame(np.random.randint(10, size=20).reshape(4, 5))

def f(x):
    import pdb; pdb.set_trace()
    result = bn.nanrankdata(bn.nanmean(x, axis=0))
    print(result)  # parenthesized form works on both Python 2 and 3
    return result

pd.rolling_apply(df, 2, f)

rankdata is being passed a scalar (the result of the mean), and the rank of a single value is always 1. The input to the function is only 1-d, NOT 2-d.

Whenever I have something like this, I create a function and step through it.
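The same behaviour can be checked without a debugger. The sketch below is an illustration only, not code from this thread: it assumes a pandas version that provides the method-based `df.rolling(...).apply(..., raw=True)` API rather than `pd.rolling_apply`, replaces bottleneck with NumPy and pandas equivalents, and uses a hypothetical helper name `rank_of_mean`. It records the dimensionality of every window the function receives.

```python
import numpy as np
import pandas as pd

# Same toy data as in the original report.
df = pd.DataFrame([[2, 0, 0, 2, 0],
                   [9, 5, 5, 6, 1],
                   [2, 3, 6, 8, 8],
                   [5, 1, 2, 9, 0]])

seen_ndims = []  # dimensionality of each window handed to the function


def rank_of_mean(window):
    """Hypothetical stand-in for bn.nanrankdata(bn.nanmean(x, axis=0))."""
    seen_ndims.append(window.ndim)
    mean = np.nanmean(window, axis=0)              # scalar, since window is 1-D
    ranks = pd.Series(np.atleast_1d(mean)).rank()  # rank of a single value
    return ranks.iloc[0]


result = df.rolling(2).apply(rank_of_mean, raw=True)
print(set(seen_ndims))
print(result)
```

Every window is a 1-D slice of one column, so the mean collapses to a single number and its rank can only be 1, which reproduces the all-ones output above.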

@leeong05
Author

Thanks. Stepping through with pdb shows how pd.rolling_apply works: i) it processes the columns one at a time, in order; ii) for each column, it splits the long 1-D array into small arrays of length window and applies the function to each of them.

What I had in mind was that pd.rolling_apply would split the 2-D array into chunks along axis=0 and apply the function to each small 2-D array.
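To make the two behaviours concrete, here is a minimal sketch of the 2-D variant described above. It is an assumption-laden illustration, not a pandas API: `rolling_apply_2d` and `rank_of_column_means` are hypothetical names, the loop is written for clarity rather than speed, and pandas' `rank` (average method for ties) stands in for `bn.nanrankdata`.

```python
import numpy as np
import pandas as pd


def rolling_apply_2d(frame, window, func):
    """Hypothetical helper: apply func to each (window x n_columns) block of rows.

    func must return one value per column; the first window - 1 rows stay NaN.
    """
    values = frame.to_numpy(dtype=float)
    out = np.full(values.shape, np.nan)
    for end in range(window, len(values) + 1):
        block = values[end - window:end]  # 2-D slice of consecutive rows
        out[end - 1] = func(block)
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)


def rank_of_column_means(block):
    # Mean of each column within the block, then rank those means.
    return pd.Series(np.nanmean(block, axis=0)).rank().to_numpy()


df = pd.DataFrame([[2, 0, 0, 2, 0],
                   [9, 5, 5, 6, 1],
                   [2, 3, 6, 8, 8],
                   [5, 1, 2, 9, 0]])

ranked = rolling_apply_2d(df, 2, rank_of_column_means)
print(ranked)
```

With the toy data, row 1 comes out as 5, 2.5, 2.5, 4, 1, because the function now sees both rows of all five columns at once.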

For this specific question, the workaround is:

pd.rolling_apply(df, 2, bn.nanmean).rank(axis=1)

But wouldn't a 2-D to 2-D rolling feature be desirable?

@jreback
Contributor

jreback commented Oct 29, 2014

FYI, you can just do pd.rolling_mean(df, 2).rank(axis=1) (the bottleneck mean is dispatched anyhow if it's available and the dtype is appropriate).

Yes, I don't think this case is handled per se. But this is really a rolling-grid type of thing, yes?
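For readers on a pandas version where the module-level `pd.rolling_mean` is no longer available, a sketch of the same idea with the method-based API follows. It assumes `DataFrame.rolling(...).mean()` and reuses the toy data from the report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[2, 0, 0, 2, 0],
                   [9, 5, 5, 6, 1],
                   [2, 3, 6, 8, 8],
                   [5, 1, 2, 9, 0]])

# Rolling 2-row mean per column, then rank across the columns of each row.
rolling_means = df.rolling(2).mean()
ranked = rolling_means.rank(axis=1)
print(ranked)
```

The first row stays NaN because no full window is available yet; tied means share an averaged rank, as they do in the bn.nanrankdata output earlier in the thread.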

It is implemented column by column for simplicity ATM (and uses np.apply_along_axis which is pretty inefficient IMHO).

There are a couple of open issues w.r.t. rolling_*, see: #4964, #4130, #3185, and I think you want #5071.

So yes, I would agree that this is needed.

Would you like to work on it?

cc @seth-p, who has done quite a bit of work on fixing some of these as well

@jreback jreback added API Design Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Reshaping Concat, Merge/Join, Stack/Unstack, Explode Enhancement labels Oct 29, 2014
@jreback jreback added this to the 0.16.0 milestone Oct 29, 2014
@jreback jreback changed the title Unexpected result using pd.rolling_apply API/ENH: master issue for pd.rolling_apply Oct 29, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback added the Master Tracker High level tracker for similar issues label Mar 6, 2015
@jreback jreback modified the milestones: Next Major Release, High Level Issue Tracking Sep 24, 2017
@TomAugspurger TomAugspurger removed the Master Tracker High level tracker for similar issues label Jul 6, 2018
@TomAugspurger TomAugspurger removed this from the High Level Issue Tracking milestone Jul 6, 2018
@wesm
Member

wesm commented Jul 6, 2018

What is the status of this issue?

@jreback jreback added the Master Tracker High level tracker for similar issues label Jul 6, 2018
@jreback
Contributor

jreback commented Jul 6, 2018

It's just tracking many open issues here.

@WillAyd WillAyd added the Window rolling, ewma, expanding label Oct 5, 2018
@mroeschke mroeschke removed the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Apr 14, 2020
@mroeschke mroeschke added Apply Apply, Aggregate, Transform, Map and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 14, 2020
@mroeschke
Member

I don't think we need this high-level tracking issue. There are not too many windowing issues, and most are enhancement requests for non-apply methods. Closing.


7 participants