BUG: pandas rolling_quantile does not use interpolation #9413

Closed
leo4183 opened this issue Feb 4, 2015 · 6 comments · Fixed by #16247
Labels: API Design; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@leo4183

leo4183 commented Feb 4, 2015

I recently bumped into an unexpected issue with the pandas rolling functions. Take rolling_quantile, for example:

>> row = 10
>> col = 5
>> idx = pd.date_range('20100101',periods=row,freq='B')
>> a = pd.DataFrame(np.random.rand(row*col).reshape((row,-1)),index=idx)
>> a
                   0           1           2           3           4
2010-01-01  0.341434    0.497274    0.596341    0.259909    0.872207
2010-01-04  0.222653    0.056723    0.064019    0.936307    0.785647
2010-01-05  0.179067    0.647165    0.931266    0.557698    0.713282
2010-01-06  0.049766    0.259756    0.945736    0.380948    0.282667
2010-01-07  0.385036    0.517609    0.575958    0.050758    0.850735
2010-01-08  0.628169    0.510453    0.325973    0.263361    0.444959
2010-01-11  0.099133    0.976571    0.602235    0.181185    0.506316
2010-01-12  0.987344    0.902289    0.080000    0.254695    0.753325
2010-01-13  0.759198    0.014548    0.139858    0.822900    0.251972
2010-01-14  0.404149    0.349788    0.038714    0.280568    0.197865

>> a.quantile([0.25,0.5,0.75],axis=0)
               0           1           2           3           4
0.25    0.189963    0.282264    0.094964    0.255999    0.323240
0.50    0.363235    0.503864    0.450966    0.271964    0.609799
0.75    0.572164    0.614776    0.600761    0.513510    0.777567

>> np.percentile(a,[25,50,75],axis=0)
[array([ 0.18996316,  0.28226404,  0.09496441,  0.25599853,  0.32323997]),
 array([ 0.36323529,  0.50386356,  0.45096554,  0.27196429,  0.60979881]),
 array([ 0.57216415,  0.61477607,  0.6007611 ,  0.51351021,  0.7775667 ])]

>> pd.rolling_quantile(a,row,0.25).tail(1)
                   0           1       2           3           4
2010-01-14  0.179067    0.259756    0.08    0.254695    0.282667

It looks like the pandas.DataFrame.quantile method is consistent with numpy.percentile, but pandas.rolling_quantile returns different results. If I reduce the row count to 5, the problem goes away (all three methods return the same results). Any thoughts?

PS: I also tested rolling_std, which "randomly" produces errors on the order of 10^-7 to 10^-8 for long (row-wise) pandas.DataFrames. By comparison, the pandas.DataFrame std method and the numpy/scipy std functions keep the error close to the np.spacing(1) level.

python environment:

python 3.4.2
cython 0.21.1
numpy 1.8.2
scipy 0.14.0
pandas 0.15.1
statsmodels 0.6.0

@shoyer
Member

shoyer commented Feb 5, 2015

It looks like the difference here is that quantile and percentile take the weighted average of the nearest points, whereas rolling_quantile simply uses one of the nearest points (no averaging):

In [35]: a = pd.DataFrame(np.arange(row*col).reshape((row,-1)))

In [36]: a
Out[36]:
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
5  25  26  27  28  29
6  30  31  32  33  34
7  35  36  37  38  39
8  40  41  42  43  44
9  45  46  47  48  49

In [37]: a.quantile([0.25,0.5,0.75],axis=0)
Out[37]:
          0      1      2      3      4
0.25  11.25  12.25  13.25  14.25  15.25
0.50  22.50  23.50  24.50  25.50  26.50
0.75  33.75  34.75  35.75  36.75  37.75

In [38]: pd.rolling_quantile(a,row,0.5).tail(1)
Out[38]:
    0   1   2   3   4
9  20  21  22  23  24

So yes, I think we can consider this a bug. rolling_median, for example, gets this right:

In [40]: pd.rolling_median(a,row).tail(1)
Out[40]:
      0     1     2     3     4
9  22.5  23.5  24.5  25.5  26.5

@leo4183 Thanks for the report! Please raise another issue for rolling_std (though I suspect the precision issues are known, if perhaps not well documented).
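
The difference between the two behaviours described above can be sketched in plain Python. This is only an illustration, not the pandas or numpy implementation: the function name `quantile` and its `interpolation` argument are hypothetical, and the "lower" choice is an inference from the outputs shown in this thread (20 rather than 22.5), not a reading of the pandas source.

```python
import math

def quantile(values, q, interpolation="linear"):
    """Quantile of `values` at fraction q, with 0 <= q <= 1.

    "linear" takes the weighted average of the two nearest sorted values;
    "lower" returns the lower of the two with no averaging.
    """
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    if interpolation == "lower" or lo == hi:
        return float(ordered[lo])
    frac = pos - lo  # weight given to the upper neighbour
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

column0 = list(range(0, 50, 5))  # column 0 of the arange example above

linear_median = quantile(column0, 0.5)           # matches a.quantile(0.5)
lower_median = quantile(column0, 0.5, "lower")   # matches rolling_quantile
linear_q25 = quantile(column0, 0.25)             # matches a.quantile(0.25)
```

With ten rows, the 0.5 quantile sits at fractional index 4.5, so the linear result is halfway between 20 and 25, while the "lower" result is 20.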

shoyer changed the title from "pandas rolling function issues" to "BUG: pandas rolling_quantile does not use interpolation" on Feb 5, 2015
shoyer added the Bug label on Feb 5, 2015
jreback added the Reshaping and API Design labels and removed the Bug label on Feb 5, 2015
@jreback
Contributor

jreback commented Feb 5, 2015

I have cross-referenced #8659; there are a number of related issues w.r.t. rolling functions. Pull requests are welcome!

jreback added this to the 0.17.0 milestone on Feb 5, 2015
@leo4183
Author

leo4183 commented Feb 5, 2015

Thanks @jreback. As suggested by @shoyer, I created another ticket for rolling_std: #9420

@queise

queise commented Mar 27, 2016

This issue still happens with the new rolling() objects (pandas 0.18.0).
Following the same example used above by @leo4183, the quantile() function does not interpolate:
a.rolling(row).quantile(0.25).tail(1)
and thus gives different results than:
np.percentile(a,25,axis=0)
(except when, for example, row=5 or row=25, where no interpolation is needed).
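
Why some window sizes are unaffected can be checked with a little arithmetic. The sketch below assumes the usual linear-interpolation convention, where the quantile sits at fractional index q * (n - 1) of the sorted window; the helper name `needs_interpolation` is hypothetical.

```python
def needs_interpolation(window, q):
    """True when the quantile falls between two sorted values of the window."""
    pos = q * (window - 1)  # fractional index into the sorted window
    return not float(pos).is_integer()

# Window sizes mentioned in this thread, at the 25th percentile.
results = {window: needs_interpolation(window, 0.25) for window in (5, 10, 25)}
```

For row=5 and row=25 the index is a whole number (1 and 6), so every method returns the same sorted value; for row=10 it is 2.25, which is where the results diverge.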

A workaround is to apply the numpy function, which works for any value of row:
a.rolling(row).apply(func=np.percentile, args=(25,)).tail(1)

On the other hand, median always works:
a.rolling(row).median().tail(1)
that is, it gives the same results as:
np.percentile(a,50,axis=0)

A similar issue may also be happening with the functions std() and var(), which also give different results than the numpy equivalents.
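
A self-contained version of the workaround and the median check above might look like the following. It is a sketch under a few assumptions: it uses the deterministic arange frame from earlier in the thread instead of random data, and it passes raw=True to Rolling.apply, which assumes pandas 0.23 or later.

```python
import numpy as np
import pandas as pd

row, col = 10, 5
a = pd.DataFrame(np.arange(row * col, dtype=float).reshape((row, -1)))

# Rolling 25th percentile through np.percentile, which interpolates
# linearly by default. raw=True hands each window over as an ndarray.
rolled_q25 = a.rolling(row).apply(lambda w: np.percentile(w, 25), raw=True).tail(1)

# Reference value computed by numpy over the whole columns.
expected_q25 = np.percentile(a.values, 25, axis=0)

# The rolling median, compared against the 50th percentile.
rolled_median = a.rolling(row).median().tail(1)
expected_median = np.percentile(a.values, 50, axis=0)
```

Because the window spans all ten rows, the last rolling row and the whole-column percentile describe the same data, so the two should agree.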

@jreback
Contributor

jreback commented Mar 27, 2016

@queise this issue is still open and the implementation has not changed

pull requests to fix are welcome

@queise

queise commented Mar 27, 2016

thank you @jreback for your quick reply. I'll take a look and try my best.
