Skip to content

rolling_cov and rolling_corr error out when len(series) < min_periods #7764

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
creeson opened this issue Jul 15, 2014 · 11 comments
Closed

rolling_cov and rolling_corr error out when len(series) < min_periods #7764

creeson opened this issue Jul 15, 2014 · 11 comments
Labels
Bug Error Reporting Incorrect or improved errors from pandas Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@creeson
Copy link

creeson commented Jul 15, 2014

I am updating a codebase from an OLD version of pandas (0.7.3) to current version (0.14.1) in python 2.7.

pandas: 0.14.1
numpy: 1.8.0

Not sure if this is properly classified as a bug, or if it is an expected code change. However, it seems rolling_cov / rolling_corr are specifying the window as min(window given, max(len(arg1), len(arg2)), and then erring out if arg1 AND arg2 are shorter than min_periods. Before, we would just get NaNs back.

Here is some test code:

import numpy as np
import pandas as pd

np.random.seed(0)

s1 = pd.Series(np.random.randn(4))
s2 = pd.Series(np.random.randn(4))

print '\nrolling_corr:'
try:
    print pd.rolling_corr(s1, s2, window=10, min_periods=5)
except Exception as inst:
    print type(inst)
    print inst
print '\nrolling_cov:'
try:
    print pd.rolling_cov(s1, s2, window=10, min_periods=5)
except Exception as inst:
    print type(inst)
    print inst

In 0.14.1:

rolling_corr:
<type 'exceptions.ValueError'>
min_periods (5) must be <= window (4)
rolling_cov:
<type 'exceptions.ValueError'>
min_periods (5) must be <= window (4)

In 0.7.3:

rolling_corr:
0   NaN
1   NaN
2   NaN
3   NaN

rolling_cov:
0   NaN
1   NaN
2   NaN
3   NaN

Note that changing either s1 OR s2 to be 6, the code doesn't error out, and instead outputs:

rolling_corr:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
dtype: float64

rolling_cov:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
dtype: float64

Meanwhile, in 0.14.1, other rolling functions continue to behave as before:

s1 = pd.Series(np.random.randn(4))

print '\rolling_std:'
print pd.rolling_std(s1, window=10, min_periods=5)
print '\nrolling_max:'
print pd.rolling_max(s1, window=10, min_periods=5)

Outputs:

rolling_std:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

rolling_max:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64
@jreback
Copy link
Contributor

jreback commented Jul 16, 2014

cc @seth-p

IIRC the changes were made only in roling_corr/cov, right?

@jreback
Copy link
Contributor

jreback commented Jul 16, 2014

@creeson I believe that the exception is correct, and that the rest of the rolling_* should raise as well in this case. Welcome a pull-request to fix.

@jreback jreback added this to the 0.15.0 milestone Jul 16, 2014
@seth-p
Copy link
Contributor

seth-p commented Jul 16, 2014

I don't think my changes in 14.1 (#7604) affected this. I didn't touch the treatment of min_periods. If anything my changes to rolling_corr/cov would lead to window being larger than it was previously, making this error less likely. (Note that I moved the adj_window = min(window, len(X), len(Y)) inside the local _get_cov/corr(X,Y), by which point X and Y should be aligned and potentially longer than arg1 and arg2.) It looks to me like someone at some point made an explicit decision to error out when min_periods > window.

Separately, I don't like the way min_periods defaults to window (https://groups.google.com/forum/#!topic/pydata/3laB7dmAWh4), but that's a separate issue...

@creeson
Copy link
Author

creeson commented Jul 16, 2014

Thanks guys. Is it considered more pythonic to error out? Otherwise I strongly prefer the behavior of letting the code continue with NaNs, after all, if you are using min_periods, you expect to get at least some NaNs. And I'm not at my work machine now, but I am 99% sure I can avoid this error by adding dummy NaNs to the beginning and removing them afterwards, which doesn't make any sense to me. But please let me know if there is some obvious downside to letting the code continue with NaNs?

@seth-p
Copy link
Contributor

seth-p commented Jul 16, 2014

I think the observed 14.1 behavior of rolling_corr/cov -- reducing window to the actual data length, and so potentially triggering the min_periods > window error, (a) has been there since before 14.1, and (b) is trivial to fix by simply not mining the window. I don't really know what the purpose of the min is/was, but I decided to leave it there when I did #7604).

Actually, only difference I can see is that when min_periods=None, since it defaults to window, reducing a window greater than the data length to the actual data length causes values to be calculated for the final index, whereas not doing so would lead to all NaNs. As I mentioned, I don't like the default setting of min_periods to window, but given that that's what's being done, I think it would be most consistent to not reduce window.

Am happy to submit a PR to eliminate adj_window = min(window, len(X), len(Y)) if you agree.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2014

well we should decide which is more correct and make it consistent

if u both would investigate pros/cons would be great

@seth-p
Copy link
Contributor

seth-p commented Jul 16, 2014

I think it would be most consistent to eliminate the adj_window = min(window, len(X), len(Y)). No obvious cons come to mind.

@seth-p
Copy link
Contributor

seth-p commented Jul 16, 2014

FWIW, I just submitted a pull request. Though for some reason the Travis build failed (https://travis-ci.org/pydata/pandas/jobs/30036354) in a way that seems unrelated to my changes. (I had no problems running tests on my local 3.4.1 setup.)

@seth-p
Copy link
Contributor

seth-p commented Jul 17, 2014

I rebased #7766, and now it builds successfully.

@seth-p
Copy link
Contributor

seth-p commented Jul 17, 2014

I updated #7766 to test all rolling_* functions, and also fixed the same problem in rolling_min/max.

@jreback
Copy link
Contributor

jreback commented Jul 23, 2014

closed by #7766

@jreback jreback closed this as completed Jul 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Error Reporting Incorrect or improved errors from pandas Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants