update rolling doc string #772

Merged
merged 1 commit into from
Feb 20, 2016

Conversation

jhamman
Member

@jhamman jhamman commented Feb 20, 2016

Minor update to the rolling docstring. I missed this change after @shoyer's last review.

xref: #668

jhamman pushed a commit that referenced this pull request Feb 20, 2016
@jhamman jhamman merged commit 4242f70 into pydata:master Feb 20, 2016
@jhamman jhamman deleted the rolling_doc_string branch February 20, 2016 17:01
  min_periods : int, default None
      Minimum number of observations in window required to have a value
-     (otherwise result is NA).
+     (otherwise result is NA). The default, None, is equivalent to
+     setting min_periods equal to the size of the window.
Member

Does pandas follow this same convention for handling missing values? It's probably worth checking...

Member Author

I think bottleneck and pandas differ on how they handle the min_periods argument.

Bottleneck:

min_count: {int, None}, optional :

If the number of non-NaN values in a window is less than min_count, then a value of NaN is assigned to the window. By default min_count is None, which is equivalent to setting min_count equal to window.

The pandas docstring doesn't state what the default does:

min_periods : int, default None

Minimum number of observations in window required to have a value (otherwise result is NA).

So, comparing their behavior, we see that by default they both set min_periods to the size of the window.

In [1]: import pandas as pd
In [2]: s = pd.Series(range(8))    
In [3]: pd.rolling_mean(s, 3)
Out[3]: 
0   NaN
1   NaN
2     1
3     2
4     3
5     4
6     5
7     6
dtype: float64
In [4]: import bottleneck as bn
In [6]: bn.move_mean(s, 3)
Out[6]: array([ nan,  nan,   1.,   2.,   3.,   4.,   5.,   6.])
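For reference, the `min_count` rule quoted from the bottleneck docstring above can be written out as a short loop. This is only a minimal sketch: `rolling_mean` is a hypothetical helper written for illustration, not the pandas or bottleneck implementation, and it assumes a trailing window over a 1-D input.

```python
import numpy as np

def rolling_mean(values, window, min_count=None):
    """Trailing rolling mean (hypothetical helper, not a library function)."""
    # The default, None, is treated as "require a full window of valid values".
    if min_count is None:
        min_count = window
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    for i in range(len(values)):
        # Window ending at position i, truncated at the start of the array.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = chunk[~np.isnan(chunk)]
        # Too few non-NaN observations leaves the result as NaN.
        if valid.size >= max(min_count, 1):
            out[i] = valid.mean()
    return out

result_default = rolling_mean(range(8), 3)               # min_count -> 3
result_relaxed = rolling_mean(range(8), 3, min_count=1)  # partial windows allowed
print(result_default)
print(result_relaxed)
```

With the default, the first two positions stay NaN because their windows hold fewer than three observations, which is the pattern both libraries show above. Passing `min_count=1` fills those positions from the partial windows instead.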

Member

Something seems to be out of sync for NaN handling, though:

In [27]: d = xr.DataArray([0, np.nan, 1, 2, np.nan, 3, 4, 5, np.nan, 6, 7], dims='x')

In [28]: d.rolling(x=2).mean()
Out[28]:
<xarray.DataArray (x: 11)>
array([ nan,  0. ,  1. ,  1.5,  2. ,  3. ,  3.5,  4.5,  5. ,  6. ,  6.5])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10

# using the pandas RC for v0.18
In [29]: d.to_series().rolling(2).mean().to_xarray()
Out[29]:
<xarray.DataArray (x: 11)>
array([ nan,  nan,  nan,  1.5,  nan,  nan,  3.5,  4.5,  nan,  nan,  6.5])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10

Member Author

Hmmm, you must not have bottleneck in your environment, because I get:

In [6]: d.rolling(x=2).mean()
Out[6]: 
<xarray.DataArray (x: 11)>
array([ nan,  nan,  nan,  1.5,  nan,  nan,  3.5,  4.5,  nan,  nan,  6.5])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10

I suppose we should open an issue on this. I guess we need to use the non-NaN-safe NumPy methods in Rolling.reduce to get the same behavior. We'll have to come up with a solution that works in a vectorized manner.
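The difference between the two outputs above can be reproduced with plain NumPy. The sketch below is not xarray's `Rolling.reduce`; `rolling_reduce` is a hypothetical helper, and it assumes NumPy 1.20 or newer for `sliding_window_view`. It only illustrates that a NaN-skipping reduction (`np.nanmean`) and a NaN-propagating one (`np.mean`) give the two different results, and that the windowed reduction can be done without a Python loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_reduce(values, window, func):
    """Apply func over trailing windows (hypothetical helper for illustration)."""
    values = np.asarray(values, dtype=float)
    # View of shape (n - window + 1, window); no data is copied.
    windows = sliding_window_view(values, window)
    reduced = func(windows, axis=-1)
    # Pad the first window - 1 positions, which have no full window, with NaN.
    return np.concatenate([np.full(window - 1, np.nan), reduced])

d = [0, np.nan, 1, 2, np.nan, 3, 4, 5, np.nan, 6, 7]

nan_skipping = rolling_reduce(d, 2, np.nanmean)  # ignores NaN inside a window
nan_propagating = rolling_reduce(d, 2, np.mean)  # any NaN makes the window NaN
print(nan_skipping)
print(nan_propagating)
```

Here `nan_skipping` matches the result obtained without bottleneck, while `nan_propagating` matches the pandas and bottleneck results for the default `min_periods`.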

Member

Yep, no bottleneck on my work machine.

Member

Made a new issue: #776
