update rolling doc string #772
 min_periods : int, default None
     Minimum number of observations in window required to have a value
-    (otherwise result is NA).
+    (otherwise result is NA). The default, None, is equivalent to
+    setting min_periods equal to the size of the window.
Does pandas follow this same convention for handling missing values? It's probably worth checking...
I think bottleneck and pandas differ in how they document the default for the min_periods argument.
Bottleneck:
min_count: {int, None}, optional :
If the number of non-NaN values in a window is less than min_count, then a value of NaN is assigned to the window. By default min_count is None, which is equivalent to setting min_count equal to window.
The pandas doc string doesn't say what the default of None means:
min_periods : int, default None
Minimum number of observations in window required to have a value (otherwise result is NA).
So, comparing their behavior, we see that both default min_periods to the size of the window:
In [1]: import pandas as pd
In [2]: s = pd.Series(range(8))
In [3]: pd.rolling_mean(s, 3)
Out[3]:
0 NaN
1 NaN
2 1
3 2
4 3
5 4
6 5
7 6
dtype: float64
In [4]: import bottleneck as bn
In [6]: bn.move_mean(s, 3)
Out[6]: array([ nan, nan, 1., 2., 3., 4., 5., 6.])
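The shared default can also be illustrated without either library. What follows is a minimal pure-Python sketch, not the pandas or bottleneck implementation: the helper name rolling_mean is hypothetical, and it assumes the input contains no NaN values. It treats min_periods=None as "require a full window", as the updated doc string describes.

```python
import math


def rolling_mean(values, window, min_periods=None):
    """Trailing rolling mean over a NaN-free sequence (illustrative helper)."""
    if min_periods is None:
        # Assumed default, taken from the doc string: a full window is required.
        min_periods = window
    out = []
    for i in range(len(values)):
        # Trailing window ending at position i (shorter near the start).
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < min_periods:
            out.append(math.nan)
        else:
            out.append(sum(chunk) / len(chunk))
    return out


data = list(range(8))
default = rolling_mean(data, 3)                  # min_periods falls back to 3
relaxed = rolling_mean(data, 3, min_periods=1)   # partial windows allowed
```

With the default, the first two positions are NaN because the window is not yet full; with min_periods=1 they are filled from the partial windows instead.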
Something seems to be out of sync for NaN handling, though:
In [27]: d = xr.DataArray([0, np.nan, 1, 2, np.nan, 3, 4, 5, np.nan, 6, 7], dims='x')
In [28]: d.rolling(x=2).mean()
Out[28]:
<xarray.DataArray (x: 11)>
array([ nan, 0. , 1. , 1.5, 2. , 3. , 3.5, 4.5, 5. , 6. , 6.5])
Coordinates:
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10
# using the pandas RC for v0.18
In [29]: d.to_series().rolling(2).mean().to_xarray()
Out[29]:
<xarray.DataArray (x: 11)>
array([ nan, nan, nan, 1.5, nan, nan, 3.5, 4.5, nan, nan, 6.5])
Coordinates:
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10
hmmm, you must not have bottleneck in your environment because I get:
In [6]: d.rolling(x=2).mean()
Out[6]:
<xarray.DataArray (x: 11)>
array([ nan, nan, nan, 1.5, nan, nan, 3.5, 4.5, nan, nan, 6.5])
Coordinates:
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10
I suppose we should open an issue on this. I guess we need to use the non-NaN-safe numpy methods in Rolling.reduce to get the same behavior. We'll have to come up with a solution to get this to work in a vectorized manner.
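To make the mismatch concrete, here is a pure-Python sketch of the two semantics seen in this thread. Both helper names are hypothetical, and neither is the actual xarray, pandas, or bottleneck code. nan_skipping_mean averages whatever non-NaN values sit in a full window, while counting_mean requires min_count non-NaN observations, with min_count assumed to default to the window size.

```python
import math

nan = math.nan
data = [0, nan, 1, 2, nan, 3, 4, 5, nan, 6, 7]


def _valid(chunk):
    """Return the non-NaN entries of a window."""
    return [v for v in chunk if not math.isnan(v)]


def nan_skipping_mean(values, window):
    """NaN-safe mean over each full window; NaN only for incomplete windows."""
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(nan)  # window not yet full
            continue
        valid = _valid(values[i - window + 1: i + 1])
        out.append(sum(valid) / len(valid) if valid else nan)
    return out


def counting_mean(values, window, min_count=None):
    """Mean that is NaN unless the window holds min_count non-NaN values."""
    min_count = window if min_count is None else min_count
    out = []
    for i in range(len(values)):
        valid = _valid(values[max(0, i - window + 1): i + 1])
        # max(..., 1) guards against dividing by zero when min_count is 0.
        enough = len(valid) >= max(min_count, 1)
        out.append(sum(valid) / len(valid) if enough else nan)
    return out


skipping = nan_skipping_mean(data, 2)
counting = counting_mean(data, 2)
```

On the data from this thread, skipping matches the result obtained without bottleneck, and counting matches the pandas and bottleneck-backed results.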
Yep, no bottleneck on my work machine.
Made a new issue: #776
Minor update of the rolling doc string. Missed this update after @shoyer's last review. xref: #668