API: Series.sum() will now return 0.0 for all-NaN series #10815
Conversation
# 9422
s = Series([np.nan])
if not _np_version_under1p10:
    self.assertEqual(s.prod(skipna=True), 0.0)
nanprod of all nans should actually be 1 (the multiplicative identity), not 0
So you think these are 'bugs' in numpy or just conventions?
Those look like numpy bugs to me.
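The identity argument above can be sketched in plain Python, modeling `skipna` as "drop the NaNs, then reduce" (a simplification for illustration, not what pandas/bottleneck actually do internally):

```python
import math

values = [float("nan"), float("nan")]

# skipna=True modeled as: drop the NaNs first, then reduce what is left
remaining = [v for v in values if not math.isnan(v)]  # -> []

# An empty reduction returns each operation's identity element:
print(sum(remaining))        # 0 -- additive identity
print(math.prod(remaining))  # 1 -- multiplicative identity, not 0
```

This is why an all-NaN `prod` "should" be 1 under the skip-then-reduce view: once every element is skipped, only the identity element is left.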
allow_str=False, allow_date=False, allow_tdelta=False)

# validate that nanprod of all nans is 0, True for numpy >= 1.10
1.0?
typo, thxs
I am thinking this is actually a bad idea to make this change. This loses information in the reduction. cc @shoyer
So your result now depends on whether you have 1 non-NaN in a reduction. And if it happens to be 0, they are indistinguishable.
@jorisvandenbossche @shoyer opinions on this?
Hmm, tough one, and I don't really have a strong opinion on this. However, there is also some logic/consistency in returning 0, as an empty series also gives you 0.
And if you skip the NA's, you in fact have an empty series.
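That consistency argument can be modeled with a toy reduction (`nansum_sketch` is a hypothetical helper for illustration, not pandas' implementation): after skipping NaNs, an all-NaN input is literally an empty input, so both hit the same empty-reduction branch.

```python
def nansum_sketch(values, skipna=True):
    """Toy model of Series.sum: with skipna=True, an all-NaN
    input behaves exactly like an empty input."""
    if skipna:
        values = [v for v in values if v == v]  # NaN != NaN drops NaNs
    if not values:
        return 0.0  # empty reduction: additive identity
    total = 0.0
    for v in values:
        if v != v:
            return float("nan")  # skipna=False propagates NaN
        total += v
    return total

print(nansum_sketch([]))                            # 0.0
print(nansum_sketch([float("nan")]))                # 0.0 -- same as empty
print(nansum_sketch([float("nan")], skipna=False))  # nan
```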
I also don't have a strong opinion here. In practice, sum gets used much more often than prod.
actually my point wasn't the actual behavior, it was losing the propagation of all-NaN columns when you do a reduction. IOW, you essentially lose the notion that you aggregated on an all-NaN column. The only reason I am pressing on this is I just installed bottleneck 1.0 and have a couple of tests I either need to mark as knownfails or fix this :)
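The information-loss concern can be shown with a toy `skipna_sum` (a hypothetical helper, not pandas code): under the proposed behavior, a column that was entirely NaN and a column whose real values happen to sum to zero become indistinguishable after the reduction.

```python
import math

def skipna_sum(column):
    # toy model of the proposed behavior: drop NaNs, then sum
    return sum(v for v in column if not math.isnan(v))

all_nan  = [float("nan")] * 3  # no real data at all
zero_sum = [1.0, -1.0, 0.0]    # real data that happens to sum to 0

print(skipna_sum(all_nan))   # 0 -- the all-NaN marker is gone
print(skipna_sum(zero_sum))  # 0.0 -- indistinguishable from the above
```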
compat with numpy >= 1.8.2 and bottleneck >= 1.0, pandas-dev#9422. Note that passing skipna=False will still return a NaN.
I came across this issue after spending way too much time trying to debug my code after a sloppy package upgrade session that included bottleneck 1.0. For what it's worth, I very much prefer these behaviors:

In [1]: pd.Series([]).sum()  # less important
Out[1]: NaN

In [2]: pd.Series([np.nan]).sum()
Out[2]: NaN

In [1]: df = DataFrame({'A': [1, 2, 3], 'B': np.nan})

In [2]: df
Out[2]:
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.sum()
Out[3]:
A     6
B   NaN
dtype: float64

In [4]: df.sum(1)
Out[4]:
0    1
1    2
2    3
dtype: float64
@wholmgren FWIW, I agree with you. But this is getting pushed off anyhow, so current behavior will remain.
going to close for now