
API: Series.sum() will now return 0.0 for all-NaN series #10815

Closed
wants to merge 1 commit into from

jreback
Contributor

jreback commented Aug 13, 2015

compat with numpy >= 1.8.2 and bottleneck >= 1.0, #9422
note that passing skipna=False will still return a NaN
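
For illustration, the `skipna` behavior described above can be sketched in plain Python. This is only a sketch of the semantics, not the pandas or numpy implementation, and `nan_sum` is a hypothetical helper name:

```python
import math


def nan_sum(values, skipna=True):
    """Hypothetical helper: sum floats, optionally skipping NaN."""
    if skipna:
        # Dropping the NaNs from an all-NaN input leaves nothing to add,
        # so the result is the additive identity, 0.0.
        values = [v for v in values if not math.isnan(v)]
    # With skipna=False, any NaN propagates through the addition.
    return sum(values, 0.0)


nan = float("nan")
all_nan_skipped = nan_sum([nan, nan])              # 0.0
all_nan_kept = nan_sum([nan, nan], skipna=False)   # nan
```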

jreback added the Missing-data, API Design, Numeric Operations, and Compat labels Aug 13, 2015
jreback added this to the 0.17.0 milestone Aug 13, 2015
# 9422
s = Series([np.nan])
if not _np_version_under1p10:
    self.assertEqual(s.prod(skipna=True), 0.0)
Member

nanprod of all nans should actually be 1 (the multiplicative identity), not 0
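
The identity argument can be checked with a small plain-Python sketch. It assumes nothing about numpy's internals, and `nan_prod` is a hypothetical helper: once the NaNs are dropped from an all-NaN input, the product runs over an empty sequence and falls back to its starting value, 1.0.

```python
import math
import operator
from functools import reduce


def nan_prod(values):
    """Hypothetical helper: product of the non-NaN values."""
    kept = [v for v in values if not math.isnan(v)]
    # reduce() over an empty list returns the initial value, which is
    # the multiplicative identity 1.0.
    return reduce(operator.mul, kept, 1.0)


nan = float("nan")
all_nan_product = nan_prod([nan])            # 1.0
mixed_product = nan_prod([2.0, nan, 3.0])    # 6.0
```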

@jreback
Contributor Author

jreback commented Aug 14, 2015

@shoyer

So do you think these are 'bugs' in numpy, or just conventions?
And should we then follow them?

In [6]: np.nansum(np.array([np.nan],dtype='float64'))     
Out[6]: 0.0

In [7]: np.nansum(np.array([np.nan],dtype='object'))
Out[7]: nan

In [8]: np.nansum(np.array([np.timedelta64('nat')]))     
Out[8]: numpy.timedelta64('NaT')

@shoyer
Member

shoyer commented Aug 14, 2015

Those look like numpy bugs to me

allow_str=False, allow_date=False, allow_tdelta=False)

# validate that nanprod of all nans is 0, True for numpy >= 1.10
Contributor

1.0?

Contributor Author

typo, thxs

@jreback
Contributor Author

jreback commented Aug 14, 2015

I now think making this change is actually a bad idea: it loses information in .sum(). Maybe numpy doesn't care, since it doesn't propagate NaNs, but I think users do.

@shoyer
@jorisvandenbossche
@sinhrks

In [1]: df = DataFrame({'A' : [1,2,3], 'B' : np.nan})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.sum()
Out[3]: 
A    6
B    0
dtype: float64

In [4]: df.sum(1)
Out[4]: 
0    1
1    2
2    3
dtype: float64

In [5]: df2 = DataFrame({'A' : [1,2,3], 'B' : [np.nan,0,0]})

In [6]: df2.sum()
Out[6]: 
A    6
B    0
dtype: float64

In [7]: df2.sum(1)
Out[7]: 
0    1
1    2
2    3
dtype: float64

So the result now depends on whether the reduction contains at least one non-NaN value. And if that value happens to be 0, the two cases are indistinguishable.
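
One way to see what information is lost: if the reduction also reported how many valid values it saw, the two cases could be told apart again. The following is a plain-Python sketch of that idea, not pandas code, and `nan_sum_and_count` is a hypothetical helper:

```python
import math


def nan_sum_and_count(values):
    """Hypothetical helper: return (sum of non-NaN values, number of them)."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid, 0.0), len(valid)


nan = float("nan")
all_nan = nan_sum_and_count([nan, nan, nan])     # like column B of df
nan_zeros = nan_sum_and_count([nan, 0.0, 0.0])   # like column B of df2

# The sums alone are equal; only the counts (0 vs. 2) differ.
sums_match = all_nan[0] == nan_zeros[0]
```

With the sum alone, both columns reduce to 0.0; the count of valid values is what carries the "this column was all NaN" signal.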

@jreback
Contributor Author

jreback commented Aug 18, 2015

@jorisvandenbossche @shoyer opinions on this?

@jorisvandenbossche
Member

Hmm, tough one, and I don't really have a strong opinion on this.

However, there is also some logic and consistency in returning 0, as an empty series also gives you 0:

In [30]: pd.Series([]).sum()
Out[30]: 0

And if you skip the NAs, you in fact have an empty series.

@shoyer
Member

shoyer commented Aug 19, 2015

I also don't have a strong opinion here. In practice, sum gets used much less than mean, and even when it does I'm often dividing eventually by counts, anyways, which means that this distinction doesn't matter. My inclination is to copy NumPy, though this would also have implications for other pandas functions (e.g., grouped and moving sum).

@jreback
Contributor Author

jreback commented Aug 19, 2015

Actually, my point wasn't the behavior itself; it was losing the propagation of all-NaN columns when you do a reduction. IOW, you essentially lose the notion that you aggregated on an all-NaN column.

The only reason I am pressing on this is that I just installed bottleneck 1.0 and have a couple of tests that I need to either mark as knownfails or fix :)

… compat with numpy >= 1.8.2 and bottleneck >= 1.0, pandas-dev#9422

     note that passing skipna=False will still return a NaN
@wholmgren
Contributor

I came across this issue after spending way too much time trying to debug my code after a sloppy package upgrade session that included bottleneck 1.0. For what it's worth, I very much prefer these behaviors:

In [1]: pd.Series([]).sum() # less important
Out[1]: NaN

In [2]: pd.Series([np.nan]).sum()
Out[2]: NaN

In [1]: df = DataFrame({'A' : [1,2,3], 'B' : np.nan})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.sum()
Out[3]: 
A    6
B    NaN
dtype: float64

In [4]: df.sum(1)
Out[4]: 
0    1
1    2
2    3
dtype: float64

@jreback
Contributor Author

jreback commented Sep 3, 2015

@wholmgren FWIW, I agree with you, but this is getting pushed off anyhow, so the current behavior will remain.

@jreback
Contributor Author

jreback commented Sep 3, 2015

going to close for now

5 participants