API: Series.sum() will now return 0.0 for all-NaN series #10815

jreback · 2015-08-13T15:29:06Z

This is for compat with numpy >= 1.8.2 and bottleneck >= 1.0, #9422.
Note that passing skipna=False will still return a NaN.
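
A minimal sketch of the two skipna settings described above. It assumes a pandas release of 0.22 or later, where an all-NaN sum() returns 0.0 by default. At the time of this thread, the result still depended on the installed bottleneck version.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# skipna=True (the default): the NaNs are dropped, nothing is left to add,
# so the result is the additive identity.
skipped = s.sum(skipna=True)

# skipna=False: the NaNs take part in the reduction and propagate.
propagated = s.sum(skipna=False)

print(skipped, propagated)  # 0.0 nan
```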

shoyer · 2015-08-13T17:20:54Z

pandas/tests/test_nanops.py

+        # 9422
+        s = Series([np.nan])
+        if not _np_version_under1p10:
+            self.assertEqual(s.prod(skipna=True),0.0)


nanprod of all NaNs should actually be 1 (the multiplicative identity), not 0.
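
The identities can be checked directly in NumPy. This sketch assumes NumPy >= 1.10, the first release with np.nanprod. np.nansum treats NaNs as zero and np.nanprod treats them as one, so an all-NaN input reduces to the same value as an empty input.

```python
import numpy as np

all_nan = np.array([np.nan, np.nan])
empty = np.array([], dtype='float64')

# np.nansum replaces NaNs with 0, so all-NaN collapses to the additive identity.
print(np.nansum(all_nan), np.sum(empty))    # 0.0 0.0

# np.nanprod replaces NaNs with 1, so all-NaN collapses to the multiplicative identity.
print(np.nanprod(all_nan), np.prod(empty))  # 1.0 1.0
```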

jreback · 2015-08-14T18:56:03Z

@shoyer

So do you think these are bugs in numpy, or just conventions?
And should we follow them?

In [6]: np.nansum(np.array([np.nan],dtype='float64'))     
Out[6]: 0.0

In [7]: np.nansum(np.array([np.nan],dtype='object'))
Out[7]: nan

In [8]: np.nansum(np.array([np.timedelta64('nat')]))     
Out[8]: numpy.timedelta64('NaT')

shoyer · 2015-08-14T19:09:31Z

Those look like numpy bugs to me

kawochen · 2015-08-14T19:39:33Z

pandas/tests/test_nanops.py

                        allow_str=False, allow_date=False, allow_tdelta=False)

+        # validate that nanprod of all nans is 0, True for numpy >= 1.10


jreback · 2015-08-14T20:49:14Z

I am now thinking this change is actually a bad idea. It loses information in .sum(). Maybe numpy doesn't care, as it doesn't propagate NaNs, but I think users do.

@shoyer
@jorisvandenbossche
@sinhrks

In [1]: df = DataFrame({'A' : [1,2,3], 'B' : np.nan})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.sum()
Out[3]: 
A    6
B    0
dtype: float64

In [4]: df.sum(1)
Out[4]: 
0    1
1    2
2    3
dtype: float64

In [5]: df2 = DataFrame({'A' : [1,2,3], 'B' : [np.nan,0,0]})

In [6]: df2.sum()
Out[6]: 
A    6
B    0
dtype: float64

In [7]: df2.sum(1)
Out[7]: 
0    1
1    2
2    3
dtype: float64

So your result now depends on whether a reduction has at least one non-NaN value, and if the sum of those values happens to be 0, the two cases are indistinguishable.
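
One way to keep the distinction is to pair the sum with count(), which reports how many non-NaN values went into each column, and mask columns with no valid values back to NaN. The helper sum_keep_all_nan below is hypothetical and is not a pandas API.

```python
import numpy as np
import pandas as pd

def sum_keep_all_nan(frame):
    """Column sums with NaNs skipped, but NaN for columns that were all-NaN."""
    totals = frame.sum()          # all-NaN columns give 0 under the convention proposed here
    valid = frame.count() > 0     # True where at least one value was not NaN
    return totals.where(valid)    # put NaN back for the columns with no valid values

df = pd.DataFrame({'A': [1, 2, 3], 'B': np.nan})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, 0, 0]})

print(sum_keep_all_nan(df))   # B is NaN: it had no values at all
print(sum_keep_all_nan(df2))  # B is 0.0: it summed real zeros
```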

jreback · 2015-08-18T23:43:07Z

@jorisvandenbossche @shoyer opinions on this?

jorisvandenbossche · 2015-08-18T23:47:14Z

Hmm, tough one, and I don't really have a strong opinion on this.

However, there is also some logic (and consistency) in returning 0, since an empty series also gives you 0:

In [30]: pd.Series([]).sum()
Out[30]: 0

And if you skip the NAs, you in fact have an empty series.
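
The argument can be spelled out in a few lines. This sketch assumes a recent pandas. dtype='float64' is passed explicitly because newer pandas releases warn about, or change, the default dtype of an empty Series.

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan])
empty = pd.Series([], dtype='float64')

# Skipping the NAs leaves nothing behind...
remaining = all_nan.dropna()
print(len(remaining))  # 0

# ...so summing what is left is the same as summing an empty series.
print(remaining.sum() == empty.sum())  # True
```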

shoyer · 2015-08-19T00:50:31Z

I also don't have a strong opinion here. In practice, sum gets used much less than mean, and even when it does, I'm often eventually dividing by counts anyway, which means that this distinction doesn't matter. My inclination is to copy NumPy, though this would also have implications for other pandas functions (e.g., grouped and moving sums).
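
This sketch shows why the choice matters less once counts are involved. It also shows the knock-on effect on grouped sums. It assumes pandas 0.22 or later, where all-NaN sums are 0 by default. The frame is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# Dividing by the count recovers NaN whether the sum was 0 or NaN:
# 0 / 0 and NaN / 0 are both NaN under IEEE floating point.
with np.errstate(divide='ignore', invalid='ignore'):
    mean_from_sum = s.sum() / s.count()
print(np.isnan(mean_from_sum), np.isnan(s.mean()))  # True True

# Grouped sums inherit the same convention: group 'b' holds only NaN.
df = pd.DataFrame({'key': ['a', 'a', 'b'], 'val': [1.0, 2.0, np.nan]})
print(df.groupby('key')['val'].sum())
```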

jreback · 2015-08-19T00:53:25Z

Actually, my point wasn't about the specific behavior; it was about losing the propagation of all-NaN columns when you do a reduction. IOW, you essentially lose the information that you aggregated an all-NaN column.

The only reason I am pressing on this is that I just installed bottleneck 1.0 and have a couple of tests that I either need to mark as knownfail or fix :)

wholmgren · 2015-09-03T04:05:57Z

I came across this issue after spending way too much time trying to debug my code after a sloppy package upgrade session that included bottleneck 1.0. For what it's worth, I very much prefer these behaviors:

In [1]: pd.Series([]).sum() # less important
Out[1]: NaN

In [2]: pd.Series([np.nan]).sum()
Out[2]: NaN

In [1]: df = DataFrame({'A' : [1,2,3], 'B' : np.nan})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.sum()
Out[3]: 
A    6
B    NaN
dtype: float64

In [4]: df.sum(1)
Out[4]: 
0    1
1    2
2    3
dtype: float64
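
For reference, pandas later added a min_count keyword to sum(). It arrived in 0.22.0, after this PR was closed, so it is not part of this change. Passing min_count=1 reproduces the behaviors listed above, while the default min_count=0 keeps returning 0. This sketch assumes pandas 0.22 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': np.nan})

# min_count=1: at least one non-NaN value is required, otherwise the result is NaN.
print(pd.Series([], dtype='float64').sum(min_count=1))  # nan
print(pd.Series([np.nan]).sum(min_count=1))             # nan
print(df.sum(min_count=1))          # A -> 6.0, B -> NaN
print(df.sum(axis=1, min_count=1))  # rows -> 1.0, 2.0, 3.0
```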

jreback · 2015-09-03T14:06:14Z

@wholmgren FWIW, I agree with you, but this is getting pushed off anyhow, so the current behavior will remain.

jreback · 2015-09-03T14:19:26Z

going to close for now

jreback added the Missing-data, API Design, Numeric Operations and Compat labels Aug 13, 2015

jreback added this to the 0.17.0 milestone Aug 13, 2015

shoyer reviewed Aug 13, 2015

jreback force-pushed the nansum branch from 0d1d983 to b02edf3 on August 14, 2015 18:03

jreback mentioned this pull request Aug 14, 2015

bug in nansum with non-float64 dtypes numpy/numpy#6209

Open

kawochen reviewed Aug 14, 2015

jreback force-pushed the nansum branch from b02edf3 to 48dd154 on August 21, 2015 00:07

630187f API: Series.sum() will now return 0.0 for all-NaN series; this is for compat with numpy >= 1.8.2 and bottleneck >= 1.0, pandas-dev#9422 note that passing skipna=False will still return a NaN

jreback force-pushed the nansum branch from 48dd154 to 630187f on August 21, 2015 12:44

jreback mentioned this pull request Aug 21, 2015

Fixed bug #9733 where stat functions returned a python scalar for empty series #9829

Closed

jreback modified the milestones: Next Major Release, 0.17.0 Aug 21, 2015

jreback closed this Sep 3, 2015

wholmgren mentioned this pull request Jan 7, 2016

API: sum of Series of all NaN should return 0 or NaN ? #9422

Closed
