Breaking changes for sum / prod of empty / all-NA #18921
Conversation
```python
idx = pd.DatetimeIndex(['2017-01-01', '2017-01-02'])
pd.Series([1, 2], index=idx).resample("12H").sum()

pd.Series([1, 2], index=idx).resample("12H").sum(min_count=1)
```
TODO: Add .rolling and .expanding with all-NA and min_count=0.
pandas/tests/frame/test_analytics.py
Outdated
```python
def wrapper(x):
    return alternative(x.values)

if skipna_alternative:
```
This seemed like the easiest way of getting these tests to pass. skipna_alternative is only used in sum and prod, and is nansum or nanprod.
this is ok
```python
('mean', np.mean),
('median', lambda x: np.median(x) if len(x) > 0 else np.nan),
('var', lambda x: np.var(x, ddof=1)),
('add', lambda x: np.sum(x) if len(x) > 0 else np.nan),
```
I moved these to their own tests in e390d1c#diff-1c9535bf6ad763b307a780e37cc4185cR833 due to #18869
```python
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize(
    "method", ['sum', 'mean', 'median', 'std', 'var'])
```
TODO: make sure I have coverage for this elsewhere... I don't remember why I removed it here.
These are handled up above in test_empty.
pandas/tests/test_base.py
Outdated
```python
def test_nanops(self):
@pytest.mark.parametrize('klass', [Index, Series])
@pytest.mark.parametrize('op', ['max', 'min'])
def test_nanops_max_min(self, op, klass):
```
Hmm, I thought I did this parametrization in #18876. I can move it there if you want.
Actually, these seem to be a remnant of an earlier version where I was adding min_count everywhere. I'll just remove it.
pandas/tests/test_nanops.py
Outdated
```python
for skipna in [False, True]:
    targartempval = targarval if skipna else targarnanval
    try:
        if skipna and empty_targfunc and pd.isna(targartempval).all():
```
I'm reasonably sure I refactored these correctly, but would appreciate a close review here. empty_targfunc should be like nansum or nanprod, in which case we want to call it on targartempval. I should double check why just passing nansum for targfunc didn't work though...
use the imported isna
```python
assert notna(result[4])

# min_periods=0
result0 = get_result(arr, 20, min_periods=0)
```
I had to disable this subset of tests, since I don't think it's true that min_periods=0 is the same as min_periods=1.
so make new ones that make this diff clear (i agree they should be different).
Done, added a keyword to control this check, since it is true for most operations. sum is tested up above in test_missing_minp_zero.
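To make the difference concrete, here is a pure-Python sketch of the semantics being discussed (a hypothetical helper, not pandas' actual Cython implementation): with `min_periods=0` an all-NaN window yields the empty sum, 0, while `min_periods=1` turns it into NaN.

```python
import math

def rolling_sum(values, window, min_periods):
    """Sketch of a skipna rolling sum with a min_periods threshold."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        # min_periods=0 keeps an all-NaN window valid (empty sum -> 0),
        # while min_periods=1 turns it into NaN.
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out

nan = math.nan
print(rolling_sum([nan, nan, 1.0], window=2, min_periods=0))  # [0, 0, 1.0]
print(rolling_sum([nan, nan, 1.0], window=2, min_periods=1))  # first two are nan
```

The two calls differ only on the all-NaN windows, which is exactly the case the disabled tests assumed was identical.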
Force-pushed e390d1c to 8cbd59a
pandas/_libs/window.pyx
Outdated
```python
if minp == 0:
    # in get_window_indexer, we ensure that minp >= 1. That's fine for
    # all cases except nobs = 0 (all missing values) and minp=0. For
```
all of the minp logic is handled externally to window.pyx (in window.py)
Are you saying to move this there? I don't think that's possible without rewriting roll_sum. All of its logic assumes that minp is at least 1. The only thing I need the user-provided minp for is passing it to calc_sum later.
then it should be changed. This is way too hacky. ALL logic related to minp is passed in.
```python
result.fill(fill_value)
return result
if values.size == 0 and kwds.get('min_count') is None:
    # We are empty, returning NA for our type
```
isn’t min_count always passed?
No, just when the caller is sum or prod. Methods like mean don't pass it.
I find this very error prone. The internal functions can always pass min_count I think, so it will be not-None. or at the very least comment on this.
The issue here is passing through the correct arguments to the nanop. `pandas.core.nanops.sum` takes `min_count`, while `mean` doesn't. The function calling `f` knows that, but `f` doesn't. So I would either:

1. refactor `f` to not do this special casing, and have each nanop properly handle size-0 arrays
2. add an unused `min_count` to each nanop, and always pass it through
3. have `f` add `min_count` to `kwds` when it's not None.

I think 1 is the best option, but will take a bit of time. 3 doesn't strike me as any better than the current method.
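For illustration, option 3 could be sketched roughly as below. All three functions here are hypothetical stand-ins, not pandas' real internals: the dispatcher inspects the nanop's signature and only forwards `min_count` when the op accepts it.

```python
import inspect
import math

def nansum(values, min_count=0):
    # hypothetical stand-in for a sum-like nanop that takes min_count
    valid = [v for v in values if not math.isnan(v)]
    return math.nan if len(valid) < min_count else sum(valid)

def nanmean(values):
    # hypothetical stand-in for a mean-like nanop (no min_count)
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan

def call_nanop(op, values, **kwds):
    """Drop min_count for ops whose signature does not accept it."""
    if 'min_count' in kwds and 'min_count' not in inspect.signature(op).parameters:
        del kwds['min_count']
    return op(values, **kwds)

print(call_nanop(nansum, [1.0, math.nan], min_count=1))  # 1.0
print(call_nanop(nanmean, [1.0, 3.0], min_count=1))      # 2.0
```

Signature inspection avoids adding an unused keyword everywhere, at the cost of a more magical dispatch, which is roughly the trade-off debated above.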
I think you could just pass it thru in make_stat_function? e.g. you always just pass

```python
# not sum/prod (I know you have a separate function for this)
min_count=-1
return self._reduce(f, name, axis=axis, skipna=skipna,
                    numeric_only=numeric_only, min_count=min_count)
```
That'd be option 2. I'd have to add a min_count keyword to each of our methods in pandas.core.nanops, which I don't think we want.
```python
('prod', 1),
])
def test_sum_prod_nanops(self, method, unit):
    idx = ['a', 'b', 'c']
```
you could add tests for Timedelta here as well (should work)
```python
])
def test_empty(self, method, unit, use_bottleneck):
    with pd.option_context("use_bottleneck", use_bottleneck):
        # GH 9422
```
add this issue number
pandas/tests/test_resample.py
Outdated
```python
pad = DataFrame([[np.nan, np.nan, np.nan, np.nan]], index=[3],

if func == 'sum':
    fill_value = 0
```
can u parametrize on func name and fill_value
This turned up a slight inconsistency between `groupby([grouper, key]).agg` and `groupby(key).resample().agg`: the first doesn't upsample to include "unobserved" values, while the latter does. An example:

```python
df = pd.DataFrame({
    'A': [1] * 6 + [2] * 6,
    'B': pd.to_datetime(['2017'] * 3 +
                        ['2019'] * 3 +
                        ['2017'] * 3 +
                        ['2019'] * 3),
    'C': [1] * 12
})
a = df.groupby([pd.Grouper(key='B', freq='AS'), 'A']).C.count()
b = df.set_index("B").sort_index().groupby("A").resample("AS").C.count()
```

```
In [53]: a  # groupby([grouper, key]).agg
Out[53]:
B           A
2017-01-01  1    3
            2    3
2019-01-01  1    3
            2    3
Name: C, dtype: int64

In [54]: b  # groupby(key).resample().agg
Out[54]:
A  B
1  2017-01-01    3
   2018-01-01    0
   2019-01-01    3
2  2017-01-01    3
   2018-01-01    0
   2019-01-01    3
Name: C, dtype: int64
```

I've use…
Force-pushed 8cbd59a to 1902af6
Hello @TomAugspurger! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 29, 2017 at 12:25 Hours UTC
Force-pushed 1902af6 to c213aeb
RE: Grouper/resample: that does look inconsistent. I think the resample behavior should be used for both?
Somehow I could give an explanation for the difference in behaviour between the groupby and resample example (resample: conform to a new index given the frequency -> full time series; groupby: make groups of my data based on the given frequency -> only those groups contained in the data). But of course there are also good reasons to have this consistent.
Reviewed only the whatsnew note.
In general, for the "changes" part of the whatsnew, should we also note somehow that whether this changed for you compared to pre-0.21 behaviour depends on whether bottleneck was installed or not (although that is maybe only for the main sum and not for groupby or resample?)
doc/source/whatsnew/v0.22.0.txt
Outdated
```
* We've added a ``min_count`` parameter to ``.sum`` and ``.prod`` to control
  the minimum number of valid values for the result to be valid. If fewer than
  ``min_count`` valid values are present, the result is NA. The default is
  ``0``. To restore the 0.21 behavior, use ``min_count=1``.
```
I would explicitly say in the summary here that this result is np.nan
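The documented rule can be modeled in a few lines of plain Python. `sum_with_min_count` below is a hypothetical sketch of the semantics only, not pandas' implementation:

```python
import math

def sum_with_min_count(values, min_count=0, skipna=True):
    """NA-aware sum: NaN when fewer than min_count valid values remain."""
    valid = [v for v in values if not math.isnan(v)] if skipna else list(values)
    if len(valid) < min_count:
        return math.nan
    return sum(valid)

print(sum_with_min_count([]))                       # 0   (new 0.22 default)
print(sum_with_min_count([math.nan]))               # 0   (new 0.22 default)
print(sum_with_min_count([math.nan], min_count=1))  # nan (the 0.21 behavior)
```

With `min_count=0` the empty and all-NA cases both collapse to the empty sum, 0; raising `min_count` to 1 restores the NaN result.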
doc/source/whatsnew/v0.22.0.txt
Outdated
```
``min_count`` valid values are present, the result is NA. The default is
``0``. To restore the 0.21 behavior, use ``min_count=1``.

Some background: In pandas 0.21.1, we fixed a long-standing inconsistency
```
0.21.1 -> 0.21.0 ?
doc/source/whatsnew/v0.22.0.txt
Outdated
```python
grouper = pd.Categorical(['a', 'a'], categories=['a', 'b'])
pd.Series([1, 2]).groupby(grouper).sum()

pd.Series([1, 2]).groupby(groupuer).sum(min_count=1)
```
groupuer -> grouper
doc/source/whatsnew/v0.22.0.txt
Outdated
```
.. code-block:: ipython

In [1]: import pandas as pd; import numpy as np;
```
this line is not needed
doc/source/whatsnew/v0.22.0.txt
Outdated
```
s.resample('2d').sum(min_count=1)

Upsampling in particular is affected, as this will introduce all-NaN series even
```
Should we emphasize here somewhere this change is only for sum/prod, and not for all other aggregation functions? (I know it is already said above, but I think we have to stress enough it that those examples only apply for specific cases)
I'm not sure. We have two tests that relied on that equivalence to construct the expected result.
Force-pushed 46590dd to d5e475f
pandas/_libs/window.pyx
Outdated
```python
ndarray[int64_t] start, end
ndarray[double_t] output

if minp == 0:
```
@jreback thoughts on this approach? no_min is a boolean that indicates whether or not to consider min_p when choosing if the result is valid in calc_sum
I really don't like this. Why can't you just dispatch on minp==0 above
e.g. line 416
if nobs >= minp or minp==0
?
That's what I had earlier with the minp2 stuff.
The problem there was that in roll_sum, minp gets set to min(minp, 1), so we don't know if the user specified min_periods=0.
hmm I don't see that. maybe can just remove that?
It looks bigger than it is because I had to ensure VariableWindowIndexer passes through floor. The main change is just using floor=0 in roll_sum and iterating like range(min(1, minp), ...) instead of just range(minp, ...).
Force-pushed a034a70 to b8d1f2b
The only failing tests are the equivalence tests between …

@jorisvandenbossche could you explain a bit further? I don't quite understand. IMO, saying …
doc/source/whatsnew/v0.22.0.txt
Outdated
```
Pandas 0.22.0 changes the handling of empty and all-NA sums and products. The
summary is that

* The sum of an all-NA or empty series is now 0
```
maybe add Series in capitalize with double-backticks.
possibly also use double-backticks around all-NA (or maybe italics)
doc/source/whatsnew/v0.22.0.txt
Outdated
```
Based on feedback, we've partially reverted those changes. The default sum for
all-NA and empty series is now 0 (1 for ``prod``).

*pandas 0.21*
```
pandas 0.21.x
doc/source/whatsnew/v0.22.0.txt
Outdated
```
Returning ``NaN`` was the default behavior for pandas 0.20.3 without bottleneck
installed.

Note that this affects some other places in the library:
```
I would break out these to separate sub-sections I think (e.g the numbered ones)
pandas/core/generic.py
Outdated
```python
_sum_examples = """\
Examples
--------
By default, the sum of an empty series is ``0``.
```
add all-NaN as well
```python
def _na_for_min_count(values, axis):
    # we either return np.nan or pd.NaT
```
can you add doc-string
```python
assert_series_equal(result2, expected)

def test_groupby_resample_equivalence(self):
    # Currently failing
```
this is failing?
It's a simplification of test_timegrouper_with_reg_groups, they're failing for the same reason (#18921 (comment))
```python
columns=['A', 'B', 'C', 'D'])
normal_result = getattr(normal_grouped, func)()
dt_result = getattr(dt_grouped, func)()
```
was this test moved above?
I don't think so, just parametrized.
Returning to #18921 (comment), it's only an issue when there are multiple groupers:

```
In [14]: df = pd.DataFrame({"A": pd.to_datetime(['2015', '2017']), "B": [1, 1]})

In [15]: df.groupby(pd.Grouper(key="A", freq="AS")).B.count()
Out[15]:
A
2015-01-01    1
2016-01-01    0
2017-01-01    1
Freq: AS-JAN, Name: B, dtype: int64

In [16]: df.groupby([pd.Grouper(key="A", freq="AS"), [0, 0]]).B.count()
Out[16]:
A
2015-01-01  0    1
2017-01-01  0    1
Name: B, dtype: int64
```

Still trying to track down the appropriate spot for the fix. Will we want to include this bugfix in 0.22, or save it for 0.23.0?
Fixing this is turning into quite the rabbit hole :/
IIUC, the issue is in how the new groups are determined:

```
In [9]: df = pd.DataFrame({"A": pd.to_datetime(['2015', '2017']), "B": [1, 1]})

In [10]: gr1 = df.groupby(pd.Grouper(key="A", freq="AS")).B

In [11]: gr2 = df.groupby([pd.Grouper(key="A", freq="AS"), [0, 0]])

In [12]: gr1.grouper.result_index
Out[12]: DatetimeIndex(['2015-01-01', '2016-01-01', '2017-01-01'], dtype='datetime64[ns]', name='A', freq='AS-JAN')

In [13]: gr2.grouper.result_index
Out[13]:
MultiIndex(levels=[[2015-01-01 00:00:00, 2016-01-01 00:00:00, 2017-01-01 00:00:00], [0]],
           labels=[[0, 2], [0, 0]],
           names=['A', None])
```

So:

```
In [21]: gr2.grouper.groups
Out[21]:
{(Timestamp('2015-01-01 00:00:00'), 0): Int64Index([0], dtype='int64'),
 (Timestamp('2017-01-01 00:00:00'), 0): Int64Index([1], dtype='int64')}
```

I don't know if that's the root problem yet, or just a symptom.
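A minimal plain-Python illustration of the gap (hypothetical data mirroring the example above): the full cartesian product of the two levels contains the unobserved 2016 bin, while the observed combinations do not.

```python
from itertools import product

dates = ['2015-01-01', '2016-01-01', '2017-01-01']  # bins implied by freq="AS"
second_level = [0]
observed = [('2015-01-01', 0), ('2017-01-01', 0)]   # what grouper.groups holds

# Full cartesian product of the levels vs. only the observed combinations:
full = list(product(dates, second_level))
missing = [combo for combo in full if combo not in observed]
print(missing)  # [('2016-01-01', 0)] -- the bin dropped from the result
```

Reindexing the result to the full product (as resample effectively does) is what would restore the 2016 row with a count of 0.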
Codecov Report

```
@@            Coverage Diff            @@
##           master   #18921     +/-   ##
=========================================
- Coverage    91.6%   91.56%   -0.05%
=========================================
  Files         150      150
  Lines       48939    48977      +38
=========================================
+ Hits        44833    44845      +12
- Misses       4106     4132      +26
```

Continue to review full report at Codecov.
just a couple of tiny comments
doc/source/whatsnew/v0.22.0.txt
Outdated
```
Some background: In pandas 0.21, we fixed a long-standing inconsistency
in the return value of all-*NA* series depending on whether or not bottleneck
was installed. See :ref:`whatsnew_0210.api_breaking.bottleneck`_. At the same
time, we changed the sum and prod of an empty Series to also be ``NaN``.
```
double backticks around Series.
```
pd.Series([]).sum()
pd.Series([np.nan]).sum()

The default behavior is the same as pandas 0.20.3 with bottleneck installed. It
```
say matches numpy? (I know you are saying np.nansum, but can't hurt to actually say numpy)
doc/source/whatsnew/v0.22.0.txt
Outdated
```
also matches the behavior of ``np.nansum`` on empty and all-*NA* arrays.

To have the sum of an empty series return ``NaN``, use the ``min_count``
keyword. Thanks to the ``skipna`` parameter, the ``.sum`` on an all-*NA*
```
skipna=True
doc/source/whatsnew/v0.22.0.txt
Outdated
```
To have the sum of an empty series return ``NaN``, use the ``min_count``
keyword. Thanks to the ``skipna`` parameter, the ``.sum`` on an all-*NA*
series is conceptually the same as on an empty. The ``min_count`` parameter
refers to the minimum number of *valid* values required for a non-NA sum
```
valid -> not-null
```
pd.Series([1, 2]).groupby(grouper).sum()

To restore the 0.21 behavior of returning ``NaN`` of unobserved groups,
use ``min_count>=1``.
```
do we actually tests min_count > 1?
Yeah, I added a few more in frame/test_analytics.py, test_resample.py, and test_groupby.py.
doc/source/whatsnew/v0.22.0.txt
Outdated
```
^^^^^^^^^^^^^^^^^^^^^

Rolling and expanding already have a ``min_periods`` keyword that behaves
similarly to ``min_count``. The only case that changes is when doing a rolling
```
similar
```
@@ -1,14 +1,213 @@
.. _whatsnew_0220:
```
don't u need this?
tiny comment. ok merging on green.
OK, all green. I'm going to …

All green. Here we go.
* API: Change the sum of all-NA / all-Empty sum / prod * Max, not min * Update whatsnew * Parametrize test * Minor cleanups * Refactor skipna_alternative * Split test * Added issue * More updates * linting * linting * Added skips * Doc fixup * DOC: More whatsnew (cherry picked from commit dedfce9)
I'm not sure I understand the rationale for making the `prod` of all-NA values equal 1. Consider a compounded percent-change calculation:

```
>>> s = pd.Series([0, 0.05, 0.12])
>>> (1 + s).prod() - 1
0.17600000000000016  # ok

>>> s = pd.Series([0, 0.0, 0.0])
>>> (1 + s).prod() - 1
0.0  # ok

>>> s = pd.Series([np.nan, np.nan, np.nan])
>>> (1 + s).prod() - 1
0.0  # how do you interpret this?
```

IMO percent calculation will be difficult to interpret when the product of all-NA values is defined this way. Is there a write-up on this, in case I'm misunderstanding this?
The rationale is that this is consistent with how sum() works: 0 is the additive identity, and 1 is the multiplicative identity. It would be confusing to use a different rule. There are absolutely use cases where these definitions for sum/prod are not desired, but it's impossible for the default behavior to work in all cases. Your use case of calculating percent change sounds like a good use for …
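The identity argument can be checked directly in plain Python (`math.prod` is available from Python 3.8):

```python
import math

# 0 is the additive identity and 1 is the multiplicative identity,
# so the empty sum is 0 and the empty product is 1.
assert sum([]) == 0
assert math.prod([]) == 1

# Folding one more element into a result that starts from the
# identity never changes that element's contribution:
assert sum([5]) == 0 + 5
assert math.prod([5]) == 1 * 5
```

With skipna semantics, an all-NA input reduces to the empty input, so it inherits the same identity values.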
Version 0.22.0 * tag 'v0.22.0': (777 commits) RLS: v0.22.0 DOC: Fix min_count docstring (pandas-dev#19005) DOC: More 0.22.0 updates (pandas-dev#19002) TST: Remove pow test in expressions COMPAT: Avoid td.skip decorator DOC: 0.22.0 release docs (pandas-dev#18983) DOC: Include 0.22.0 whatsnew Breaking changes for sum / prod of empty / all-NA (pandas-dev#18921) ENH: Added a min_count keyword to stat funcs (pandas-dev#18876) RLS: v0.21.1 DOC: Add date to whatsnew (pandas-dev#18740) DOC: Include 0.21.1 whatsnew DOC: Update relase notes (pandas-dev#18739) CFG: Ignore W503 DOC: fix options table (pandas-dev#18730) ENH: support non default indexes in writing to Parquet (pandas-dev#18629) BUG: Fix to_latex with longtable (pandas-dev#17959) (pandas-dev#17960) Parquet: Add error message for no engine found (pandas-dev#18717) BUG: Categorical data fails to load from hdf when all columns are NaN (pandas-dev#18652) DOC: clean-up whatsnew file for 0.21.1 (pandas-dev#18690) ...
Changes the defaults for `min_count` so that `sum([])` and `sum([np.nan])` are 0 by default, and NaN with `min_count>=1`.

I'd recommend looking at only the latest commit until #18876 is merged. I'll probably force push changes here to keep all the relevant changes in the last commit until #18876 is in, rebase on that, and then start pushing changes regularly.
cc @jreback @jorisvandenbossche @shoyer @wesm