BUG: preserve categorical & sparse types when grouping / pivot #27071

jreback · 2019-06-27T02:14:22Z

pandas/tests/groupby/test_function.py

pandas/core/groupby/groupby.py

pandas/tests/groupby/test_function.py

jreback · 2019-06-27T03:21:54Z

That test is not appropriate for checking the results of this change, as most of those ops don't work on ordered categoricals; I have covered the most common ones (first/last/min/max) above.
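A minimal sketch of the behaviour those four ops are meant to show, assuming a recent pandas release that includes this change. The frame, the column names `key` and `grade`, and the category labels are made up for illustration and are not taken from the PR's tests.

```python
import pandas as pd

# Hypothetical data: an ordered categorical column grouped by a plain string key.
grade_dtype = pd.CategoricalDtype(["low", "mid", "high"], ordered=True)
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "grade": pd.Categorical(["low", "high", "mid", "mid"], dtype=grade_dtype),
    }
)

grouped = df.groupby("key")["grade"]

# Run the four reductions named in the comment and keep each result.
results = {op: getattr(grouped, op)() for op in ("first", "last", "min", "max")}

for op, res in results.items():
    # Each result is expected to keep the ordered categorical dtype.
    print(op, res.dtype, list(res))
```

If a reduction fell back to object dtype, the `res.dtype` column of the printout would show it immediately, which is the regression this PR targets.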

codecov · 2019-06-27T03:24:38Z

Codecov Report

Merging #27071 into master will decrease coverage by 1.38%.
The diff coverage is 93.1%.

@@            Coverage Diff             @@
##           master   #27071      +/-   ##
==========================================
- Coverage   92.04%   90.66%   -1.39%     
==========================================
  Files         180      180              
  Lines       50714    50727      +13     
==========================================
- Hits        46680    45991     -689     
- Misses       4034     4736     +702

Flag	Coverage Δ
#multiple	`90.66% <93.1%> (-0.02%)`	⬇️
#single	`?`

Impacted Files	Coverage Δ
pandas/core/groupby/generic.py	`88.48% <100%> (-0.86%)`	⬇️
pandas/core/groupby/ops.py	`96% <100%> (ø)`	⬆️
pandas/core/nanops.py	`94.76% <100%> (ø)`	⬆️
pandas/core/groupby/groupby.py	`97.32% <100%> (+0.15%)`	⬆️
pandas/core/internals/construction.py	`96.21% <100%> (+0.25%)`	⬆️
pandas/core/internals/blocks.py	`94.95% <71.42%> (-0.19%)`	⬇️
pandas/core/computation/pytables.py	`62.5% <0%> (-27.75%)`	⬇️
pandas/io/pytables.py	`64.86% <0%> (-25.44%)`	⬇️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/computation/common.py	`84.21% <0%> (-5.27%)`	⬇️
... and 10 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d94146c...424c466. Read the comment docs.

codecov · 2019-06-27T03:24:39Z

Codecov Report

Merging #27071 into master will increase coverage by 50.06%.
The diff coverage is 93.1%.

@@             Coverage Diff             @@
##           master   #27071       +/-   ##
===========================================
+ Coverage   41.96%   92.02%   +50.06%     
===========================================
  Files         180      180               
  Lines       50707    50727       +20     
===========================================
+ Hits        21277    46681    +25404     
+ Misses      29430     4046    -25384

Flag	Coverage Δ
#multiple	`90.66% <93.1%> (?)`
#single	`41.85% <24.13%> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby/generic.py	`88.48% <100%> (+73.66%)`	⬆️
pandas/core/groupby/ops.py	`96% <100%> (+76.23%)`	⬆️
pandas/core/nanops.py	`94.76% <100%> (+63.17%)`	⬆️
pandas/core/groupby/groupby.py	`97.32% <100%> (+73.45%)`	⬆️
pandas/core/internals/construction.py	`96.21% <100%> (+31.81%)`	⬆️
pandas/core/internals/blocks.py	`94.95% <71.42%> (+41.89%)`	⬆️
pandas/core/computation/pytables.py	`90.24% <0%> (+0.3%)`	⬆️
pandas/io/pytables.py	`90.3% <0%> (+0.96%)`	⬆️
pandas/core/panel.py	`17.8% <0%> (+1.7%)`	⬆️
pandas/util/_test_decorators.py	`93.84% <0%> (+4.61%)`	⬆️
... and 138 more

Powered by Codecov. Last update c1673cf...9b8f2b4. Read the comment docs.

doc/source/whatsnew/v0.25.0.rst

TomAugspurger · 2019-06-27T15:45:29Z

pandas/core/internals/blocks.py

+        try:
+
+            result = self._holder._from_sequence(
+                np.asarray(result).ravel(), dtype=dtype)


I'm a bit concerned by the asarray here. Is that just so we can do the .ravel?

Consider a silly example like

df.groupby('key').apply(lambda x: x.array)

Will that end up hitting this, and so calling asarray and converting to ndarray?
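To make the concern concrete, here is a small sketch of what `np.asarray` does to an extension array and how `_from_sequence` with an explicit dtype rebuilds it. It uses `Categorical` as the example array with made-up values; it is not the code path in `blocks.py`, only an illustration of the round trip in the diff above.

```python
import numpy as np
import pandas as pd

# Illustrative values only; 3 is an unused category on purpose.
cat = pd.Categorical([1, 2, 1], categories=[1, 2, 3], ordered=True)

# np.asarray materialises a plain ndarray: the unused category and the
# ordering are no longer attached to the values.
as_ndarray = np.asarray(cat)
print(type(as_ndarray).__name__, as_ndarray.dtype)

# _from_sequence is the ExtensionArray constructor used in the diff; passing
# the original dtype restores the categories and the ordering.
rebuilt = type(cat)._from_sequence(as_ndarray.ravel(), dtype=cat.dtype)
print(rebuilt.dtype == cat.dtype)
```

The round trip works here because the dtype is passed back in explicitly; the intermediate ndarray on its own does not carry that information.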

jreback · Jun 27, 2019

Does this make sense?

df.groupby('A').B.apply(lambda x: x.array)

(Pdb) p df
   A                         B
0  1 2000-01-01 18:00:00-06:00
1  1 2000-01-01 18:00:00-06:00
2  2                       NaT
3  2                       NaT
4  3 1999-12-31 18:00:00-06:00
5  3 1999-12-31 18:00:00-06:00
6  1 2000-01-01 18:00:00-06:00
7  4 2000-01-02 18:00:00-06:00
(Pdb) p result
A
1    [2000-01-01 18:00:00-06:00, 2000-01-01 18:00:0...
2                                           [NaT, NaT]
3    [1999-12-31 18:00:00-06:00, 1999-12-31 18:00:0...
4                          [2000-01-02 18:00:00-06:00]
Name: B, dtype: object
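The debugger session can be approximated with the sketch below. The way the frame is built is an assumption: a fixed UTC-06:00 offset stands in for whatever timezone the original test used, and the values are chosen to match the printed frame.

```python
from datetime import timedelta, timezone

import pandas as pd

# Assumption: a fixed -06:00 offset reproduces the timestamps shown above.
central = timezone(timedelta(hours=-6))
stamps = pd.to_datetime(
    [
        "2000-01-02", "2000-01-02", None, None,
        "2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
    ],
    utc=True,
).tz_convert(central)

df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4], "B": stamps})

# Each group returns its backing extension array, so the result is an
# object-dtype Series whose elements are whole arrays.
result = df.groupby("A").B.apply(lambda x: x.array)

print(result.dtype)
print(type(result.loc[1]).__name__, result.loc[1].dtype)
```

Checking the dtype of an individual element shows whether the timezone-aware arrays survive inside the object-dtype Series.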

pandas/core/groupby/generic.py

pandas/tests/sparse/test_pivot.py

jreback · 2019-06-27T21:31:06Z

@WillAyd @jbrockmendel ?

jbrockmendel · 2019-06-27T21:37:41Z

Does this also preserve the dtypes under transpose?

WillAyd

lgtm

pandas/tests/extension/decimal/test_decimal.py

jreback · 2019-06-27T21:47:25Z

@jbrockmendel

Does this also preserve the dtypes under transpose?

No, that's a more general issue.
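For context, a short sketch of the transpose behaviour being asked about, using a made-up mixed-dtype frame. It only illustrates the general issue in current pandas and says nothing about what this PR changes.

```python
import pandas as pd

# Hypothetical frame mixing a categorical column with an integer column.
df = pd.DataFrame(
    {
        "cat": pd.Categorical(["a", "b"], categories=["a", "b", "c"]),
        "num": [1, 2],
    }
)

# Transposing a mixed-dtype frame goes through a single 2-D object array,
# so every resulting column is object dtype.
transposed = df.T
print(transposed.dtypes)

# Transposing back does not recover the categorical dtype.
roundtrip = transposed.T
print(roundtrip.dtypes)
```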

jreback · 2019-06-27T23:10:14Z

home/vsts/work/1/s/doc/source/whatsnew/v0.25.0.rst:856: WARNING: Unknown interpreted text role "method".

@jorisvandenbossche any idea what this means?

closes pandas-dev#18502

jbrockmendel · 2020-12-04T04:02:28Z

pandas/tests/resample/test_datetime_index.py

+    result = ts.resample('3T').mean()
+    expected = Series([1, 4, 7],
+                      index=pd.date_range('1/1/2000', periods=3, freq='3T'),
+                      dtype='Int64')


@jreback @jorisvandenbossche why returning Int64 here? I would expect float64 or Float64.

e.g. if we do ts[-1] += 1 before the resample, the mean comes back as float64.

jorisvandenbossche · Dec 4, 2020

Yes, this should be Float64, because it is only by accident that the results are all integer-like.
This is one of the cases that I listed in #37494
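The point about integer-like results can be checked with the sketch below. The input series is an assumption (nine consecutive integers at one-minute spacing, which reproduces the expected `[1, 4, 7]`), and `"3min"` is used in place of `'3T'` because the `T` alias is deprecated in newer pandas releases. The dtype that gets printed depends on the pandas version, so the example inspects it rather than asserting a particular value.

```python
import pandas as pd

# Assumed input: 0..8 at one-minute spacing, stored as nullable Int64.
index = pd.date_range("1/1/2000", periods=9, freq="min")
ts = pd.Series(range(9), index=index, dtype="Int64")

# Means of [0, 1, 2], [3, 4, 5], [6, 7, 8] happen to be whole numbers.
result = ts.resample("3min").mean()
print(result.dtype, [float(v) for v in result])

# Bump the last value, as suggested in the comment: the last bin becomes
# [6, 7, 9], whose mean is no longer an integer.
bumped = ts.copy()
bumped.iloc[-1] = bumped.iloc[-1] + 1
bumped_result = bumped.resample("3min").mean()
print(bumped_result.dtype, [float(v) for v in bumped_result])
```

Because the second result cannot be represented as integers, a dtype that depends on the values rather than on the operation would differ between the two calls, which is the inconsistency raised in the review.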

jreback added the Groupby, Compat, and Categorical labels Jun 27, 2019

jreback added this to the 0.25.0 milestone Jun 27, 2019

WillAyd reviewed Jun 27, 2019

pandas/tests/groupby/test_function.py

pandas/core/groupby/groupby.py

WillAyd requested changes Jun 27, 2019

pandas/tests/groupby/test_function.py

TomAugspurger reviewed Jun 27, 2019

jreback mentioned this pull request Jun 27, 2019

DOC/TST: provide documentation & testing on groupby filtering & aggregation ops for EA #27078

Closed

jreback force-pushed the groupby_dtypes branch 2 times, most recently from 9b8f2b4 to 25e8d1b Compare June 27, 2019 18:24

jbrockmendel reviewed Jun 27, 2019

pandas/core/groupby/generic.py (outdated)

jbrockmendel reviewed Jun 27, 2019

pandas/tests/sparse/test_pivot.py (outdated)

jreback force-pushed the groupby_dtypes branch from 25e8d1b to 9b551dd Compare June 27, 2019 20:17

WillAyd approved these changes Jun 27, 2019

pandas/tests/extension/decimal/test_decimal.py (outdated)

jreback force-pushed the groupby_dtypes branch from 9b551dd to ac848c5 Compare June 27, 2019 21:49

jreback added 8 commits June 27, 2019 18:12

BUG: preserve categorical & sparse types when grouping / pivot

cdd78db

closes pandas-dev#18502

typo

6751d0b

moar tests

b8be789

use a fixed random seed

31d4635

xfail on np 1.17

ea98679

lint

7ab00fa

groupby tests

bad7553

use strict=False

41e11e1

jreback added 3 commits June 27, 2019 18:12

review comments

ccfcca0

typo

48e7c32

fix doc warning on master

3a6a0c0

jreback force-pushed the groupby_dtypes branch from ac848c5 to 3a6a0c0 Compare June 27, 2019 23:13

jreback merged commit ce86c21 into pandas-dev:master Jun 27, 2019

jbrockmendel reviewed Dec 4, 2020


jorisvandenbossche mentioned this pull request Dec 5, 2020

API: add EA._from_scalars / stricter casting of result values back to EA dtype #38315

Closed

2 tasks
