BUG: Attributes are lost when subsetting columns in groupby #35444

rhshadrach · 2020-07-29T00:46:49Z

closes #9959 (columns selection after groupby reset group_keys to True)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

I'm avoiding the Series case here, e.g. df.groupby('a')['b'], as I think that will involve some API changes. Will follow up: ref #35443
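A minimal sketch of the behavior this PR targets, assuming a pandas version that includes the fix (1.2 or later). The frame and column names are made up for illustration. Selecting a list of columns from a DataFrameGroupBy should carry over the options passed to groupby:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8]})

# Pass non-default options so that a lost attribute would be visible.
gb = df.groupby("a", sort=False, group_keys=False, dropna=False)

# Selecting with a list of columns returns another DataFrameGroupBy.
subset = gb[["b"]]

# Print each option on the original and on the subset for comparison.
for attr in ("sort", "group_keys", "dropna"):
    print(attr, getattr(gb, attr), getattr(subset, attr))
```

If an option were dropped during the selection, the subset would report the default (True) for it instead of the value passed in.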

simonjayhawkins · 2020-08-01T16:38:19Z

@rhshadrach test is failing on Linux py37_np_dev

=========================== short test summary info ============================
FAILED pandas/tests/groupby/test_groupby.py::test_subsetting_columns_keeps_attrs[squeeze-True]
=== 1 failed, 70023 passed, 4046 skipped, 1022 xfailed in 541.95s (0:09:01) ====

pep8speaks · 2020-08-01T21:16:44Z

Hello @rhshadrach! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-31 20:16:14 UTC

rhshadrach · 2020-08-01T22:11:04Z

Thanks @simonjayhawkins, passing now.


jreback · 2020-08-03T23:35:48Z

does this also close: #35014 ?

arw2019 · 2020-08-04T01:35:06Z

> does this also close: #35014 ?

@jreback I ran the example from the OP of #35014 on this branch; the error persists.

jreback · 2020-08-06T23:32:03Z

@rhshadrach can you merge master and ping on green.

rhshadrach · 2020-08-13T23:24:42Z

@jreback master has been merged, failures are unrelated

simonjayhawkins · 2020-08-14T10:34:56Z

> failures are unrelated

restarted


C:\Miniconda\envs\pandas-dev\lib\site-packages\hypothesis\core.py:637: in execute_once
    self.__flaky(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <hypothesis.core.StateForActualGivenExecution object at 0x00000234496FE130>
message = 'Hypothesis test_shift_across_dst(offset=<0 * MonthBegins>) produces unreliable results: Falsified on the first call but did not on a subsequent one'

    def __flaky(self, message):
        if len(self.falsifying_examples) <= 1:
>           raise Flaky(message)
E           hypothesis.errors.Flaky: Hypothesis test_shift_across_dst(offset=<0 * MonthBegins>) produces unreliable results: Falsified on the first call but did not on a subsequent one

C:\Miniconda\envs\pandas-dev\lib\site-packages\hypothesis\core.py:847: Flaky
--------------------------------- Hypothesis ----------------------------------
Falsifying example: test_shift_across_dst(
    offset=<0 * MonthBegins>,
)
Unreliable test timings! On an initial run, this test took 884.81ms, which exceeded the deadline of 500.00ms, but on a subsequent run it took 1.18 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.

You can reproduce this example by temporarily adding @reproduce_failure('5.24.3', b'AAAAAAA=') as a decorator on your test case

jreback · 2020-08-14T21:54:15Z

wait, what happened to the tests?

rhshadrach · 2020-08-16T15:24:55Z

@arw2019 @jreback: I added all attributes to the ndim=1 case except as_index and axis as these might be considered an API change. Failures are unrelated.
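As a rough illustration of the ndim=1 case described above (a sketch assuming pandas 1.2 or later, with made-up data), selecting a single column yields a SeriesGroupBy that keeps the same options. as_index and axis are left out, matching the comment:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a", sort=False, group_keys=False, dropna=False)

# A single label (not a list) selects one column: the ndim=1 case.
series_gb = gb["b"]
print(type(series_gb).__name__)

# Collect the options that should have been carried over.
propagated = {
    attr: getattr(series_gb, attr) for attr in ("sort", "group_keys", "dropna")
}
print(propagated)
```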


jreback · 2020-08-21T22:03:21Z

pandas/tests/groupby/test_groupby.py

+    result = expected[["b"]]
+    assert getattr(result, attr) == getattr(expected, attr)
+
+    if attr in ("axis", "as_index"):


can you xfail these in the parameter list instead

jreback · 2020-08-21T22:04:15Z

also, can you add a test that replicates the original issue (unless your new ones cover it)

rhshadrach · 2020-08-22T18:02:12Z

@jreback test is now xfail instead of skip; original issue is covered by the added test when attr="group_keys". Failure is unrelated (due to pyarrow).

rhshadrach · 2020-08-22T18:18:33Z

@jreback should have mentioned - if the parameters are labeled xfail, then we wouldn't be testing their behavior for DataFrameGroupBy (which I think we want to). So I've left it as an if statement after the frame case and before the series. Does that work?
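For reference, marking individual parameters as xfail could look roughly like the sketch below. The test name, data, and parameter list are hypothetical and not taken from the PR; `pytest.param(..., marks=pytest.mark.xfail(...))` is the standard pytest mechanism. It also shows the drawback raised above: the mark attaches to the parameter, so every assertion in the test body for that parameter falls under the xfail.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "attr",
    [
        "sort",
        "group_keys",
        "dropna",
        # Hypothetical: only this parameter is allowed to fail.
        pytest.param(
            "as_index",
            marks=pytest.mark.xfail(reason="may be an API change", strict=False),
        ),
    ],
)
def test_single_column_keeps_attr(attr):
    # Made-up frame; each option is set to its non-default value, False.
    df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
    gb = df.groupby("a", **{attr: False})
    assert getattr(gb["b"], attr) == getattr(gb, attr)
```

Keeping a plain `if` inside the test body instead, as described above, lets the frame-level assertion run for every attribute and limits the special handling to the Series case.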


rhshadrach · 2020-08-22T22:41:24Z

I've updated the tests with @arw2019's suggestion. To my surprise, the line

df.groupby('a', as_index=False)['b']

results in a DataFrameGroupBy rather than a SeriesGroupBy, and thus would successfully pass the test. I'm still of the opinion that this should instead raise (ref: #35443), but perhaps others disagree. I've marked it as xfail with strict=False, but perhaps it shouldn't be marked xfail at all? Any suggestions here are much appreciated.

Update: After rethinking, this test shouldn't be xfailed.
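To check what the selection above returns on a given installation, a small sketch like the following can help. The data is made up, and nothing is assumed about the result: the comment above reports a DataFrameGroupBy for the pandas version under review, and other versions may behave differently.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

# Single-column selection on a groupby created with as_index=False.
selected = df.groupby("a", as_index=False)["b"]

# Print the class name rather than assuming it.
print(pd.__version__, type(selected).__name__)
```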


jreback · 2020-08-31T22:31:35Z

thanks @rhshadrach

good approach, separating something potentially controversial into another issue.


rhshadrach changed the title ~~BUG: Attributes are lost when subsetting columns~~ BUG: Attributes are lost when subsetting columns in groupby Jul 29, 2020

rhshadrach added Groupby Bug labels Jul 29, 2020

rhshadrach added 2 commits August 1, 2020 17:23

BUG: Attributes are lost when subsetting columns

493b0d2

Whatsnew

b4e0139

rhshadrach force-pushed the group_keys branch from 935a325 to b4e0139 Compare August 1, 2020 21:26

Merge branch 'master' into group_keys

3444eb2

rhshadrach added this to the 1.2 milestone Aug 1, 2020

rhshadrach added 2 commits August 1, 2020 18:12

Fixed whatsnew

e6099b7

Merge branch 'group_keys' of https://github.com/rhshadrach/pandas int…

d048a40

…o group_keys

rhshadrach force-pushed the group_keys branch from 3d0f2eb to d048a40 Compare August 2, 2020 14:58

jreback mentioned this pull request Aug 3, 2020

BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna #35078

Merged

5 tasks

rhshadrach and others added 3 commits August 13, 2020 14:33

Merge remote-tracking branch 'upstream/master' into group_keys

5f223af

Merge branch 'master' into group_keys

4b16e07

Removed trailing whitespace

0d05b4b

Added tests back

35f03b5

This was referenced Aug 15, 2020

BUG: slicing DataFrameGroupBy to SeriesGroupBy doesn't propagate dropna #35745

Closed

BUG: DataFrame.groupby(., dropna=True, axis=0) incorrectly throws ShapeError #35751

Merged

Added propagating attributes in the ndim=1 case

161bbf0

Merge branch 'master' of https://github.com/pandas-dev/pandas into gr…

ee46137

…oup_keys Conflicts: doc/source/whatsnew/v1.2.0.rst

jreback requested changes Aug 21, 2020


Changed some tests from skip to xfail

fdbd5c2

arw2019 reviewed Aug 22, 2020


pandas/tests/groupby/test_groupby.py Outdated

pandas/tests/groupby/test_groupby.py Outdated

Reparametrized tests

216e11b

rhshadrach added 4 commits August 22, 2020 18:56

black

e51e730

Merge branch 'master' of https://github.com/pandas-dev/pandas into gr…

2d42774

…oup_keys Conflicts: doc/source/whatsnew/v1.2.0.rst

Merge branch 'master' of https://github.com/pandas-dev/pandas into gr…

28db1ff

…oup_keys

Removed xfail for as_index case

c4515b3

rhshadrach requested a review from jreback August 31, 2020 21:27

jreback approved these changes Aug 31, 2020


jreback merged commit bcb9e1b into pandas-dev:master Aug 31, 2020

rhshadrach deleted the group_keys branch August 31, 2020 22:37

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Aug 31, 2020

BUG: Attributes are lost when subsetting columns in groupby (pandas-d…

a00c882

…ev#35444)

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

BUG: Attributes are lost when subsetting columns in groupby (pandas-d…

5d5bb52

…ev#35444)
