Skip to content

BUG: Attributes are lost when subsetting columns in groupby #35444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 31, 2020

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Jul 29, 2020

Avoiding the behavior of Series here, e.g. df.groupby('a')['b'] as I think that will involve some API changes. Will follow up: ref #35443

@rhshadrach rhshadrach changed the title BUG: Attributes are lost when subsetting columns BUG: Attributes are lost when subsetting columns in groupby Jul 29, 2020
@simonjayhawkins
Copy link
Member

@rhshadrach test is failing on Linux py37_np_dev

=========================== short test summary info ============================
FAILED pandas/tests/groupby/test_groupby.py::test_subsetting_columns_keeps_attrs[squeeze-True]
=== 1 failed, 70023 passed, 4046 skipped, 1022 xfailed in 541.95s (0:09:01) ====

@pep8speaks
Copy link

pep8speaks commented Aug 1, 2020

Hello @rhshadrach! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-31 20:16:14 UTC

@rhshadrach
Copy link
Member Author

Thanks @simonjayhawkins, passing now.

@rhshadrach rhshadrach added this to the 1.2 milestone Aug 1, 2020
@jreback
Copy link
Contributor

jreback commented Aug 3, 2020

does this also close: #35014 ?

@arw2019
Copy link
Member

arw2019 commented Aug 4, 2020

does this also close: #35014 ?

@jreback I ran the OP in 35014 on this branch, the error persists

@jreback
Copy link
Contributor

jreback commented Aug 6, 2020

@rhshadrach can you merge master and ping on green.

@rhshadrach
Copy link
Member Author

@jreback master has been merged, failures are unrelated

@simonjayhawkins
Copy link
Member

failures are unrelated

restarted


C:\Miniconda\envs\pandas-dev\lib\site-packages\hypothesis\core.py:637: in execute_once
    self.__flaky(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <hypothesis.core.StateForActualGivenExecution object at 0x00000234496FE130>
message = 'Hypothesis test_shift_across_dst(offset=<0 * MonthBegins>) produces unreliable results: Falsified on the first call but did not on a subsequent one'

    def __flaky(self, message):
        if len(self.falsifying_examples) <= 1:
>           raise Flaky(message)
E           hypothesis.errors.Flaky: Hypothesis test_shift_across_dst(offset=<0 * MonthBegins>) produces unreliable results: Falsified on the first call but did not on a subsequent one

C:\Miniconda\envs\pandas-dev\lib\site-packages\hypothesis\core.py:847: Flaky
--------------------------------- Hypothesis ----------------------------------
Falsifying example: test_shift_across_dst(
    offset=<0 * MonthBegins>,
)
Unreliable test timings! On an initial run, this test took 884.81ms, which exceeded the deadline of 500.00ms, but on a subsequent run it took 1.18 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.

You can reproduce this example by temporarily adding @reproduce_failure('5.24.3', b'AAAAAAA=') as a decorator on your test case

@jreback
Copy link
Contributor

jreback commented Aug 14, 2020

wait, what happend to the tests?

@rhshadrach
Copy link
Member Author

@arw2019 @jreback: I added all attributes to the ndim=1 case except as_index and axis as these might be considered an API change. Failures are unrelated.

…oup_keys

� Conflicts:
�	doc/source/whatsnew/v1.2.0.rst
result = expected[["b"]]
assert getattr(result, attr) == getattr(expected, attr)

if attr in ("axis", "as_index"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you xfail these in the parameter list instead

@jreback
Copy link
Contributor

jreback commented Aug 21, 2020

also if you can add a test that replicates the original issue (unless your new ones cover)

@rhshadrach
Copy link
Member Author

@jreback test is now xfail instead of skip; original issue is covered by the added test when attr="group_keys". Failure is unrelated (due to pyarrow).

@rhshadrach
Copy link
Member Author

@jreback should have mentioned - if the parameters are labeled xfail, then we wouldn't be testing their behavior for DataFrameGroupBy (which I think we want to). So I've left it as an if statement after the frame case and before the series. Does that work?

@rhshadrach
Copy link
Member Author

rhshadrach commented Aug 22, 2020

I've updated the tests with @arw2019's suggestion. To my surprise, the line

df.groupby('a', as_index=False)['b']

results in a DataFrameGroupBy rather than a SeriesGroupBy, and thus would successfully pass the test. I'm still of the opinion that this should instead raise (ref: #35443), but perhaps others disagree. I've marked it as xfail with strict=False, but perhaps it shouldn't be marked xfail at all? Any suggestions here are much appreciated.

Update: After rethinking, this test shouldn't be xfailed.

@rhshadrach rhshadrach requested a review from jreback August 31, 2020 21:27
@jreback jreback merged commit bcb9e1b into pandas-dev:master Aug 31, 2020
@jreback
Copy link
Contributor

jreback commented Aug 31, 2020

thanks @rhshadrach

good approach about separating something potentially controversial to another issue.

@rhshadrach rhshadrach deleted the group_keys branch August 31, 2020 22:37
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Aug 31, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

columns selection after groupby reset group_keys to True
5 participants