Skip to content

ENH: groupby.apply for Categorical should preserve categories (closes… #10142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 4, 2015

Conversation

mortada
Copy link
Contributor

@mortada mortada commented May 15, 2015

closes #10138

@shoyer
Copy link
Member

shoyer commented May 15, 2015

Does this change anything for groupby.apply in the transform case, rather than the aggregate case? It would be nice to add a test for transform-like apply with categoricals if we don't have one already.

@mortada
Copy link
Contributor Author

mortada commented May 16, 2015

Oh good point ... the following actually doesn't reindex:

In [12]: grouped.apply(lambda x : 'foo')
Out[12]:
missing  dense
a        a        foo
         b        foo
         c        foo
dtype: object

I'll dig into the code path for this

@shoyer
Copy link
Member

shoyer commented May 16, 2015

@mortada I actually think not reindexing is the right behavior here, for transforming grouped operators. The result index should be identical to the input index.

@jreback jreback added this to the 0.17.0 milestone May 16, 2015
@jreback
Copy link
Contributor

jreback commented May 16, 2015

@mortada I think @shoyer is correct; transform returns the original index

@mortada
Copy link
Contributor Author

mortada commented May 16, 2015

As cool thanks guys, indeed I see that in the docs about transform.

I'll add a test case for this behavior

@mortada mortada force-pushed the groupby_apply_cat branch from d8b49d5 to bef6932 Compare May 16, 2015 17:56
@mortada
Copy link
Contributor Author

mortada commented May 16, 2015

@jreback @shoyer it's updated

@shoyer
Copy link
Member

shoyer commented May 16, 2015

OK, looks good to me. Can you add a release note?

@mortada mortada force-pushed the groupby_apply_cat branch from bef6932 to aeef3c2 Compare May 16, 2015 23:47
@mortada
Copy link
Contributor Author

mortada commented May 16, 2015

ah absolutely. I'm not quite sure what section this should go, I just added it in "other enhancements"

@shoyer
Copy link
Member

shoyer commented May 16, 2015

I'd probably call this a bug fix?

@mortada
Copy link
Contributor Author

mortada commented May 16, 2015

yeah I was actually thinking the same thing, but the original issue was created as an enhancement

@@ -26,6 +26,8 @@ New features
Other enhancements
^^^^^^^^^^^^^^^^^^

- groupby.apply aggregation for Categorical now preserves categories (:issue:`10138`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeh, why don't you move to bug fixes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will do

@mortada mortada force-pushed the groupby_apply_cat branch from aeef3c2 to 659bbec Compare May 21, 2015 17:27
@@ -2596,6 +2596,34 @@ def get_stats(group):
result = self.df.groupby(cats).D.apply(get_stats)
self.assertEqual(result.index.names[0], 'C')

def test_apply_categorical_data(self):
# GH 10138
dense = Categorical(list('abc'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an ordered=False, can you add a test for ordered=True as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'll change this to test both cases

@mortada mortada force-pushed the groupby_apply_cat branch from 659bbec to eb33cf2 Compare June 2, 2015 16:38
@@ -68,6 +68,7 @@ Bug Fixes


- Bug in ``mean()`` where integer dtypes can overflow (:issue:`10172`)
- Bug in groupby.apply aggregation for Categorical not preserving categories (:issue:`10138`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to 0.16.2

@jreback jreback modified the milestones: 0.16.2, 0.17.0 Jun 2, 2015
@mortada mortada force-pushed the groupby_apply_cat branch from eb33cf2 to c8bf1c4 Compare June 2, 2015 21:54
@mortada
Copy link
Contributor Author

mortada commented Jun 3, 2015

@jreback moved this to v0.16.2, ready for review

jreback added a commit that referenced this pull request Jun 4, 2015
ENH: groupby.apply for Categorical should preserve categories (closes…
@jreback jreback merged commit 1f797c6 into pandas-dev:master Jun 4, 2015
@jreback
Copy link
Contributor

jreback commented Jun 4, 2015

@mortada gr8 thanks!

@mortada mortada deleted the groupby_apply_cat branch June 4, 2015 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: groupby.apply for Categorical groupers should preserve categories (like .agg)
3 participants