
ENH: Categorical.unique can keep same dtype #38135

Closed

Conversation

Contributor

@topper-123 topper-123 commented Nov 28, 2020

There are situations where we want to keep the same dtype as the original after applying unique. For example:

>>> dtype = pd.CategoricalDtype(['very good', 'good', 'neutral', 'bad', 'very bad'], ordered=True)
>>> cat = pd.Categorical(['good','good', 'bad', 'bad'], dtype=dtype)
>>> cat
[good, good, bad, bad]
Categories (5, object): [very good < good < neutral < bad < very bad]
>>> cat.unique().dtype == cat.dtype
False  # this is a bug IMO, but others may not agree

Even if it's not a bug, there are situations where we want the comparison above to return True. To support that, I've added a new parameter, so we can do:

>>> cat.unique(remove_unused_categories=False).dtype == cat.dtype
True

Helps #18291, but does not close the issue.
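For readers who want the dtype-preserving result without the proposed keyword, here is a minimal sketch that works on the codes directly. The helper name `unique_keep_dtype` is hypothetical and not part of pandas or of this PR; it only assumes `pd.unique` and `Categorical.from_codes` with a `dtype` argument.

```python
import pandas as pd

dtype = pd.CategoricalDtype(
    ["very good", "good", "neutral", "bad", "very bad"], ordered=True
)
cat = pd.Categorical(["good", "good", "bad", "bad"], dtype=dtype)


def unique_keep_dtype(values: pd.Categorical) -> pd.Categorical:
    """Hypothetical helper: unique values that keep the original dtype."""
    # Deduplicate the integer codes, in order of first appearance.
    unique_codes = pd.unique(values.codes)
    # Rebuild a Categorical from those codes with the full original dtype,
    # so unused categories and the ordering are preserved.
    return pd.Categorical.from_codes(unique_codes, dtype=values.dtype)


uniques = unique_keep_dtype(cat)
print(uniques.dtype == cat.dtype)
```

Because the helper never calls `Categorical.unique`, its result should not depend on whether that method drops unused categories.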

@@ -2035,16 +2035,24 @@ def mode(self, dropna=True):
     # ------------------------------------------------------------------
     # ExtensionArray Interface
 
-    def unique(self):
+    def unique(self, remove_unused_categories: bool = True) -> "Categorical":
Member

I'd rather just change/fix the behavior than add a new keyword.

Contributor Author

@topper-123 topper-123 Nov 28, 2020

I'd be +1 on that, but I'll wait for input from others. One consideration is that groupby with categoricals uses the current behaviour. Changing groupby behaviour without deprecation is not a good idea, IMO.

I think it would be reasonable to change the behaviour of Categorical.unique, but keep the current behaviour of groupbys. Then we can separately discuss how the groupby behaviour can be properly deprecated and changed.

Contributor

@jreback jreback left a comment

Agreed, this is very tricky, and adding a parameter is just API bloat.

@topper-123 topper-123 closed this Nov 28, 2020
@topper-123 topper-123 deleted the Categorical.unique_keep_dtype branch November 28, 2020 19:57