ENH: allow groupby (and drop_duplicates) on columns containing unhashable types #41759
Comments
Thanks for the request @ezerkar! I'd be -1 on this: I think it would add considerable complexity for uncertain payoff. This sounds like a workaround better done on the user side, since implicitly hashing mutable objects could lead to very confusing behavior. On the user side, you can make sure mutable objects aren't being mutated and breaking hash invariants.
Thanks for your comment.
That label just refers to the fact that this is related to
I think this makes sense given that it is default python behavior, eg
I think hashing mutable objects might surprise users who have a mutable object in their frame by mistake (which a failure would make very clear). If the intent actually is to drop duplicates of something mutable, I think it makes sense for the user to have to explicitly define how the hashing should be done (e.g. by converting to string first).
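A minimal sketch of what "explicitly define how the hashing should be done" could look like on the user side. The frame, the `payload` column name, and the choice of `json.dumps(..., sort_keys=True)` as the canonical form are illustrative assumptions, not something pandas does for you:

```python
import json

import pandas as pd

# Hypothetical frame whose column holds dicts (unhashable).
df = pd.DataFrame({"payload": [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"c": 3}]})

# The user picks the canonical, hashable form explicitly. Sorting keys makes
# equal dicts serialize to the same string regardless of insertion order.
canonical = df["payload"].map(lambda d: json.dumps(d, sort_keys=True))

# Deduplicate on the canonical strings, but keep the original rows.
deduped = df[~canonical.duplicated()]
```

Here the first two rows count as duplicates because the user chose an order-insensitive key; a plain `str()` conversion would treat them as distinct.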
(As a sidenote, if your use case is specifically about better list support, this would probably be supported if we ever have a specific ListDType (xref #35176).)
Thanks for clarifying the label LOL
Thanks for the report, but agreed, I would be -1 due to the unexpected behavior as well; it would be best if an external extension array contained the logic to support this case. Closing since this enhancement request hasn't gotten support from the core team, but happy to reopen if there is revived interest.
Is your feature request related to a problem?
Well, sort of: currently one cannot group by a column containing unhashable types (e.g. dicts).
Describe the solution you'd like
An easy workaround is to group by that column cast to str and then remap the strings back to their original type.
I am wondering whether this process could be provided built in, so that one can group by unhashable types if she desires to.
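The workaround described above might look roughly like the following. This is a sketch under assumptions: the frame, the column names `meta` and `value`, and the helper key name `meta_key` are all made up for illustration.

```python
import pandas as pd

# Hypothetical frame: "meta" holds dicts, which cannot be hashed.
df = pd.DataFrame({
    "meta": [{"a": 1}, {"a": 1}, {"b": 2}],
    "value": [10, 20, 30],
})

# The string form of each cell serves as a hashable stand-in key.
key = df["meta"].map(str).rename("meta_key")

# Remember one original object per string key so it can be restored later.
originals = dict(zip(key, df["meta"]))

# Group on the string key rather than on the unhashable column itself.
totals = df.groupby(key, sort=False)["value"].sum()

# Map the string keys back to the original objects.
result = pd.DataFrame({
    "meta": [originals[k] for k in totals.index],
    "value": totals.to_numpy(),
})
```

Note that `str()` is sensitive to dict insertion order, so `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` would land in different groups even though they compare equal.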
Describe alternatives you've considered
Add a try/except that falls back to hash(str(x)) when hash(x) is impossible, or convert the column to str and add it back later.
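In plain Python, the try/except alternative could be sketched as below. `fallback_hash` is a hypothetical helper name, not a pandas function:

```python
def fallback_hash(x):
    """Hash x directly if possible, otherwise hash its string form."""
    try:
        return hash(x)
    except TypeError:
        # Unhashable objects (dict, list, set, ...) end up here.
        return hash(str(x))


# Equal dicts built the same way get the same fallback hash.
same = fallback_hash({"a": 1}) == fallback_hash({"a": 1})

# Hashable objects are unaffected by the fallback.
unchanged = fallback_hash((1, 2)) == hash((1, 2))

# The fallback hash follows the object's current contents, so mutating the
# object afterwards changes the string being hashed.
items = [1]
before = fallback_hash(items)
items.append(2)
after = fallback_hash(items)
```

The last few lines illustrate the concern raised in the comments: a hash derived from a mutable object is only meaningful as long as the object is not mutated.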