-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: MultiIndex.names|codes|levels returns tuples #57042
Conversation
Just playing devil's advocate but why would we prefer tuple here over |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in spite of my comments this lgtm
@@ -2925,7 +2931,9 @@ def _partial_tup_index(self, tup: tuple, side: Literal["left", "right"] = "left" | |||
if lab not in lev and not isna(lab): | |||
# short circuit | |||
try: | |||
loc = algos.searchsorted(lev, lab, side=side) | |||
# Argument 1 to "searchsorted" has incompatible type "Index"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surprising to see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think our typing isn't entirely correct somewhere
I picked tuple to align with the prior
But yeah the biggest thing is that the returned object should not be able to be resized. |
doc/source/user_guide/groupby.rst
Outdated
@@ -143,7 +143,7 @@ the columns except the one we specify: | |||
.. ipython:: python | |||
|
|||
df2 = df.set_index(["A", "B"]) | |||
grouped = df2.groupby(level=df2.index.names.difference(["B"])) | |||
grouped = df2.groupby(level="A") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line leading into this is:
If we also have a MultiIndex on columns
A
andB
, we can group by all
the columns except the one we specify:
If we want to preserve this, one can do [e for e in df2.index.names if e not in {"A"}]
. But I'd also be okay to remove the entire example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried taking union and difference here might be a common use case that we're breaking. I'm okay with removing the feature, but we could deprecate uses of these methods to make the breaking change a bit more soft. Of course, with our deprecation policy that would mean punting this off for some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. If you feel strongly about it we can go through a deprecation, but the fact that FrozenList was never really publically documented (except indirectly here with this example) makes me OK with breaking this use case.
As a data point, in cudf uses FrozenList
in cudf.MultiIndex
to match return types but has no usage of FrozenList.union/difference
xref #44823
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not feel strongly here. Okay to go forward in my opinion.
Will merge this week unless there's any objections |
@mroeschke - just the request to fix the docs: #57042 (comment) |
Ah thanks for the reminder. Removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This is quite a breaking change .. I know you knew this and therefore kept it for 3.0, but I still do wonder if this is worth the breakage, and if we really want to change it, whether we couldn't first deprecate some aspects of it. The breaking change I ran into with geopandas, is the fact that a tuple cannot be concatenated like a list. We actually used the same pattern in pandas itself, as the diff in this PR has this change in the groupby code: - mi = MultiIndex(levels=levels, codes=codes, names=idx.names + [None])
+ mi = MultiIndex(levels=levels, codes=codes, names=list(idx.names) + [None]) I can imagine quite some external libraries use this when manipulating index / multi-index objects. We could first deprecate those methods on FrozenList that won't work anymore for tuple (I see @rhshadrach mentioned the same on the issue for the |
* MultiIndex.names|codes|levels returns tuples * Fix typing * Add whatsnew note * Fix stacking * Fix doctest, test * Fix other test * Remove example
tuple
instead ofFrozenList
? #53531 (Replace xxxx with the GitHub issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.