API: MultiIndex.names|codes|levels returns tuples #57042

mroeschke · 2024-01-23T22:06:14Z

closes API: Change MultiIndex.levels/codes to use tuple instead of FrozenList? #53531
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

WillAyd · 2024-01-30T21:41:09Z

Just playing devil's advocate, but why would we prefer a tuple here over an Index? The latter is what DataFrame.columns returns, so maybe this should harmonize with it.

WillAyd

In spite of my comments, this LGTM.

WillAyd · 2024-01-30T21:43:42Z

pandas/core/indexes/multi.py

@@ -2925,7 +2931,9 @@ def _partial_tup_index(self, tup: tuple, side: Literal["left", "right"] = "left"
            if lab not in lev and not isna(lab):
                # short circuit
                try:
-                    loc = algos.searchsorted(lev, lab, side=side)
+                    # Argument 1 to "searchsorted" has incompatible type "Index";


This is surprising to see?

Yeah, I think our typing isn't entirely correct somewhere.

mroeschke · 2024-01-30T21:56:17Z

Just playing devil's advocate but why would we prefer tuple here over Index?

I picked tuple to align with the prior FrozenList, which seemed to act like an immutable list. If we were to go with Index, I think each would have to be `Index[object]`:

* MultiIndex.codes would be an Index of ndarrays
* MultiIndex.levels would be an Index of Indexes
* MultiIndex.names would be an Index of object/inferred dtype, which would be reasonable.

But yeah, the biggest thing is that the returned object should not be resizable.
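To make the "should not be resizable" requirement concrete, here is a minimal sketch (assuming a recent pandas; the `rejects_item_assignment` helper and the toy index are hypothetical) that checks it for the two candidates discussed, a plain tuple and an `Index` built from the names. Both reject item assignment, and `Index.append` returns a new object rather than growing the original.

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]], names=["num", "let"])

# The two candidate return types discussed for MultiIndex.names.
as_tuple = tuple(mi.names)
as_index = pd.Index(list(mi.names))  # dtype is inferred from the names


def rejects_item_assignment(container) -> bool:
    """Return True if assigning to position 0 raises TypeError."""
    try:
        container[0] = "changed"
    except TypeError:
        return True
    return False


print(rejects_item_assignment(as_tuple))  # True
print(rejects_item_assignment(as_index))  # True

# Index.append does not grow the object in place; it returns a new Index.
grown = as_index.append(pd.Index(["extra"]))
print(len(as_index), len(grown))  # 2 3
```

A tuple gets this immutability from the language itself, while an `Index` enforces it explicitly, which is part of why either would satisfy the requirement.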

rhshadrach · 2024-01-30T22:07:48Z

doc/source/user_guide/groupby.rst

@@ -143,7 +143,7 @@ the columns except the one we specify:
 .. ipython:: python

   df2 = df.set_index(["A", "B"])
-   grouped = df2.groupby(level=df2.index.names.difference(["B"]))
+   grouped = df2.groupby(level="A")


The line leading into this is:

If we also have a MultiIndex on columns A and B, we can group by all
the columns except the one we specify:

If we want to preserve this, one could write [e for e in df2.index.names if e not in {"B"}]. But I'd also be okay with removing the entire example.
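The list-comprehension replacement can be checked against the original `names.difference(["B"])` example. A minimal sketch, assuming a small illustrative DataFrame (the values are made up): it builds the level list without relying on any FrozenList method, so it should behave the same whether `index.names` is a FrozenList or a tuple.

```python
import pandas as pd

# Illustrative data; only the A/B index levels matter here.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 1], "C": [10, 20, 30]})
df2 = df.set_index(["A", "B"])

# Group by every index level except "B", without FrozenList.difference.
excluded = {"B"}
keep = [name for name in df2.index.names if name not in excluded]

grouped = df2.groupby(level=keep)
print(keep)                          # ['A']
print(grouped.sum()["C"].tolist())   # [30, 30]
```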

I'm worried that taking union and difference here might be a common use case that we're breaking. I'm okay with removing the feature, but we could deprecate these methods first to soften the breaking change. Of course, under our deprecation policy that would mean punting this for some time.
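For downstream code that relied on these FrozenList methods, the replacements are short. A minimal sketch with hypothetical helper names (`names_union`, `names_difference`), assuming the FrozenList semantics in `pandas/core/indexes/frozen.py`: `union` is plain concatenation (duplicates are kept, not a set union), and `difference` drops the given elements while preserving order.

```python
from collections.abc import Hashable, Iterable, Sequence


def names_union(names: Sequence[Hashable], other: Iterable[Hashable]) -> list:
    # Mirrors FrozenList.union: plain concatenation, duplicates are kept.
    return list(names) + list(other)


def names_difference(names: Sequence[Hashable], other: Iterable[Hashable]) -> list:
    # Mirrors FrozenList.difference: drop members of `other`, keep order.
    excluded = set(other)
    return [name for name in names if name not in excluded]


# Works the same for a list-like FrozenList or a plain tuple.
names = ("A", "B", "C")
print(names_union(names, ["B", None]))  # ['A', 'B', 'C', 'B', None]
print(names_difference(names, ["B"]))   # ['A', 'C']
```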

Fair point. If you feel strongly about it, we can go through a deprecation, but the fact that FrozenList was never really publicly documented (except indirectly here with this example) makes me OK with breaking this use case.

As a data point, cudf uses FrozenList in cudf.MultiIndex to match the return types, but it has no usage of FrozenList.union/difference.

xref #44823

I do not feel strongly here. Okay to go forward in my opinion.

mroeschke · 2024-02-06T02:38:22Z

Will merge this week unless there are any objections.

rhshadrach · 2024-02-06T03:07:17Z

@mroeschke - just the request to fix the docs: #57042 (comment)

mroeschke · 2024-02-06T17:52:30Z

@mroeschke - just the request to fix the docs: #57042 (comment)

Ah, thanks for the reminder. Removed.

rhshadrach

lgtm

jorisvandenbossche · 2024-02-09T10:47:40Z

This is quite a breaking change. I know you were aware of this and therefore kept it for 3.0, but I still wonder whether it is worth the breakage and, if we really want to change it, whether we could first deprecate some aspects of it.

The breaking change I ran into with geopandas is that a tuple cannot be concatenated with a list the way a list can. We actually used the same pattern in pandas itself, as the diff in this PR shows for the groupby code:

-        mi = MultiIndex(levels=levels, codes=codes, names=idx.names + [None])
+        mi = MultiIndex(levels=levels, codes=codes, names=list(idx.names) + [None])

I can imagine quite a few external libraries use this pattern when manipulating Index / MultiIndex objects.
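The breakage is plain Python semantics: `tuple + list` raises `TypeError`, while `list(...) + [...]` works for both a FrozenList and a tuple. A minimal sketch of the version-agnostic pattern from the diff, assuming a toy two-level index and a made-up third level containing only `"z"`:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([["a", "b"], [1, 2]], names=["letter", "number"])

# Old pattern: only valid while .names is a list subclass (FrozenList).
try:
    broken = tuple(idx.names) + [None]
except TypeError as exc:
    print("tuple + list fails:", exc)

# Version-agnostic pattern used in the PR's groupby change.
new_names = list(idx.names) + [None]

# Append a hypothetical third level with a single value "z".
mi = pd.MultiIndex(
    levels=list(idx.levels) + [pd.Index(["z"])],
    codes=list(idx.codes) + [[0, 0]],
    names=new_names,
)
print(list(mi.names))  # ['letter', 'number', None]
```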

We could first deprecate the FrozenList methods that will no longer work on a tuple (I see @rhshadrach suggested the same on the issue for the difference/union methods).
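One way to stage such a deprecation, sketched under stated assumptions: a hypothetical tuple subclass (`DeprecatedNames` is not a pandas class) that keeps the old list-style operations working for a release while emitting a `FutureWarning`. It only illustrates the "deprecate first" idea; it is not what pandas shipped.

```python
import warnings


class DeprecatedNames(tuple):
    """Hypothetical tuple that still supports FrozenList-style operations,
    emitting a FutureWarning on each deprecated use."""

    @staticmethod
    def _warn(what: str) -> None:
        # stacklevel=3 points the warning at the user's call site.
        warnings.warn(
            f"{what} is deprecated; convert with list(...) first.",
            FutureWarning,
            stacklevel=3,
        )

    def __add__(self, other):
        if isinstance(other, list):
            # Old FrozenList behavior: list-style concatenation still works.
            self._warn("Concatenating names with a list")
            return type(self)(tuple(self) + tuple(other))
        # Tuple + tuple keeps normal tuple semantics, with no warning.
        return tuple.__add__(self, other)

    def union(self, other):
        self._warn("names.union")
        return type(self)(tuple(self) + tuple(other))

    def difference(self, other):
        self._warn("names.difference")
        excluded = set(other)
        return type(self)(name for name in self if name not in excluded)


names = DeprecatedNames(("A", "B"))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    extended = names + [None]
    remaining = names.difference(["B"])
print(extended, remaining)  # ('A', 'B', None) ('A',)
print(len(caught))          # 2
```

Subclassing `tuple` keeps the new immutable type while old call sites keep working until the deprecation period ends.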

* MultiIndex.names|codes|levels returns tuples
* Fix typing
* Add whatsnew note
* Fix stacking
* Fix doctest, test
* Fix other test
* Remove example

mroeschke added 3 commits January 23, 2024 12:47

MultiIndex.names|codes|levels returns tuples

5b791e1

Fix typing

5f87ec6

Add whatsnew note

85ca838

mroeschke added the MultiIndex label Jan 23, 2024

mroeschke requested review from rhshadrach and WillAyd as code owners January 23, 2024 22:06

mroeschke added 5 commits January 24, 2024 13:10

Merge remote-tracking branch 'upstream/main' into ref/mi/tuples

0dab99a

Fix stacking

d73ded2

Merge remote-tracking branch 'upstream/main' into ref/mi/tuples

31e0153

Fix doctest, test

8a36d34

Fix other test

7c511ce

mroeschke added this to the 3.0 milestone Jan 26, 2024

mroeschke added 2 commits January 26, 2024 14:37

Merge remote-tracking branch 'upstream/main' into ref/mi/tuples

87e173c

Merge remote-tracking branch 'upstream/main' into ref/mi/tuples

e1c3668

WillAyd approved these changes Jan 30, 2024


rhshadrach requested changes Jan 30, 2024


Merge remote-tracking branch 'upstream/main' into ref/mi/tuples

594a7a3

mroeschke added 2 commits February 6, 2024 09:50

Merge remote-tracking branch 'upstream/main' into ref/mi/tuples

a4df4c5

Remove example

c996730

rhshadrach approved these changes Feb 7, 2024


mroeschke merged commit 99e3afe into pandas-dev:main Feb 7, 2024
47 checks passed

mroeschke deleted the ref/mi/tuples branch February 7, 2024 04:52

jorisvandenbossche mentioned this pull request Feb 10, 2024

COMPAT: fix _get_index_for_parts (explode, etc) for future pandas geopandas/geopandas#3183

Merged

rhshadrach mentioned this pull request Mar 9, 2024

API: Revert 57042 - MultiIndex.names|codes|levels returns tuples #57788

Merged


mroeschke commented Jan 23, 2024

WillAyd commented Jan 30, 2024

WillAyd left a comment

WillAyd Jan 30, 2024

mroeschke Feb 6, 2024

mroeschke commented Jan 30, 2024 (edited)

rhshadrach Jan 30, 2024

rhshadrach Jan 30, 2024

mroeschke Jan 30, 2024 (edited)

rhshadrach Feb 1, 2024

mroeschke commented Feb 6, 2024

rhshadrach commented Feb 6, 2024

mroeschke commented Feb 6, 2024

rhshadrach left a comment

jorisvandenbossche commented Feb 9, 2024
