Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Change MultiIndex.levels/codes to use tuple instead of FrozenList? #53531

Closed
mroeschke opened this issue Jun 5, 2023 · 11 comments · Fixed by #57042
Closed

API: Change MultiIndex.levels/codes to use tuple instead of FrozenList? #53531

mroeschke opened this issue Jun 5, 2023 · 11 comments · Fixed by #57042

Comments

@mroeschke
Copy link
Member

mroeschke commented Jun 5, 2023

The doc of FrozenList says

    Container that doesn't allow setting item *but*
    because it's technically hashable, will be used
    for lookups, appropriately, etc.

which a tuple satisfies. Not sure why a tuple wasn't use originally but it may be simpler to use them instead of a custom FrozenList that acts like a tuple.

@srkds
Copy link
Contributor

srkds commented Jun 6, 2023

Hi,

While reading about Frozenlist I found a StackOverflow conversation (Link) and an answer from @jreback regarding the same.

The point of it is to prevent modification of these thru attributes and force the use of methods (e.g. set_levels()).

And If planning to change to tuple is it the correct way?

- def levels(self) -> FrozenList:
+ def levels(self) -> tuple:
        # Use cache_readonly to ensure that self.get_locs doesn't repeatedly
        # create new IndexEngine
        # https://github.com/pandas-dev/pandas/issues/31648
        result = [x._rename(name=name) for x, name in zip(self._levels, self._names)]
        for level in result:
            # disallow midx.levels[0].name = "foo"
            level._no_setting_name = True
-        return FrozenList(result)
+        return tuple(result)

expected output

>>> import pandas as pd
>>> import numpy as np
>>> arrays = [
...     np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
...     np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
... ]
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
>>> df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)
>>> df.columns.levels
(Index(['bar', 'baz', 'foo', 'qux'], dtype='object', name='first'), Index(['one', 'two'], dtype='object', name='second'))
>>>

@mroeschke
Copy link
Member Author

Nice find @srkds!

I'm not sure if any of those reason are still entirely applicable at this point. I would say if the custom FrozenList methods are not used extensively (e.g. union), it might as well just return a tuple as it would be better that the returned object is a public class (FrozenList is only accessible from the private pandas.core)

And If planning to change to tuple is it the correct way?

Yeah exactly

@srkds
Copy link
Contributor

srkds commented Jun 7, 2023

I'm not sure if any of those reason are still entirely applicable at this point. I would say if the custom FrozenList methods are not used extensively (e.g. union), it might as well just return a tuple as it would be better that the returned object is a public class (FrozenList is only accessible from the private pandas.core)

Yeah sounds good.

Yeah exactly

So going for submitting a PR.

@srkds
Copy link
Contributor

srkds commented Jun 7, 2023

take

@mroeschke
Copy link
Member Author

@srkds I would wait until there's a little more feedback before submitting a PR. Maybe there's a reason someone else might bring up why we should still use FrozenList

@srkds
Copy link
Contributor

srkds commented Jun 7, 2023

@srkds I would wait until there's a little more feedback before submitting a PR. Maybe there's a reason someone else might bring up why we should still use FrozenList

Yes, I agree with you👍

@topper-123
Copy link
Contributor

I think it can be done for levels/codes. So I'm +1, if there's no surprises after removing FrozenList.

We still will use FrozenList in .names:

>>> mi = pd.MultiIndex.from_frame(df, names=[(1,), (4,)])
>>> mi.names
FrozenList([(1,), (4,)])

@mroeschke
Copy link
Member Author

Looks pretty feasible internally based on #53582, but I think this may need to a hard 3.0 break as I don't think there's a clean way to deprecate this

@topper-123
Copy link
Contributor

Yeah, agree with that.

@rhshadrach
Copy link
Member

+1

FrozenList was introduced in #4039; specifically from #4039 (comment):

I had to create a separate FrozenList object to return levels, names and labels because PyTables doesn't work if the multiindex names are not a subclass of list (because DataFrame special-cases lists). Well, frankly, pandas just specialcases lists all over the place, so it was easier to go with the flow.

and #4039 (comment)

Created FrozenList and FrozenNDArray objects to make levels, labels and names immutable (can't use a tuple because of type-checking in other parts of the pandas library)

@rhshadrach
Copy link
Member

I think this may need to a hard 3.0 break

The two methods FrozenList implements that tuple does not are union and difference. We could deprecate these to make the break a bit less hard. I think most other operations would be compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants