Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Possible bad api side effect of replacing FrozenList with tuple #57607

Closed
3 tasks done
dalejung opened this issue Feb 25, 2024 · 8 comments · Fixed by #57788
Closed
3 tasks done

BUG: Possible bad api side effect of replacing FrozenList with tuple #57607

dalejung opened this issue Feb 25, 2024 · 8 comments · Fixed by #57788
Labels
Blocker Blocking issue or pull request for an upcoming release Bug Index Related to the Index class or subclasses Needs Discussion Requires discussion from core team before further action
Milestone

Comments

@dalejung
Copy link
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

df = pd.DataFrame({'name': ['dale', 'frank'], 'age': [10, 20]})
df = df.set_index('name')

index_names = df.index.names

df = df.reset_index()
df.set_index(index_names)

# KeyError: "None of [('name',)] are in the columns"

Issue Description

This might not be considered a bug, but I would expect there to be symmetry in df.set_index and df.index.names. Previously when index.names was a FrozenList, this would work. However a tuple has different semantics than a list throughout pandas.

Expected Behavior

I would expect Index.names to work for set_index.

Installed Versions

INSTALLED VERSIONS

commit : f01ae20
python : 3.11.7.final.0
python-bits : 64
OS : Linux
OS-release : 6.7.4-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Mon, 05 Feb 2024 22:07:49 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+431.gf01ae209da
numpy : 1.24.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 69.1.1
pip : 24.0
Cython : 3.0.8
pytest : 7.4.0
hypothesis : 6.98.11
sphinx : 6.2.1
blosc : None
feather : None
xlsxwriter : 3.1.0
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : 2.9.6
jinja2 : 3.1.2
IPython : 8.22.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : 1.3.7
fastparquet : 2023.4.0
fsspec : 2023.5.0
gcsfs : 2023.5.0
matplotlib : 3.7.1
numba : 0.57.0
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pyarrow : 16.0.0.dev175+g2d44818d0
pyreadstat : 1.2.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2023.5.0
scipy : 1.10.1
sqlalchemy : 2.0.14
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.5.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : 2.4.0
pyqt5 : None

@dalejung dalejung added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 25, 2024
@rhshadrach
Copy link
Member

cc @mroeschke @jorisvandenbossche

@rhshadrach rhshadrach added Blocker Blocking issue or pull request for an upcoming release Needs Discussion Requires discussion from core team before further action Index Related to the Index class or subclasses and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 25, 2024
@rhshadrach rhshadrach added this to the 3.0 milestone Feb 25, 2024
@dalejung
Copy link
Contributor Author

dalejung commented Feb 25, 2024

https://github.com/search?q=%22index.names+%3D%3D+%5BNone%5D%22&type=code

fwiw checking for df.index.names == [None] happens pretty frequently. Variants of this type of error is how I ran into the change originally in private repo. Just ran into it with hvplot. The fixes to these are obviously simple, you just wrap index.names in a list. But unsure what the upside of the change is.

@mroeschke
Copy link
Member

Maybe we could return a list here instead of a tuple since FrozenList was originally a list subclass.

Since codes/names/levels that returned this before are not writable, maybe we don't need the mutability anymore

@jorisvandenbossche
Copy link
Member

maybe we don't need the mutability anymore

What do you mean exactly with this?

@jorisvandenbossche
Copy link
Member

Consider something like this

df.index.names[0] = "a"

Currently such code fails, because the FrozenList is not mutable (and also with the tuple it fails). But if this would return a list instead, that would actually "work", in the sense of not raising an error anymore, but I suppose it would not actually update the names?

@mroeschke
Copy link
Member

What do you mean exactly with this?

Sorry I meant immutability.

Consider something like this

It appears for MultiIndex that codes/levels/names are backed by private _codes/_level/_names attributes. If we have the public attributes return lists and the private attributes use tuples we can still achieve "immutability" as before. Here's an example from a branch I have that implements this

In [5]: df = pd.DataFrame([1], index=pd.MultiIndex(codes=[[0]], levels=[["a"]], names=[0]))

In [6]: df
Out[6]: 
   0
0   
a  1

In [9]: df.index.names[0] = "a"

In [10]: df.index
Out[10]: 
MultiIndex([('a',)],
           names=[0])

@jorisvandenbossche
Copy link
Member

Yes, but so that has the behaviour that I mentioned above of appearing to work (because it no longer raises an error about being immutable), while not actually working.

I personally think it is perfectly fine that this doesn't work (that's how it always has been, and there are other APIs to change the name), but IMO it would be a UX regression if it no longer fails loudly.

@WillAyd
Copy link
Member

WillAyd commented Feb 28, 2024

Maybe we also just restore the FrozenList? Not a strong preference but I think the pattern the OP established is pretty reasonable. I would really never expect df.set_index(index_names) to act like a tuple indexer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocker Blocking issue or pull request for an upcoming release Bug Index Related to the Index class or subclasses Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants