DEPR: With a tuple index df.loc[key] selects a row when key is a subclass of tuple #50597

Open
dicristina opened this issue Jan 6, 2023 · 0 comments
Labels
Deprecate (Functionality to remove in pandas), Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@dicristina
Contributor

Issue description

df.loc[key] raises a KeyError when df has an index of tuples and key is a plain tuple. No exception is raised when key is an instance of a tuple subclass; in that case df.loc[key] selects a row.

Reproducible example

from collections import namedtuple
import pandas as pd

# tuple subclasses
mynamedtuple = namedtuple("MyNamedTuple", ["a", "b"])
class mytuple(tuple): pass

# keys
tuple_key = ("foo", "bar")
mytuple_key = mytuple(["foo", "bar"])
mynamedtuple_key = mynamedtuple("foo", "bar")

# pandas objects
tuple_list = [("foo", "bar"), ("bar", "baz")]
tuple_index = pd.Index(tuple_list, tupleize_cols=False)
tuple_df = pd.DataFrame([(1,2), (3,4)], index=tuple_index, columns=["A", "B"])

mytuple_list = [mytuple(["foo", "bar"]), mytuple(["bar", "baz"])]
mytuple_index = pd.Index(mytuple_list, tupleize_cols=False)
mytuple_df = pd.DataFrame([(1,2), (3,4)], index=mytuple_index, columns=["A", "B"])

mynamedtuple_list = [mynamedtuple("foo", "bar"), mynamedtuple("bar", "baz")]
mynamedtuple_index = pd.Index(mynamedtuple_list, tupleize_cols=False)
mynamedtuple_df = pd.DataFrame([(1,2), (3,4)], index=mynamedtuple_index, columns=["A", "B"])

# Examples
print(mytuple_df.loc[mytuple_key])  # Does not raise
print(mynamedtuple_df.loc[mynamedtuple_key])  # Does not raise
print(tuple_df.loc[tuple_key])  # Raises KeyError

The following part of the example runs every combination of key type and index element type. On master, the only cases where an exception is raised are those where key is a plain tuple.

# Every possible example
from itertools import product

keys = [tuple_key, mytuple_key, mynamedtuple_key]
dfs = [tuple_df, mytuple_df, mynamedtuple_df]

for k, df in product(keys, dfs):
    print(f"Key: {type(k)}, Index: {type(df.index[0])}")
    try:
        res = df.loc[k]
    except KeyError:
        res = "**KeyError**"
    print(f"{res}\n")
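
One way to read the combinations above, offered as an assumption rather than something confirmed in this issue: instances of tuple subclasses compare and hash equal to the plain tuple with the same elements, so any of the three key types can match any of the three index element types, while a check on the exact type of the key can still tell them apart. The sketch below uses only the standard library. classify_key is a hypothetical stand-in for such a check and is not pandas code.

```python
from collections import namedtuple

MyNamedTuple = namedtuple("MyNamedTuple", ["a", "b"])


class MyTuple(tuple):
    pass


plain = ("foo", "bar")
sub = MyTuple(["foo", "bar"])
named = MyNamedTuple("foo", "bar")

# Tuple subclasses inherit __eq__ and __hash__, so all three keys are
# interchangeable for a hash-based lookup.
assert plain == sub == named
assert hash(plain) == hash(sub) == hash(named)

labels = {plain: 0}
assert labels[sub] == 0 and labels[named] == 0


def classify_key(key):
    """Hypothetical dispatch (an assumption, not pandas code)."""
    # An exact-type check separates plain tuples from subclass instances,
    # whereas isinstance(key, tuple) would accept all three.
    if type(key) is tuple:
        return "multi-axis"
    return "row-label"


kinds = [classify_key(k) for k in (plain, sub, named)]
print(kinds)  # ['multi-axis', 'row-label', 'row-label']
```

Under this reading, a plain tuple key is split into one indexer per axis (so "foo" is looked up as a row label and fails), while a subclass instance is passed through whole and found in the index.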

Expected behavior

Every one of the examples should raise KeyError, just as when key is a plain tuple.

Deprecate or fix bug?

This behavior is tested for the case where key and the index elements are named tuples:

def test_loc_getitem_index_namedtuple(self):
    IndexType = namedtuple("IndexType", ["a", "b"])
    idx1 = IndexType("foo", "bar")
    idx2 = IndexType("baz", "bof")
    index = Index([idx1, idx2], name="composite_index", tupleize_cols=False)
    df = DataFrame([(1, 2), (3, 4)], index=index, columns=["A", "B"])
    result = df.loc[IndexType("foo", "bar")]["A"]
    assert result == 1

I became aware of this issue while trying to write a patch for issue #48188; the patch caused the above test to fail. Because the behavior described in this issue is covered by a test, I think it is more appropriate to emit a deprecation warning before changing it.
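
A minimal sketch of what such a warning could look like, using only the standard library. The helper name warn_if_tuple_subclass and the message wording are hypothetical, and nothing here is taken from pandas itself; FutureWarning is chosen because pandas deprecations are commonly surfaced with that category.

```python
import warnings


def warn_if_tuple_subclass(key):
    """Hypothetical helper: warn when key is an instance of a tuple subclass."""
    if isinstance(key, tuple) and type(key) is not tuple:
        warnings.warn(
            "Treating a tuple subclass as a single row label is deprecated; "
            "in a future version it will be handled like a plain tuple.",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False


class MyTuple(tuple):
    pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plain_flagged = warn_if_tuple_subclass(("foo", "bar"))
    sub_flagged = warn_if_tuple_subclass(MyTuple(["foo", "bar"]))

# Only the subclass instance triggers the warning.
categories = [w.category for w in caught]
```

With a warning of this kind in place, the existing named tuple test could keep passing during the deprecation period while also asserting that the warning is raised.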

INSTALLED VERSIONS

commit : 6b10bb8
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-25-generic
Version : #25-Ubuntu SMP Wed Mar 30 15:54:22 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.0.dev0+1071.g6b10bb875a
numpy : 1.23.5
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.61.0
sphinx : 4.5.0
blosc : 1.11.1
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2022.11.0
gcsfs : 2022.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2022.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.45
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2022.7
qtpy : None
pyqt5 : None

@jbrockmendel added the Indexing and Deprecate labels May 3, 2023