DEPR: With a tuple index `df.loc[key]` selects a row when `key` is a subclass of tuple #50597

dicristina · 2023-01-06T01:45:42Z

Issue description

When df has an index of tuples (not a MultiIndex), df.loc[key] raises a KeyError if key is a plain tuple. No exception is raised if key is an instance of a tuple subclass; in that case df.loc[key] selects a row.
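A minimal sketch of why a plain tuple and a tuple subclass can end up on different code paths: the two compare and hash equal, so any difference in behavior has to come from a type check. The `Point` type and the `dispatch` function below are hypothetical and only contrast an exact type check with a subclass-tolerant one; they are not taken from the pandas source.

```python
from collections import namedtuple

Point = namedtuple("Point", ["a", "b"])  # hypothetical tuple subclass

plain_key = ("foo", "bar")
named_key = Point("foo", "bar")

# Equal and hash-equal, so a hash-based lookup cannot tell them apart.
assert plain_key == named_key
assert hash(plain_key) == hash(named_key)


def dispatch(key):
    """Hypothetical dispatcher, NOT pandas code."""
    if type(key) is tuple:
        # Exact check: only plain tuples are treated as per-axis indexers.
        return "multi-axis"
    # Tuple subclasses fall through and are treated as a single row label.
    return "single-label"


print(dispatch(plain_key))  # multi-axis
print(dispatch(named_key))  # single-label
```

Replacing the exact check with `isinstance(key, tuple)` would send both keys down the same path, which is the consistency asked for under "Expected behavior".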

Reproducible example

from collections import namedtuple
import pandas as pd

# tuple subclasses
mynamedtuple = namedtuple("MyNamedTuple", ["a", "b"])
class mytuple(tuple): pass

# keys
tuple_key = ("foo", "bar")
mytuple_key = mytuple(["foo", "bar"])
mynamedtuple_key = mynamedtuple("foo", "bar")

# pandas objects
tuple_list = [("foo", "bar"), ("bar", "baz")]
tuple_index = pd.Index(tuple_list, tupleize_cols=False)
tuple_df = pd.DataFrame([(1, 2), (3, 4)], index=tuple_index, columns=["A", "B"])

mytuple_list = [mytuple(["foo", "bar"]), mytuple(["bar", "baz"])]
mytuple_index = pd.Index(mytuple_list, tupleize_cols=False)
mytuple_df = pd.DataFrame([(1,2), (3,4)], index=mytuple_index, columns=["A", "B"])

mynamedtuple_list = [mynamedtuple("foo", "bar"), mynamedtuple("bar", "baz")]
mynamedtuple_index = pd.Index(mynamedtuple_list, tupleize_cols=False)
mynamedtuple_df = pd.DataFrame([(1, 2), (3, 4)], index=mynamedtuple_index, columns=["A", "B"])

# Examples
print(mytuple_df.loc[mytuple_key])  # Does not raise
print(mynamedtuple_df.loc[mynamedtuple_key])  # Does not raise
print(tuple_df.loc[tuple_key])  # Raises KeyError

The following part of the example runs every combination of key type and index type. On master, the only cases that raise an exception are those where key is a plain tuple.

# Every possible example
from itertools import product

keys = [tuple_key, mytuple_key, mynamedtuple_key]
dfs = [tuple_df, mytuple_df, mynamedtuple_df]

for k, df in product(keys, dfs):
    print(f"Key: {type(k)}, Index: {type(df.index[0])}")
    try:
        res = df.loc[k]
    except KeyError:
        res = "**KeyError**"
    print(f"{res}\n")

Expected behavior

Every one of the examples should raise KeyError, just as when key is a plain tuple.

Deprecate or fix bug?

This behavior is tested for the case where key and the index elements are named tuples:

pandas/pandas/tests/indexing/test_loc.py

Lines 1581 to 1589 in 6b10bb8

    
    def test_loc_getitem_index_namedtuple(self):
        IndexType = namedtuple("IndexType", ["a", "b"])
        idx1 = IndexType("foo", "bar")
        idx2 = IndexType("baz", "bof")
        index = Index([idx1, idx2], name="composite_index", tupleize_cols=False)
        df = DataFrame([(1, 2), (3, 4)], index=index, columns=["A", "B"])
        result = df.loc[IndexType("foo", "bar")]["A"]
        assert result == 1

I became aware of this issue while writing a patch for issue #48188, which caused the above test to fail. Because the behavior described here is covered by a test, I think it is more appropriate to emit a deprecation warning before changing it.
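A deprecation along these lines could be sketched as follows. This is only an illustration under assumptions: `warn_on_tuple_subclass_key` is a hypothetical helper, the message wording is invented, and `FutureWarning` is assumed as the warning category; none of it comes from the pandas code base.

```python
import warnings


def warn_on_tuple_subclass_key(key):
    """Hypothetical helper: warn if `key` is an instance of a tuple subclass.

    Returns True when a warning was issued, False otherwise.
    """
    if isinstance(key, tuple) and type(key) is not tuple:
        warnings.warn(
            "Selecting a row with a tuple-subclass key is deprecated; "
            "in a future version it will be treated like a plain tuple.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        return True
    return False
```

A helper like this would be called before the lookup, so existing code that relies on the tested behavior keeps working for one release cycle while users are told it will change.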

INSTALLED VERSIONS

commit : 6b10bb8
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-25-generic
Version : #25-Ubuntu SMP Wed Mar 30 15:54:22 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.0.dev0+1071.g6b10bb875a
numpy : 1.23.5
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.61.0
sphinx : 4.5.0
blosc : 1.11.1
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2022.11.0
gcsfs : 2022.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2022.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.45
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2022.7
qtpy : None
pyqt5 : None


dicristina mentioned this issue Jan 7, 2023

BUG: Tuple subclasses should work as tuple in df.iloc[key] #50625

Closed

3 tasks

jbrockmendel added Indexing Related to indexing on series/frames, not to indexes themselves Deprecate Functionality to remove in pandas labels May 3, 2023
