.loc with list of incomplete labels misbehaves or raises #16083

Open
toobaz opened this issue Apr 21, 2017 · 4 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@toobaz
Member

toobaz commented Apr 21, 2017

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame(-1, index=pd.MultiIndex.from_product([[1,2], ['a', 'b'], ['i', 'ii']]), columns=['A', 'B'])

In [3]: df.loc[[1]] # works
Out[3]: 
        A  B
1 a i  -1 -1
    ii -1 -1
  b i  -1 -1
    ii -1 -1

In [4]: df.loc[[(1, 'a')]] # misbehaves (acts as if labels were missing)
Out[4]: 
      A   B
1 a NaN NaN

In [5]: df.loc[[(1,)]] # raises
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1100                 try:
-> 1101                     return self.obj.reindex_axis(keyarr, axis=axis)
   1102                 except AttributeError:

/home/pietro/nobackup/repo/pandas/pandas/core/frame.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2836                                         method=method, level=level, copy=copy,
-> 2837                                         limit=limit, fill_value=fill_value)
   2838 

/home/pietro/nobackup/repo/pandas/pandas/core/generic.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2509         new_index, indexer = axis_values.reindex(labels, method, level,
-> 2510                                                  limit=limit)
   2511         return self._reindex_with_indexers({axis: [new_index, indexer]},

/home/pietro/nobackup/repo/pandas/pandas/core/indexes/multi.py in reindex(self, target, method, level, limit, tolerance)
   1845                                                limit=limit,
-> 1846                                                tolerance=tolerance)
   1847                 else:

/home/pietro/nobackup/repo/pandas/pandas/core/indexes/multi.py in get_indexer(self, target, method, limit, tolerance)
   1791             # don't have the same dtypes
-> 1792             if self._inferred_type_levels != target._inferred_type_levels:
   1793                 return Index(self.values).get_indexer(target.values)

AttributeError: 'Int64Index' object has no attribute '_inferred_type_levels'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-5-e5c902934a28> in <module>()
----> 1 df.loc[[(1,)]]

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1326         else:
   1327             key = com._apply_if_callable(key, self.obj)
-> 1328             return self._getitem_axis(key, axis=0)
   1329 
   1330     def _is_scalar_access(self, key):

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1539                     raise ValueError('Cannot index with multidimensional key')
   1540 
-> 1541                 return self._getitem_iterable(key, axis=axis)
   1542 
   1543             # nested tuple slicing

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1105                     if axis != 0:
   1106                         raise AssertionError('axis must be 0')
-> 1107                     return self.obj.reindex(keyarr)
   1108 
   1109             # existing labels are non-unique

/home/pietro/nobackup/repo/pandas/pandas/core/frame.py in reindex(self, index, columns, **kwargs)
   2827     def reindex(self, index=None, columns=None, **kwargs):
   2828         return super(DataFrame, self).reindex(index=index, columns=columns,
-> 2829                                               **kwargs)
   2830 
   2831     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)

/home/pietro/nobackup/repo/pandas/pandas/core/generic.py in reindex(self, *args, **kwargs)
   2421         # perform the reindex on the axes
   2422         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 2423                                   fill_value, copy).__finalize__(self)
   2424 
   2425     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

/home/pietro/nobackup/repo/pandas/pandas/core/frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   2773         if index is not None:
   2774             frame = frame._reindex_index(index, method, copy, level,
-> 2775                                          fill_value, limit, tolerance)
   2776 
   2777         return frame

/home/pietro/nobackup/repo/pandas/pandas/core/frame.py in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance)
   2781         new_index, indexer = self.index.reindex(new_index, method=method,
   2782                                                 level=level, limit=limit,
-> 2783                                                 tolerance=tolerance)
   2784         return self._reindex_with_indexers({0: [new_index, indexer]},
   2785                                            copy=copy, fill_value=fill_value,

/home/pietro/nobackup/repo/pandas/pandas/core/indexes/multi.py in reindex(self, target, method, level, limit, tolerance)
   1844                     indexer = self.get_indexer(target, method=method,
   1845                                                limit=limit,
-> 1846                                                tolerance=tolerance)
   1847                 else:
   1848                     raise Exception("cannot handle a non-unique multi-index!")

/home/pietro/nobackup/repo/pandas/pandas/core/indexes/multi.py in get_indexer(self, target, method, limit, tolerance)
   1790             # we may not compare equally because of hashing if we
   1791             # don't have the same dtypes
-> 1792             if self._inferred_type_levels != target._inferred_type_levels:
   1793                 return Index(self.values).get_indexer(target.values)
   1794 

AttributeError: 'Int64Index' object has no attribute '_inferred_type_levels'

Problem description

This was already discussed here and here, opening a separate issue for clarity.

Expected Output

df.loc[[(1, 'a',)]] should return the same as df.loc[1, 'a', :] (or more explicitly, df.loc[(1, 'a', slice(None)), :]), while df.loc[[(1,)]] should return the same as df.loc[[1]].
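
To make the expected equivalence concrete, here is a minimal sketch that emulates a list of incomplete labels by padding each partial tuple with slice(None) and selecting with the tuple form of .loc. The helpers pad_key and select_partial are hypothetical names introduced only for illustration, not pandas API, and the sketch assumes a lexsorted index and strictly partial keys (fewer labels than index levels).

```python
import pandas as pd

df = pd.DataFrame(
    -1,
    index=pd.MultiIndex.from_product([[1, 2], ["a", "b"], ["i", "ii"]]),
    columns=["A", "B"],
)


def pad_key(partial, nlevels):
    """Hypothetical helper: pad a partial label tuple with slice(None)
    so that it names every level of the index."""
    partial = tuple(partial)
    return partial + (slice(None),) * (nlevels - len(partial))


def select_partial(frame, partial_keys):
    """Hypothetical helper: emulate .loc with a list of partial labels.

    Assumes every key is strictly partial (fewer labels than levels),
    so each padded lookup returns a DataFrame rather than a Series.
    """
    nlevels = frame.index.nlevels
    pieces = [frame.loc[pad_key(key, nlevels), :] for key in partial_keys]
    return pd.concat(pieces)


# The two forms that the report describes as the reference behaviour.
expected_1 = df.loc[[1]]
expected_1a = df.loc[(1, "a", slice(None)), :]

# What df.loc[[(1,)]] and df.loc[[(1, 'a')]] are expected to return.
got_1 = select_partial(df, [(1,)])
got_1a = select_partial(df, [(1, "a")])
```

Under these assumptions got_1 should match df.loc[[1]] and got_1a should match df.loc[(1, 'a', slice(None)), :], with all three index levels kept in the result.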

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.19.0+834.g8c7b9731f
pytest: 3.0.6
pip: 9.0.1
setuptools: 33.1.1
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.18.1
xarray: 0.9.1
IPython: 5.1.0.dev
sphinx: 1.4.9
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jreback
Contributor

jreback commented Apr 21, 2017

df.loc[1, 'a', :] is not valid syntax

@toobaz
Member Author

toobaz commented Apr 21, 2017

> df.loc[1, 'a', :] is not valid syntax

I used it because it works, but you're right that df.loc[(1, 'a', slice(None)), :] is clearer (updated description).

@toobaz
Member Author

toobaz commented Apr 21, 2017

> df.loc[1, 'a', :] is not valid syntax

By the way: I tend to think it is valid syntax, since the index has three levels. It is as valid as df.loc[1, 'a', 'i'], which is (officially, I guess) supported as a shortcut for df.loc[(1, 'a', 'i'), :].

Indeed, I would expect df.loc[1, 'a', 'i', :] to raise (and it does, but with a different error from what it should because of #14885).

@jreback jreback added Bug Difficulty Intermediate Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Apr 21, 2017
@jreback jreback added this to the Next Major Release milestone Apr 21, 2017
@toobaz
Member Author

toobaz commented Aug 3, 2018

As noticed by @bfollinprm in #22151, the behavior of In [4] changed in 0.23.0, and an error is now raised.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
4 participants