Multi-Index .loc with nlevels > 2 fails when nlevels - i > 1 levels are specified in a tuple. #22151
Comments
Does seem strange, though I'm fairly certain the preferred method of slicing here is df.loc[('2011', '01'), :], which works. The docs are pretty explicit about the need to specify both axes. That said, there's likely a bug hidden here somewhere; investigation and PRs are always welcome. cc @toobaz
The construction The solution is pretty simple actually (at least to get back to 0.22.0's functionality): in pandas/core/indexes/multi.py the size offset needs to be masked by the missing element in the tuple(s). For the example above, replacing line 77 with
is sufficient (of course, a similar change should propagate to the PyInt-version of the same class). I would raise this as a PR, but work has strict rules around contributions to open source, and I've already tried unsuccessfully to get permission to contribute to pandas.
@bfollinprm Thanks for the report. I definitely agree this should work. I don't think the proposed fix is optimal, however.
Correcting myself:
... but the example by @WillAyd does, so my argument should still stand.
@bfollinprm did your example really work with 0.22.0? With 0.19.2, I get
... which is certainly not the desired behavior.
Ah, you are right, this was buggy before: it gave the wrong result, but was silent about it. I didn't notice because the test code that was checking this was looping over all tuple lengths, but was only checking the values of the length 1 and length
The problem with the other signatures, e.g. @WillAyd's The following seems to give what I want:
So there definitely is a way, but that's ugly.
Sure, I agree, my comment was only concerning the implementation. Since the current engine can be used for the desired lookup, the problem is not there.
Please correct me if I'm wrong, but given This said, I think the solution will be something similar to your snippet, the main difficulty being that it needs to account for both lists and dataframes to be returned by
Duplicate of #16083. @bfollinprm, feel free to add any further comments there.
Code Sample, a copy-pastable example if possible
Problem description
In the spirit of the Multi-Index missile example in the docstring for pd.DataFrame.loc, I query a data frame with a 3 level multi-index, with variable length tuple supplied. The following calls all function as expected from the generalization of the example:
* df.loc[['2011'], :]
* df.loc[[('2011', '01', 1)], :]
* df.loc['2011', :]
* df.loc[('2011', '01', 1), :]
* df.loc[('2011', '01'), :]
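For readers who want to try the working calls listed above, here is a minimal sketch. The original frame is not shown in the report, so the 3-level index (with the hypothetical level names year, month, day) and the `value` column are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical data: the frame from the report is not shown, so this
# 3-level index and the "value" column are illustrative assumptions.
index = pd.MultiIndex.from_product(
    [["2011", "2012"], ["01", "02"], [1, 2]],
    names=["year", "month", "day"],
)
df = pd.DataFrame({"value": range(len(index))}, index=index)

# The five calls reported as working:
by_year_list = df.loc[["2011"], :]                 # list with one first-level label
by_full_key_list = df.loc[[("2011", "01", 1)], :]  # list with one full-length tuple
by_year = df.loc["2011", :]                        # scalar first-level label
by_full_key = df.loc[("2011", "01", 1), :]         # full-length tuple
by_partial_key = df.loc[("2011", "01"), :]         # partial tuple, not wrapped in a list
```

The failing case described in the title differs from the last call only in that the partial tuple is wrapped in a list, which routes the lookup through the list-like indexer shown in the traceback.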
The requested query is failing due to a mismatch between the supplied multi-index tuple and the pre-computed offsets in the new MultiIndexUIntEngine class introduced in 0.23.0:
pandas/core/indexing.py in __getitem__(self, key)
1508
1509 maybe_callable = com._apply_if_callable(key, self.obj)
-> 1510 return self._getitem_axis(maybe_callable, axis=axis)
1511
1512 def _is_scalar_access(self, key):
pandas/core/indexing.py in _getitem_axis(self, key, axis)
1910 raise ValueError('Cannot index with multidimensional key')
1911
-> 1912 return self._getitem_iterable(key, axis=axis)
1913
1914 # nested tuple slicing
pandas/core/indexing.py in _getitem_iterable(self, key, axis)
1211 # A collection of keys
1212 keyarr, indexer = self._get_listlike_indexer(key, axis,
-> 1213 raise_missing=False)
1214 return self.obj._reindex_with_indexers({axis: [keyarr, indexer]},
1215 copy=True, allow_dups=True)
pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1160 if len(ax) or not len(key):
1161 key = self._convert_for_reindex(key, axis)
-> 1162 indexer = ax.get_indexer_for(key)
1163 keyarr = ax.reindex(keyarr)[0]
1164 else:
pandas/core/indexes/base.py in get_indexer_for(self, target, **kwargs)
3420 """
3421 if self.is_unique:
-> 3422 return self.get_indexer(target, **kwargs)
3423 indexer, _ = self.get_indexer_non_unique(target, **kwargs)
3424 return indexer
pandas/core/indexes/multi.py in get_indexer(self, target, method, limit, tolerance)
1972 'for MultiIndex; see GitHub issue 9365')
1973 else:
-> 1974 indexer = self._engine.get_indexer(target)
1975
1976 return ensure_platform_int(indexer)
pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_indexer()
pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes()
pandas/core/indexes/multi.py in _codes_to_ints(self, codes)
75 # Shift the representation of each level by the pre-calculated number
76 # of bits:
---> 77 codes <<= self.offsets
78
79 # Now sum and OR are in fact interchangeable. This is a simple
ValueError: operands could not be broadcast together with shapes (1,2) (3,) (1,2)
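The shape mismatch in the final frame can be reproduced with NumPy alone. The sketch below is not the engine's code: the offset values are assumed for illustration (the real engine derives them from the level sizes), and the code arrays are made up. It only shows why a key with two codes cannot be shifted by three per-level offsets.

```python
import numpy as np

# Assumed bit offsets for a 3-level index; illustrative values only,
# not what pandas would compute for any particular frame.
offsets = np.array([2, 1, 0], dtype="uint64")

# One key with a code for every level: shapes (1, 3) and (3,) broadcast.
full_key_codes = np.array([[1, 1, 1]], dtype="uint64")
full_key_codes <<= offsets
packed = np.bitwise_or.reduce(full_key_codes, axis=1)  # one integer per key

# One key with only two codes: shapes (1, 2) and (3,) do not broadcast.
partial_key_codes = np.array([[1, 1]], dtype="uint64")
try:
    partial_key_codes <<= offsets
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(packed, error_message)
```

With these assumed values the full-length key packs into a single integer, while the two-element key raises the same kind of broadcasting ValueError as in the traceback.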
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.0.dev0+346.gb97545563
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.0
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.6
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None