Incoherence between Index and MultiIndex when labels in list are not found #15452
Comments
these are not equivalent, asking for a scalar, e.g.
@jreback, while indeed the most strict equivalent to

    # flat Index, looking for a list of values
    s.reset_index().loc[['not', 'found']]

is not really

    # MultiIndex, looking for a list of values (in first level)
    s.loc[['not', 'found']]

but rather

    # MultiIndex, looking for a list of (complete) keys
    s.loc[[('not', 'found1'), ('not', 'found2')]]

your example is not related to any of those (compare to

    In [6]: s.loc[[('not', 'found1'), ('not', 'found2')]]
    Out[6]:
    not  found1   NaN
         found2   NaN
    dtype: float64

... but at least there seems to be some reasoning behind (not clear to me).
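The three lookups contrasted above can be probed side by side. This is only a sketch: the `s` from the original report is not shown in this thread, so the Series below is borrowed from a later comment, and `flat` and the `probe` helper are hypothetical names introduced for illustration. No outcome is hard-coded for the MultiIndex cases, because what `.loc` does with missing labels depends on the installed pandas version.

```python
import pandas as pd


def probe(obj, key):
    """Describe what obj.loc[key] does on the installed pandas version."""
    try:
        result = obj.loc[key]
    except Exception as exc:  # report whichever exception type is raised
        return type(exc).__name__
    return f"{type(result).__name__} of length {len(result)}"


# Hypothetical flat-Index Series used for comparison.
flat = pd.Series(range(3), index=['a', 'b', 'c'])

# MultiIndex Series taken from a later comment in this thread.
s = pd.Series(range(6),
              index=pd.MultiIndex.from_product([['A0', 'A1', 'A2'],
                                                ['B0', 'B1']]))

outcomes = {
    "flat Index, list of values": probe(flat, ['not', 'found']),
    "MultiIndex, list of values (first level)": probe(s, ['not', 'found']),
    "MultiIndex, list of complete keys": probe(
        s, [('not', 'found1'), ('not', 'found2')]),
}

for case, outcome in outcomes.items():
    print(f"{case}: {outcome}")
```

If the three lines printed differ from each other, the installed version shows the kind of incoherence this issue is about.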
Or if you want a more concise example:
that is asking for a scalar label, but @toobaz's issue was a list of labels, so the equivalent would indeed rather be (but of course it is difficult to pinpoint an exact equivalent):
which does a reindex. Thus, different from single Index indexing, where it only does a reindex when at least one label of the list is included in the Index.
@jreback @jorisvandenbossche to be honest, I don't get why it was decided in the first place to make
And I know it's probably late to say this, but concerning this bug, the cleanest solution would seem to me to abandon the rule (documented, this time) Or alternatively, we could decide to be strict and raise a I understand that either of the two would break existing code. But the current compromise, in which we raise only if we do not find at least one label, seems to me a source of headaches both for the pandas user and for the pandas developer. (sorry if this was already discussed somewhere I couldn't find)
To come back to the original issue of Because the desired result may be different depending on how it is interpreted (or at least the current behaviour). And
@toobaz there is also a difference in the 'reindexing' behaviour of loc between single Index and MultiIndex.
so introducing NaN for the labels that did not exist in the original index.
So following that idea, you could see the original case here as equivalent to indexing with an empty list:
and then the output could be interpreted as consistent ...
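The "reindexing" idea discussed here can be made explicit with `reindex`, which does not depend on how `.loc` treats missing labels. The following is a minimal sketch with made-up data (`flat` and the key lists are assumptions, not the code from the original comments): labels absent from the original index come back as NaN, and indexing with an empty list gives an empty result.

```python
import pandas as pd

# Hypothetical flat-Index Series.
flat = pd.Series([0, 1, 2], index=['a', 'b', 'c'])

# Explicit reindex: one label present, two missing -> NaN for the missing ones.
flat_result = flat.reindex(['a', 'not', 'found'])

# MultiIndex Series as built in a later comment in this thread.
s = pd.Series(range(6),
              index=pd.MultiIndex.from_product([['A0', 'A1', 'A2'],
                                                ['B0', 'B1']]))

# Reindex with complete keys (tuples), two of which do not exist.
multi_result = s.reindex([('A0', 'B0'), ('not', 'found1'), ('not', 'found2')])

# Indexing with an empty list returns an empty Series.
empty_result = flat.loc[[]]

print(flat_result)
print(multi_result)
print(len(empty_result))
```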
I took it for granted that every time we ask In your first option, you have
Sure! And this reveals, strictly speaking, another very perverse (although well known) bug: since the docs say Now, if I will be able to convince you that the current behaviour is perverse (I mean my proposal above about managing missing labels - but we can discuss it elsewhere), I will take care of required changes, and the problem will disappear, everything will be coherent, and I will be happy. But assuming this does not happen, we will update the docs stating "except when passing an empty list", and then again the expected behaviour for this bug will be clear (since I am not passing an empty list).
No, it is a list of tuples (
That's maybe a good analogy to reason about the behaviour of a single list, yes. And to justify that
I am not sure we should call it a bug, as it could also be a design choice (although maybe a questionable one). But I would open a separate issue for that (focusing on the plain Index case).
OK: you have tuples, I don't. Tuples (can) mean "pieces of keys" (one per level), lists never do. Tuples can also contain lists (as in your example below), but they again mean "pieces of keys", except that for each level you have several pieces rather than only one. It is a beautiful convention without which I would never know what to expect from
Exactly!
Sorry, you are right, I know it's a design choice... I meant that as of now, it is just a documentation bug (except that I would like to fix the code rather than the docs, but I agree to talk about this in a separate issue).
This part I don't fully understand. But we may be confused by each other's terminology.
Well, that is my question. Do lists never mean that? Because the result of By the way, the
does not fully hold if you have more than two levels:
I was referring precisely to the tuples within the list
OK, seems like my wording was confusing. What I meant was: tuples broadcast across levels, lists never do. (By "pieces of keys" I meant "different pieces of one same key", or of multiple keys, but with emphasis that different components refer to different levels.)
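The convention described here (tuples spread their components across levels, lists enumerate alternatives) can be illustrated with labels that do exist, so that missing-label handling does not interfere. This is a sketch under the assumption of a lexsorted two-level index, reusing the Series built in a later comment; the variable names are introduced only for illustration.

```python
import pandas as pd

s = pd.Series(range(6),
              index=pd.MultiIndex.from_product([['A0', 'A1', 'A2'],
                                                ['B0', 'B1']]))

# A tuple is one key: one component per level.
scalar = s.loc[('A0', 'B1')]

# A list of plain labels selects values in the first level only.
first_level = s.loc[['A0', 'A1']]

# A tuple containing a list: several choices for level 0, one for level 1.
per_level = s.loc[(['A0', 'A1'], 'B0')]

print(scalar)
print(first_level.tolist())
print(per_level.tolist())
```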
Good catch... I'm pretty sure we want to consider
You don't even need the three levels to contradict me, try the following ;-)

    In [2]: s = pd.Series(range(6), index=pd.MultiIndex.from_product([['A0', 'A1', 'A2'], ['B0', 'B1']]))
    In [3]: s.loc[[('A0',), ('A1',)]]

(clearly another bug - or the same, for that matter) (By the way, before you take me more literally than I intended: I meant
Looks like this reasonably raises a
Code Sample, a copy-pastable example if possible
Problem description
With regular Indexes, looking for a list of labels of which none is present will raise KeyError (see below). We should be coherent (and while I would prefer the empty result to the exception, this was already discussed in #7999).
Expected Output
Compare to
Output of pd.show_versions()
pandas: 0.19.0+473.gf65a641
pytest: 3.0.6
pip: 8.1.2
setuptools: 28.0.0
Cython: 0.23.4
numpy: 1.12.0
scipy: 0.18.1
xarray: None
IPython: 5.1.0.dev
sphinx: 1.4.8
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2015.7
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.0
feather: None
matplotlib: 2.0.0rc2
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.2
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_datareader: 0.2.1