Emit warning for missing keys in list-likes for partial indexing in MultiIndex #20916
Comments
After further thought, I'm no longer sure that this (and eventually breaking lists with missing values) is a good idea. When we have missing keys in normal indexing, it means that
But when a user does partial indexing, there is no obvious expectation about what can come back. It could be one row, multiple rows... zero rows. "Anything that starts with..." can also be "nothing", it's not a contradiction. And in fact, in the current codebase
which contradicts the "raise if no key is found" rule. On the other hand, however
does raise. But my preferred solution would be to change this. So my proposal would be:
Notice that this is much simpler to implement, because we simply do not need to care about missing keys. Assuming we agree, the only question is how to transition to this. Does
Additional observation: with the above example, level-wise indexing on only the first level, with a list containing a missing label, does raise:
Should that also result in an empty series?
The crazy part is that this only happens for one-element lists! After discussing with Joris, we agreed that the two kinds of "partial" indexing should be considered separately (though we should maybe find a better name for passing a list per level: "composite"? "combined"?). I would say there are 3 possibilities for indexing (one or more) levels with lists of labels (
each requires some backward incompatibility, and I think nobody likes 2), while both 1) and 3) in principle make sense. We then have 3 possibilities for indexing with incomplete keys, or lists of them ( Now consider the case Now: when we have a
In order to make our users' lives simpler, we would like the two things to have the same semantics when 4 is missing. This means that either we pick 1) with A), or 3) with C), which brings us back to "always raise" vs. "never raise".

Advantages of "always raise": it is consistent with flat indexes and with full indexing; it "informs" the user that the desired labels do not appear; and the transition is easier (adding warnings that announce a future error is simpler than adding warnings that announce the future removal of an error).

Advantages of "never raise": in flat indexes, you do expect all the labels you provide to be in the result, while this is not true in

Both options allow us to simplify our codebase. I think I would go for "always raise", but at this point I don't really have a strong opinion. Thoughts?
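To make the "always raise" vs. "never raise" choice concrete, here is a minimal sketch in plain Python. It is not pandas code: the index is modelled as a list of tuples, and `select_partial` and its `on_missing` parameter are hypothetical names introduced only for illustration.

```python
def select_partial(index, keys, on_missing="raise"):
    """Select the entries of `index` whose first element is in `keys`.

    `index` is a list of tuples standing in for a two-level index.
    `on_missing` is a hypothetical switch between the two policies:
    "raise" models "always raise", "ignore" models "never raise".
    """
    keys = list(keys)
    present = {entry[0] for entry in index}
    missing = [key for key in keys if key not in present]
    if missing and on_missing == "raise":
        raise KeyError(f"labels not found in first level: {missing}")
    wanted = set(keys)
    return [entry for entry in index if entry[0] in wanted]


index = [(1, "a"), (1, "b"), (2, "a"), (3, "b")]

# "never raise": the missing label 4 is silently dropped
never_raise_result = select_partial(index, [1, 4], on_missing="ignore")

# "always raise": the same request is an error
try:
    select_partial(index, [1, 4], on_missing="raise")
    always_raise_failed = False
except KeyError:
    always_raise_failed = True
```

Under "never raise", a request made only of missing labels simply returns an empty selection, which is the "anything that starts with... can also be nothing" reading of partial indexing.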
OK, after further thought and discussion, I would conclude that
I will start adding warnings for errors which will be removed.
@toobaz Is this still up to date?
raises now, although it was changed in #27154 to not raise. So I am wondering what exactly we want to achieve here.
We need the equivalent of #20770 for the case of partial indexing (which includes both indexing by lists of partial keys, and indexing by specifying a list for each level). It needs a separate approach because resulting indexers are different (as missing keys are dropped rather than replaced with NaN).
In any case, as for #20770, it should be possible to emit the warning without checking a second (or first) time for absent keys.
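As a rough user-level illustration of what such a warning could look like, the sketch below selects rows by first-level label and warns about labels that are absent. The helper name `select_first_level_with_warning`, the choice of `FutureWarning`, and the message text are all assumptions. It is not the internal pandas implementation, and it does not attempt the single-pass check mentioned above.

```python
import warnings

import pandas as pd


def select_first_level_with_warning(obj, keys):
    """Hypothetical helper: keep rows whose first-level label is in `keys`,
    warning (instead of raising) about labels that are not present."""
    level_values = obj.index.get_level_values(0)
    keys = pd.Index(keys)
    missing = keys[~keys.isin(level_values)]
    if len(missing) > 0:
        warnings.warn(
            f"Labels not found in the first level: {missing.tolist()}",
            FutureWarning,
            stacklevel=2,
        )
    # Boolean mask: missing labels are dropped, not replaced with NaN
    return obj[level_values.isin(keys)]


mi = pd.MultiIndex.from_product([[1, 2, 3], ["a", "b"]])
s = pd.Series(range(6), index=mi)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = select_first_level_with_warning(s, [1, 4])
```

Here `result` holds only the rows under label 1, and `caught` records the warning about label 4. This mirrors the point that in partial indexing missing keys are dropped rather than replaced with NaN.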
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: c4da79b
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.23.0.dev0+824.gc4da79b5b.dirty
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1