[ENH]: Reworking of iloc and loc indexing #12793
Comments
Thank you for putting in the effort to collect these issues!
cc @mroeschke
@wence- @mroeschke how many of these issues become moot once COW becomes the default behavior in pandas 3.0? If many, I think that is probably the better path for us to follow on those issues rather than trying to fix issues that we know are going away.
CoW unfortunately, I think, does not fix most of the issues here. They are mostly not to do with views vs copies. Most of them are that the desugaring step from the top-level syntax is not handled compatibly with pandas.
Sad, but not surprising. Just wanted to check since I did close a couple of related issues over the past week where COW definitely does fix them, but many others didn't look like they would be.
Status quo
Indexing of dataframes and series happens through six user-facing routes:
DataFrame.__setitem__ / DataFrame.__getitem__
DataFrame.iloc.__setitem__ / DataFrame.iloc.__getitem__
DataFrame.loc.__setitem__ / DataFrame.loc.__getitem__
Series.__setitem__ / Series.__getitem__
Series.iloc.__setitem__ / Series.iloc.__getitem__
Series.loc.__setitem__ / Series.loc.__getitem__
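For reference, here is a small sketch of what the six routes look like from the user's side. It uses pandas (the behaviour being matched) rather than cudf, and the frame, labels, and values are made up for illustration.

```python
import pandas as pd

# Illustrative data only; not taken from any of the linked issues.
df = pd.DataFrame({"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]}, index=["x", "y", "z"])
s = df["a"].copy()

# Getters
col = df["a"]                       # DataFrame.__getitem__: column by label
cell_by_position = df.iloc[1, 0]    # DataFrame.iloc.__getitem__: by position
cell_by_label = df.loc["y", "a"]    # DataFrame.loc.__getitem__: by label
item = s["y"]                       # Series.__getitem__: by label
item_by_position = s.iloc[1]        # Series.iloc.__getitem__: by position
item_by_label = s.loc["y"]          # Series.loc.__getitem__: by label

# Setters
df["c"] = [7, 8, 9]                 # DataFrame.__setitem__: add/replace a column
df.iloc[0, 0] = 11                  # DataFrame.iloc.__setitem__
df.loc["z", "b"] = 9.5              # DataFrame.loc.__setitem__
s["x"] = 0                          # Series.__setitem__
s.iloc[1] = -1                      # Series.iloc.__setitem__
s.loc["z"] = 5                      # Series.loc.__setitem__
```

Each getter/setter pair accepts scalars, slices, lists, and boolean masks, which is where most of the duplicated handling comes from.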
These all have slightly different semantics (to match pandas behaviour), but there is still quite a lot of (possibly unnecessary) code duplication and a number of bugs around indexing. Many of these bugs appear to arise because the business logic of slicing/gather-by-mask/indexing is intertwined with error handling and with determining exactly what to slice. There is also logic that is effectively repeated between the loc and iloc versions, for both Series and DataFrame.
It would be nice to reduce the number of different paths into indexing. Perhaps it is a pipe dream to share code between Series and DataFrame (since a DataFrame is not just a collection of Series), but it feels like it should be possible to share more between iloc, loc, and __setitem__/__getitem__.
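One way to picture the kind of sharing described above is to split indexing into a desugaring step (validate the key and turn it into a small canonical form) and a single execution step. The following is a pure-Python toy under that assumption, not cudf code: `SliceIndexer`, `GatherIndexer`, `normalize_positional`, `normalize_label`, and `take` are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class SliceIndexer:
    """Canonical form for a slice: concrete start/stop/step."""
    start: int
    stop: int
    step: int


@dataclass(frozen=True)
class GatherIndexer:
    """Canonical form for everything else: explicit row positions."""
    positions: Tuple[int, ...]


Indexer = Union[SliceIndexer, GatherIndexer]


def _check_position(key, n):
    # Bounds checking and negative-index wrapping live in the desugaring step.
    if isinstance(key, bool) or not isinstance(key, int):
        raise TypeError(f"positional key must be an integer, got {key!r}")
    if not -n <= key < n:
        raise IndexError(f"position {key} out of bounds for length {n}")
    return key % n


def normalize_positional(key, n) -> Indexer:
    """Desugar an iloc-style key for an axis of length n."""
    if isinstance(key, slice):
        return SliceIndexer(*key.indices(n))
    if isinstance(key, (list, tuple)):
        if key and all(isinstance(k, bool) for k in key):
            if len(key) != n:
                raise IndexError("boolean mask has the wrong length")
            return GatherIndexer(tuple(i for i, keep in enumerate(key) if keep))
        return GatherIndexer(tuple(_check_position(k, n) for k in key))
    return GatherIndexer((_check_position(key, n),))


def normalize_label(key, labels) -> Indexer:
    """Desugar a loc-style key (scalar or list of labels) against `labels`."""
    keys = list(key) if isinstance(key, (list, tuple)) else [key]
    missing = [k for k in keys if k not in labels]
    if missing:
        # Fail loudly if any requested label is absent.
        raise KeyError(missing)
    return GatherIndexer(
        tuple(i for k in keys for i, label in enumerate(labels) if label == k)
    )


def take(values, indexer: Indexer):
    """Single execution step shared by every route; no validation here."""
    if isinstance(indexer, SliceIndexer):
        # range() copes with negative steps where stop may be -1.
        return [values[i] for i in range(indexer.start, indexer.stop, indexer.step)]
    return [values[i] for i in indexer.positions]


labels = ["x", "y", "z"]
values = [10, 20, 30]
reversed_values = take(values, normalize_positional(slice(None, None, -1), len(values)))
by_label = take(values, normalize_label(["z", "x"], labels))
```

With this shape, the loc and iloc routes differ only in their normalisation function, and a setter could reuse the same indexers with a scatter in place of `take`. Whether such a split fits cudf's column machinery is an open question that this sketch does not settle.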
Related issues:
iloc bugs
Ellipsis #13267
Index bugs
loc bugs
Series.__setitem__ fails with tuple keys on a multiindex #7448
loc behavior differs from pandas when a duplicated index is requested #8693
df.loc fails with type error. #11298
loc-based indexing with slice ranges inconsistent with pandas #12833
Ellipsis #13268
loc indexing is incorrect with repeated column labels. #13269
loc-based indexing of DataFrames silently discards missing keys if at least one key is present in indexer #13379
Views vs. copies
.iloc[] to set a single value in a Series does not do so in-place #11085
__setitem__ and friends #11990
Other (mostly dtype-related)
__getitem__ calls #8184
DataFrame.iloc returns the wrong type of object when a string column is present #11477
__setitem__ #12039
Column._scatter_by_slice doesn't handle negative-stride slices correctly. #13532
Your issue here.
As we can see from this classification, loc-based indexing is definitely the harder nut to crack. The edge cases that provoke most of the issues are cases where the values used in the indexing are not in the index.
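To make the "values not in the index" edge cases concrete, here is a short sketch of how pandas (the reference behaviour) treats them for a Series with a sorted integer index. The data is made up, and the comments describe pandas as I understand it for recent versions (1.0 and later), not cudf.

```python
import pandas as pd

# Illustrative Series with a sorted (monotonic) integer index.
s = pd.Series([10, 20, 30], index=[1, 2, 3])

# All requested labels present: a plain gather.
present = s.loc[[1, 3]]

# A list containing any missing label raises KeyError in pandas,
# even though label 1 is present.
try:
    s.loc[[1, 99]]
    missing_error = None
except KeyError as exc:
    missing_error = exc

# A slice whose bound is missing is accepted when the index is sorted;
# the bounds are label-based and inclusive.
sliced = s.loc[2:99]

# Setting a scalar at a missing label enlarges the Series.
enlarged = s.copy()
enlarged.loc[99] = 40
```

So the same missing label is an error, a valid slice bound, or a request to append, depending on the shape of the key and on whether the access is a get or a set.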