[BUG] `loc` behavior differs from pandas when a duplicated index is requested #8693

brandon-b-miller · 2021-07-08T19:35:19Z

Describe the bug
When we have a series with a duplicated index such as

x = cudf.Series([1,2,3,4], index=[0,1,1,2])

and we try to fetch the items corresponding to the duplicated label, we get only the first matching element rather than a series of all the elements with that label:

x.loc[1]
# 2

Steps/Code to reproduce bug
See above

Expected behavior
We should get the same result as pandas: all the rows whose index matches the requested label, gathered into a series.

x.to_pandas().loc[1]
# 1    2
# 1    3
# dtype: int64

Environment overview (please complete the following information)

Environment location: [Bare-metal]
Method of cuDF install: [source]


github-actions · 2021-11-15T21:03:20Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

shwina · 2022-11-22T16:43:37Z

This is because of an "optimization" for scalar inputs to loc (cudf/python/cudf/cudf/core/series.py, line 292 in d49e412: `if _is_scalar_or_zero_d_array(arg):`) that we should probably get rid of.

A workaround is to pass the index in a list:

s.loc[[0]]

Now that we have indices_of available on index objects, use that when looking up scalar values in an index for loc-based indexing. - Closes rapidsai#8693
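To make the intended semantics concrete, here is a minimal pure-Python sketch of a scalar label lookup that gathers every match instead of stopping at the first one. It is not cuDF's implementation: `positions_of` is a hypothetical stand-in for the role `indices_of` plays in the commit message, `loc_scalar` is an illustrative name, and returning a bare value for a label that occurs exactly once is an assumption made here.

```python
from typing import Any, List, Sequence


def positions_of(index: Sequence[Any], label: Any) -> List[int]:
    """Return every position at which `label` occurs in `index`.

    Hypothetical helper: a plain-Python stand-in for an index method
    that reports all matching positions rather than just the first.
    """
    return [pos for pos, entry in enumerate(index) if entry == label]


def loc_scalar(index: Sequence[Any], values: Sequence[Any], label: Any) -> Any:
    """Label-based lookup for a single label (illustrative only).

    - no match:       raise KeyError
    - one match:      return the bare value (assumed behaviour)
    - several matches: return all (label, value) pairs, in index order
    """
    positions = positions_of(index, label)
    if not positions:
        raise KeyError(label)
    if len(positions) == 1:
        return values[positions[0]]
    return [(index[pos], values[pos]) for pos in positions]


# Same data as the series in the bug report.
index = [0, 1, 1, 2]
values = [1, 2, 3, 4]

unique_hit = loc_scalar(index, values, 0)  # label 0 occurs once
dup_hit = loc_scalar(index, values, 1)     # label 1 occurs twice
print(unique_hit)
print(dup_hit)
```

With the data from the report, the duplicated label 1 yields both rows (values 2 and 3), matching the pandas output quoted in the issue, whereas a first-match shortcut would return only 2.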

…3625)

Much of the index-specific search code for label-based lookup only worked in the case where the index was sorted in ascending order and/or the requested slice had the same step sign as the index. To fix this, handle ascending and descending sorted indices specially, as well as refactoring to remove unused codepaths.

The core idea is to push `find_first` and `find_last` to being generic column methods (in doing so we remove the `closest` argument, but that was only used to produce pandas-incompatible index behaviour). Lookup in indices then uses `get_slice_bound` (that can be specialised for index types) that uses first (if applicable) `searchsorted` and then `find_first/last`.

While we're here, since we now check the sortedness properties of Index objects, turn them into `@cached_property` (partially addressing the request of #13357).

- Closes #8693
- Closes #12833

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Ashwin Srinath (https://github.com/shwina)

URL: #13625
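The sorted-index idea described in this PR can be illustrated with a small standard-library sketch. It is not the cuDF code: `slice_bounds` is a hypothetical name, Python's `bisect` stands in for `searchsorted`, labels are assumed to be numeric (the descending case negates them), and the index is assumed to be fully sorted in the stated direction.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def slice_bounds(
    index: Sequence[float], start: float, stop: float, ascending: bool = True
) -> Tuple[int, int]:
    """Return (lo, hi) so that index[lo:hi] holds every label between
    `start` and `stop`, both ends inclusive, duplicates included.

    Illustrative only. For an ascending index, `start <= stop` is
    expected; for a descending index, `start >= stop`.
    """
    if ascending:
        # Left bound: first position of `start`.
        # Right bound: one past the last position of `stop`.
        return bisect_left(index, start), bisect_right(index, stop)
    # Descending index: negate the labels so they become ascending,
    # then apply the same binary searches to the negated bounds.
    keys: List[float] = [-label for label in index]
    return bisect_left(keys, -start), bisect_right(keys, -stop)


asc_index = [0, 1, 1, 2, 3]
lo, hi = slice_bounds(asc_index, 1, 2, ascending=True)
asc_labels = asc_index[lo:hi]

desc_index = [3, 2, 2, 1, 0]
lo, hi = slice_bounds(desc_index, 2, 1, ascending=False)
desc_labels = desc_index[lo:hi]

print(asc_labels)
print(desc_labels)
```

Handling the descending case explicitly is the point of the sketch: a binary search that assumes ascending order gives meaningless bounds on a descending index, which is the class of failure the PR description refers to.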

brandon-b-miller added the bug (Something isn't working) and Python (Affects Python cuDF API.) labels Jul 8, 2021

brandon-b-miller self-assigned this Jul 8, 2021

github-actions bot added the inactive-90d label Nov 15, 2021

brandon-b-miller mentioned this issue Dec 15, 2021

[BUG] Setting a float into an int dtype series through iloc should cast the series #9913

Closed

GregoryKimball added this to the Pandas API Alignment and Coverage milestone Nov 19, 2022

wence- mentioned this issue Feb 16, 2023

[ENH]: Reworking of iloc and loc indexing #12793

Open

wence- mentioned this issue Jun 27, 2023

Refactor Index search to simplify code and increase correctness #13625

Merged

3 tasks

wence- added a commit to wence-/cudf that referenced this issue Jun 27, 2023

Correctly handle repeated index entries in Series.loc

a7933bf


rapids-bot bot closed this as completed in #13625 Jun 30, 2023
