Skip to content

REF: IntervalIndex.get_value partial slicing #31117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Jan 18, 2020 · 3 comments · Fixed by #33938
Closed

REF: IntervalIndex.get_value partial slicing #31117

jbrockmendel opened this issue Jan 18, 2020 · 3 comments · Fixed by #33938
Assignees
Labels
good first issue Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@jbrockmendel
Copy link
Member

In trying to refactor IntervalIndex.get_loc/get_value to more closely follow the other implementations, I stumbled on a behavior in a test case that surprised me:

idx = IntervalIndex.from_tuples([(1, 5), (3, 7)])
s = Series(range(len(idx)), index=idx)
key = [4]

>>> idx.get_loc(key)  # <-- behaves like all the other get_locs in requiring scalar
KeyError: [4]

>>> idx.get_value(s, key)   # <-- surprising
(1, 5]    0
(3, 7]    1
dtype: int64

All of the other FooIndex.get_value methods raise InvalidIndexError on non-scalar.

AFAICT this is behaving sort of like a slicing operation, since 4 is in the interior of both of these intervals. @jschendel is that correct? If so, we should try to refactor IntervalIndex.get_value to explicitly dispatch for this case, kind of like DatetimeIndex._get_string_slice

@jschendel
Copy link
Member

This should follow the other get_value methods in that a non-scalar should raise an InvalidIndexError, or at least I don't see a reason why it shouldn't. The original implementation accepted non-scalars though, so it's possible there's some history there from before my time that I'm unaware of (cc @jreback).

I would expect the scalar 4 to return the same result as above for idx.get_value(s, 4), which it does on master. For IntervalIndex we want the indexing methods to work for exact Interval matches (i.e. same left/right/closed) or for points contained in the intervals. This does not apply to overlapping or nested Interval matches though, e.g. Interval(2, 6) or Interval(1.5, 2.5) should both raise a KeyError, which is a somewhat recent change (implemented during the sprint last summer).

@jschendel jschendel added the Interval Interval data type label Jan 21, 2020
@jbrockmendel jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Feb 25, 2020
@mroeschke
Copy link
Member

Looks like this is consistent with get_loc now. Could use a test.

In [92]: idx = IntervalIndex.from_tuples([(1, 5), (3, 7)])
    ...: s = Series(range(len(idx)), index=idx)
    ...: key = [4]

In [93]: idx.get_value(s, key)
InvalidIndexError: [4]

In [94]: idx.get_loc(key)
InvalidIndexError: [4]

In [95]: pd.__version__
Out[95]: '1.1.0.dev0+1390.gf3fdab389'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type labels Apr 27, 2020
@primaprashant
Copy link
Contributor

take

@jreback jreback added this to the 1.1 milestone May 9, 2020
@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type labels May 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants