-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
API: allow negative steps for label-based indexing #8753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1959,23 +1959,99 @@ def slice_indexer(self, start=None, end=None, step=None): | |
----- | ||
This function assumes that the data is sorted, so use at your own peril | ||
""" | ||
start_slice, end_slice = self.slice_locs(start, end) | ||
start_slice, end_slice = self.slice_locs(start, end, step=step) | ||
|
||
# return a slice | ||
if np.isscalar(start_slice) and np.isscalar(end_slice): | ||
if not lib.isscalar(start_slice): | ||
raise AssertionError("Start slice bound is non-scalar") | ||
if not lib.isscalar(end_slice): | ||
raise AssertionError("End slice bound is non-scalar") | ||
|
||
# degenerate cases | ||
if start is None and end is None: | ||
return slice(None, None, step) | ||
return slice(start_slice, end_slice, step) | ||
|
||
return slice(start_slice, end_slice, step) | ||
def _maybe_cast_slice_bound(self, label, side): | ||
""" | ||
This function should be overloaded in subclasses that allow non-trivial | ||
casting on label-slice bounds, e.g. datetime-like indices allowing | ||
strings containing formatted datetimes. | ||
|
||
# loc indexers | ||
return (Index(start_slice) & Index(end_slice)).values | ||
Parameters | ||
---------- | ||
label : object | ||
side : {'left', 'right'} | ||
|
||
Notes | ||
----- | ||
Value of `side` parameter should be validated in caller. | ||
|
||
def slice_locs(self, start=None, end=None): | ||
""" | ||
For an ordered Index, compute the slice locations for input labels | ||
return label | ||
|
||
def get_slice_bound(self, label, side): | ||
""" | ||
Calculate slice bound that corresponds to given label. | ||
|
||
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position | ||
of given label. | ||
|
||
Parameters | ||
---------- | ||
label : object | ||
side : {'left', 'right'} | ||
|
||
""" | ||
if side not in ('left', 'right'): | ||
raise ValueError( | ||
"Invalid value for side kwarg," | ||
" must be either 'left' or 'right': %s" % (side,)) | ||
|
||
original_label = label | ||
# For datetime indices label may be a string that has to be converted | ||
# to datetime boundary according to its resolution. | ||
label = self._maybe_cast_slice_bound(label, side) | ||
|
||
try: | ||
slc = self.get_loc(label) | ||
except KeyError: | ||
if self.is_monotonic_increasing: | ||
return self.searchsorted(label, side=side) | ||
elif self.is_monotonic_decreasing: | ||
# np.searchsorted expects ascending sort order, have to reverse | ||
# everything for it to work (element ordering, search side and | ||
# resulting value). | ||
pos = self[::-1].searchsorted( | ||
label, side='right' if side == 'left' else 'right') | ||
return len(self) - pos | ||
|
||
# In all other cases, just re-raise the KeyError | ||
raise | ||
|
||
if isinstance(slc, np.ndarray): | ||
# get_loc may return a boolean array or an array of indices, which | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a nice addition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like it needs tests? |
||
# is OK as long as they are representable by a slice. | ||
if com.is_bool_dtype(slc): | ||
slc = lib.maybe_booleans_to_slice(slc.view('u1')) | ||
else: | ||
slc = lib.maybe_indices_to_slice(slc.astype('i8')) | ||
if isinstance(slc, np.ndarray): | ||
raise KeyError( | ||
"Cannot get %s slice bound for non-unique label:" | ||
" %r" % (side, original_label)) | ||
|
||
if isinstance(slc, slice): | ||
if side == 'left': | ||
return slc.start | ||
else: | ||
return slc.stop | ||
else: | ||
if side == 'right': | ||
return slc + 1 | ||
else: | ||
return slc | ||
|
||
def slice_locs(self, start=None, end=None, step=None): | ||
""" | ||
Compute slice locations for input labels. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -1986,51 +2062,51 @@ def slice_locs(self, start=None, end=None): | |
|
||
Returns | ||
------- | ||
(start, end) : (int, int) | ||
start, end : int | ||
|
||
Notes | ||
----- | ||
This function assumes that the data is sorted, so use at your own peril | ||
""" | ||
inc = (step is None or step >= 0) | ||
|
||
is_unique = self.is_unique | ||
|
||
def _get_slice(starting_value, offset, search_side, slice_property, | ||
search_value): | ||
if search_value is None: | ||
return starting_value | ||
if not inc: | ||
# If it's a reverse slice, temporarily swap bounds. | ||
start, end = end, start | ||
|
||
try: | ||
slc = self.get_loc(search_value) | ||
|
||
if not is_unique: | ||
|
||
# get_loc will return a boolean array for non_uniques | ||
# if we are not monotonic | ||
if isinstance(slc, (np.ndarray, Index)): | ||
raise KeyError("cannot peform a slice operation " | ||
"on a non-unique non-monotonic index") | ||
|
||
if isinstance(slc, slice): | ||
slc = getattr(slc, slice_property) | ||
else: | ||
slc += offset | ||
start_slice = None | ||
if start is not None: | ||
start_slice = self.get_slice_bound(start, 'left') | ||
if start_slice is None: | ||
start_slice = 0 | ||
|
||
except KeyError: | ||
if self.is_monotonic_increasing: | ||
slc = self.searchsorted(search_value, side=search_side) | ||
elif self.is_monotonic_decreasing: | ||
search_side = 'right' if search_side == 'left' else 'left' | ||
slc = len(self) - self[::-1].searchsorted(search_value, | ||
side=search_side) | ||
else: | ||
raise | ||
return slc | ||
end_slice = None | ||
if end is not None: | ||
end_slice = self.get_slice_bound(end, 'right') | ||
if end_slice is None: | ||
end_slice = len(self) | ||
|
||
start_slice = _get_slice(0, offset=0, search_side='left', | ||
slice_property='start', search_value=start) | ||
end_slice = _get_slice(len(self), offset=1, search_side='right', | ||
slice_property='stop', search_value=end) | ||
if not inc: | ||
# Bounds at this moment are swapped, swap them back and shift by 1. | ||
# | ||
# slice_locs('B', 'A', step=-1): s='B', e='A' | ||
# | ||
# s='A' e='B' | ||
# AFTER SWAP: | | | ||
# v ------------------> V | ||
# ----------------------------------- | ||
# | | |A|A|A|A| | | | | |B|B| | | | | | ||
# ----------------------------------- | ||
# ^ <------------------ ^ | ||
# SHOULD BE: | | | ||
# end=s-1 start=e-1 | ||
# | ||
end_slice, start_slice = start_slice - 1, end_slice - 1 | ||
|
||
# i == -1 triggers ``len(self) + i`` selection that points to the | ||
# last element, not before-the-first one, subtracting len(self) | ||
# compensates that. | ||
if end_slice == -1: | ||
end_slice -= len(self) | ||
if start_slice == -1: | ||
start_slice -= len(self) | ||
|
||
return start_slice, end_slice | ||
|
||
|
@@ -3887,7 +3963,12 @@ def _tuple_index(self): | |
""" | ||
return Index(self.values) | ||
|
||
def slice_locs(self, start=None, end=None, strict=False): | ||
def get_slice_bound(self, label, side): | ||
if not isinstance(label, tuple): | ||
label = label, | ||
return self._partial_tup_index(label, side=side) | ||
|
||
def slice_locs(self, start=None, end=None, step=None): | ||
""" | ||
For an ordered MultiIndex, compute the slice locations for input | ||
labels. They can be tuples representing partial levels, e.g. for a | ||
|
@@ -3900,7 +3981,8 @@ def slice_locs(self, start=None, end=None, strict=False): | |
If None, defaults to the beginning | ||
end : label or tuple | ||
If None, defaults to the end | ||
strict : boolean, | ||
step : int or None | ||
Slice step | ||
|
||
Returns | ||
------- | ||
|
@@ -3910,21 +3992,9 @@ def slice_locs(self, start=None, end=None, strict=False): | |
----- | ||
This function assumes that the data is sorted by the first level | ||
""" | ||
if start is None: | ||
start_slice = 0 | ||
else: | ||
if not isinstance(start, tuple): | ||
start = start, | ||
start_slice = self._partial_tup_index(start, side='left') | ||
|
||
if end is None: | ||
end_slice = len(self) | ||
else: | ||
if not isinstance(end, tuple): | ||
end = end, | ||
end_slice = self._partial_tup_index(end, side='right') | ||
|
||
return start_slice, end_slice | ||
# This function adds nothing to its parent implementation (the magic | ||
# happens in get_slice_bound method), but it adds meaningful doc. | ||
return super(MultiIndex, self).slice_locs(start, end, step) | ||
|
||
def _partial_tup_index(self, tup, side='left'): | ||
if len(tup) > self.lexsort_depth: | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not what we are discussing in #8613? (so it seems this is explicitly and intentionally implemented?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blocking out of bounds indexing is actually done in
pandas.core.indexing._LocIndexer
. So the functionality here is unchanged (though I agree the checks do make more sense here than in the indexing module)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see, and the
IXIndexer
does not implement those checks, soix
can hit this codepath.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the checks here should be trivial with adding the third
side=='exact'
orside='strict'
that disables this branch of execution (and probably renaming the function and parameter to make more sense with that third value).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I was really hoping that we'll come to dropping those checks altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@immerrr agreed, but I do think we'll want at least a type check (not in this PR), e.g., to ensure
pd.Index([1, 2, 3]).slice_indexer('a')
raises on Python 2.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is easy: you just need to add something that throws on invalid type into
_maybe_cast_slice_bound
. The inverse is tricky (e.g.pd.Index(list('abc')).slice_indexer(1)
), but should be doable onceStringIndex
lands (either by adding another index type or by adding a string-object numpy dtype).