API: allow negative steps for label-based indexing #8753
    raise

if isinstance(slc, np.ndarray):
    # get_loc may return a boolean array or an array of indices, which
This is a nice addition.
Looks like it needs tests?
Generally seems like a nice refactor to me 👍
This PR will consolidate
8b63e9d to 6085288
cbe858e to 30e7dc1
Mostly done here, except for
.. note::

   Value of `side` parameter should be validated in caller.
Small comment (it does not really matter, as this will not get formatted in the online docs), but you can use a Notes section like:
Notes
-----
Value of ....
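For reference, the suggested layout inside a full docstring might look like the sketch below. The function name `example_slice_bound` and its parameters are hypothetical stand-ins, not the method touched in this PR; only the Notes section text comes from the diff above.

```python
def example_slice_bound(label, side):
    """
    Return the position at which ``label`` bounds a slice.

    Hypothetical stub used only to show numpydoc section layout.

    Parameters
    ----------
    label : object
    side : {'left', 'right'}

    Notes
    -----
    Value of `side` parameter should be validated in caller.
    """
    raise NotImplementedError
```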
Again, this looks very nice. Thanks!
@immerrr maybe you could have a look at https://github.com/pydata/pandas/blob/master/doc/source/internals.rst to see if this is still up to date after your PR (although I am not fully sure this is the best way to document this complex indexing part of the codebase)
    slc = self.get_loc(label)
except KeyError:
    if self.is_monotonic_increasing:
        return self.searchsorted(label, side=side)
Is this not what we are discussing in #8613? (so it seems this is explicitly and intentionally implemented?)
The blocking of out-of-bounds indexing is actually done in `pandas.core.indexing._LocIndexer`. So the functionality here is unchanged (though I agree the checks do make more sense here than in the indexing module)
Yes, I see, and the `IXIndexer` does not implement those checks, so `ix` can hit this codepath.
Moving the checks here should be trivial by adding a third value, `side='exact'` or `side='strict'`, that disables this branch of execution (and probably renaming the function and parameter to make more sense with that third value).
But I was really hoping that we'll come to dropping those checks altogether.
@immerrr agreed, but I do think we'll want at least a type check (not in this PR), e.g., to ensure `pd.Index([1, 2, 3]).slice_indexer('a')` raises on Python 2.
This one is easy: you just need to add something that throws on invalid type into `_maybe_cast_slice_bound`. The inverse is tricky (e.g. `pd.Index(list('abc')).slice_indexer(1)`), but should be doable once `StringIndex` lands (either by adding another index type or by adding a string-object numpy dtype).
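To make the thread above concrete, here is a minimal pure-Python sketch of the bound lookup being discussed: try an exact match first, fall back to a sorted insertion point when the label is missing, and reject labels of the wrong type up front. It works on a plain list instead of an `Index`; `slice_bound` and its type check are hypothetical illustrations, not the code in this PR.

```python
from bisect import bisect_left, bisect_right


def slice_bound(labels, label, side):
    """Return the positional bound for ``label`` in a list of unique labels.

    ``side`` is 'left' or 'right' and is assumed to be validated by the caller.
    """
    # Hypothetical type check: refuse labels that cannot belong to this index.
    if labels and not isinstance(label, type(labels[0])):
        raise TypeError("cannot slice %r labels with %r" % (type(labels[0]), label))
    try:
        pos = labels.index(label)  # exact match, analogous to get_loc
    except ValueError:
        # A missing label only has a meaningful position in sorted labels.
        if labels != sorted(labels):
            raise KeyError(label)
        search = bisect_left if side == "left" else bisect_right
        return search(labels, label)
    # An exact match is included in the slice on either side.
    return pos if side == "left" else pos + 1


labels = [10, 20, 30]
left_exact = slice_bound(labels, 20, "left")
right_exact = slice_bound(labels, 20, "right")
missing = slice_bound(labels, 25, "left")
```

With this sketch an integer index rejects `'a'` with a `TypeError`, while the out-of-bounds case (a missing label on sorted data) still resolves to an insertion point.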
going to have a look tomorrow.
I've been doing some benchmarking and found the strangest thing: I couldn't get a stable result from … Given that benchmark libraries usually have no such thing as "run code before each iteration, but don't benchmark it", I wonder what the proper way to fix this is. Maybe we should just reduce panel sizes in the panel benchmarks.
@immerrr yep that vbench is odd, never really had time to look, feel free to submit a PR to reduce it in size.
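For what it's worth, the "untimed per-iteration setup" idea can be sketched by hand with the standard library. This is a hypothetical helper, not part of vbench: `bench`, `setup` and `stmt` are names made up for the illustration.

```python
import time


def bench(setup, stmt, repeat=5):
    """Time ``stmt(state)``, rebuilding ``state`` outside the timed region."""
    timings = []
    for _ in range(repeat):
        state = setup()  # fresh input for every iteration, not timed
        start = time.perf_counter()
        stmt(state)
        timings.append(time.perf_counter() - start)
    return min(timings)  # best-of-N is less sensitive to noise


# Sorting mutates its input, so each run needs a freshly built list.
best = bench(lambda: list(range(1000, 0, -1)), lambda data: data.sort())
```

Reducing the benchmark size is still the simpler fix; the sketch only shows that the setup cost can be kept out of the measurement.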
    return (Timestamp(st, tz=self.tz),
            Timestamp(Timestamp(st + offsets.Second(),
                                tz=self.tz).value - 1))
elif reso == 'microsecond':
Can you add a test for this (in the appropriate section where the partial string resos are tested)?
@immerrr looks good. Minor case that needs testing, which might be an open issue in any event (e.g. supporting sub-second resolution, such as microsecond reso, in partial_strings for DatetimeIndex).
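The bounds computed in the diff above amount to "the start of the second, through one nanosecond before the next second". A standard-library sketch of that arithmetic follows; the function `second_bounds_ns` is hypothetical, assumes UTC, and uses integer nanoseconds since the epoch instead of `Timestamp`.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def second_bounds_ns(text):
    """Return inclusive (start, end) nanosecond bounds for a second-resolution string."""
    st = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    start = int(st.timestamp()) * NS_PER_SECOND
    # Last nanosecond that still belongs to the same second.
    end = start + NS_PER_SECOND - 1
    return start, end


start, end = second_bounds_ns("2014-01-01 00:00:05")
```

A test for the partial-string case could check that every timestamp within that second falls inside the bounds and that the first nanosecond of the following second does not.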
I devoted some time to this at the weekend:
5fdd168 to 4034a79
4034a79 to 1ff4720
1ff4720 to 6a81185
@@ -69,7 +69,7 @@ Bug Fixes
- ``io.data.Options`` now raises ``RemoteDataError`` when no expiry dates are available from Yahoo (:issue:`8761`).
- ``Timedelta`` kwargs may now be numpy ints and floats (:issue:`8757`).
- ``sql_schema`` now generates dialect appropriate ``CREATE TABLE`` statements (:issue:`8697`)

- Fix negative step support for label-based slices (:issue:`8753`)
Can you give a previous (use code-block) and current behavior mini example here? Otherwise looks good.
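As a rough illustration of the intended behavior (not the whatsnew entry itself, and not the implementation in this PR), here is a pure-Python sketch of label slicing where both endpoints are inclusive and a negative step walks the labels backwards. `label_slice` is a hypothetical helper operating on a plain list of unique labels, and it assumes both endpoints are present.

```python
def label_slice(labels, start=None, stop=None, step=None):
    """Slice ``labels`` by label with inclusive endpoints and an optional step."""
    step = 1 if step is None else step
    if step == 0:
        raise ValueError("slice step cannot be zero")
    if step > 0:
        lo = 0 if start is None else labels.index(start)
        hi = len(labels) if stop is None else labels.index(stop) + 1
        return labels[lo:hi:step]
    # Negative step: start is the right-hand label, stop the left-hand one.
    hi = len(labels) - 1 if start is None else labels.index(start)
    lo = 0 if stop is None else labels.index(stop)
    # Positional stops are exclusive; None means "run through position 0".
    end = lo - 1 if lo > 0 else None
    return labels[hi:end:step]


labels = list("abcde")
forward = label_slice(labels, "b", "d")        # ['b', 'c', 'd']
backward = label_slice(labels, "d", "b", -1)   # ['d', 'c', 'b']
```

Reversing the endpoints without reversing the step, e.g. `label_slice(labels, "b", "d", -1)`, yields an empty result in this sketch.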
6a81185 to 2f4b61f
INT: make Index.slice_locs step-aware
BUG: fix PeriodIndex.searchsorted to accept Periods
INT: refactor time-related indices to use step-aware slice_locs
INT: refactor MultiIndex to use step-aware slice_locs
INT: enable second/microsecond partial string slicing
2f4b61f to b735ffc
ready & green
API: allow negative steps for label-based indexing
thanks! keep 'em coming!
Thank you @immerrr!
This should fix #8716.
Some of this refactoring may be useful for #8613, so I'd like someone to look through this.
cc'ing @shoyer, @jreback and @jorisvandenbossche .
TODO: