API: allow negative steps for label-based indexing #8753

immerrr · 2014-11-08T07:14:47Z

This should fix #8716.

Some of this refactoring may be useful for #8613, so I'd like someone to look through this.

cc'ing @shoyer, @jreback and @jorisvandenbossche .

TODO:

add tests for label slicing with negative step (ix, loc)
- int64index
- float64index
- objectindex
fix datetime/period indices
add tests for label slicing with negative step (datetime, string, native pandas scalar) x (ix, loc)
datetimeindex
periodindex
timedeltaindex
clean up old partial string code?
benchmark

shoyer · 2014-11-08T07:48:48Z

pandas/core/index.py

+            raise
+
+        if isinstance(slc, np.ndarray):
+            # get_loc may return a boolean array or an array of indices, which


This is a nice addition.

Looks like it needs tests?

shoyer · 2014-11-08T08:14:44Z

generally seems like a nice refactor to me 👍

immerrr · 2014-11-08T10:16:59Z

This PR will consolidate Index.slice_locs with that of Datetime-/Period-/TimedeltaIndex, which means that the non-present lookups on is_monotonically_decreasing will start working on those classes.

immerrr · 2014-11-10T07:58:02Z

Mostly done here, except for

maybe old partial string slice code can be cleaned up
per-level slices on MultiIndex don't work and I don't yet see a clear way to support steps for it
have to check performance

jorisvandenbossche · 2014-11-10T22:26:29Z

pandas/core/index.py

+
+        .. note::
+
+           Value of `side` parameter should be validated in caller.


Small comment (it does not really matter, as this will not get formatted in the online docs), but you can use a Notes section like:

Notes ----- Value of ....

shoyer · 2014-11-10T22:57:18Z

Again, this looks very nice. Thanks!

jorisvandenbossche · 2014-11-10T23:19:15Z

@immerrr maybe you could have a look at https://github.com/pydata/pandas/blob/master/doc/source/internals.rst to see if this is still up to date after your PR (although I am not fully sure this is the best way to document this complex indexing part of the codebase)

jorisvandenbossche · 2014-11-10T23:47:24Z

pandas/core/index.py

+            slc = self.get_loc(label)
+        except KeyError:
+            if self.is_monotonic_increasing:
+                return self.searchsorted(label, side=side)


Is this not what we are discussing in #8613? (so it seems this is explicitly and intentionally implemented?)

The blocking out of bounds indexing is actually done in pandas.core.indexing._LocIndexer. So the functionality here is unchanged (though I agree the checks do make more sense here than in the indexing module)

Yes, I see, and the IXIndexer does not implement those checks, so ix can hit this codepath.

Moving the checks here should be trivial with adding the third side=='exact' or side='strict' that disables this branch of execution (and probably renaming the function and parameter to make more sense with that third value).

But I was really hoping that we'll come to dropping those checks altogether.

@immerrr agreed, but I do think we'll want at least a type check (not in this PR), e.g., to ensure pd.Index([1, 2, 3]).slice_indexer('a') raises on Python 2.

This one is easy: you just need to add something that throws on invalid type into _maybe_cast_slice_bound. The inverse is tricky (e.g. pd.Index(list('abc')).slice_indexer(1)), but should be doable once StringIndex lands (either by adding another index type or by adding a string-object numpy dtype).

jreback · 2014-11-11T00:13:24Z

going to have a look tomorrow.

immerrr · 2014-11-11T08:17:55Z

I've been doing some benchmarking and found the strangest thing: I couldn't get a stable result from panel_methods.panel_pct_change_major benchmark, it reported anything from 2000ms to 8000ms. It appears that the task is memory intensive (easily several gbs of ram) and leaves a lot of objects in its wake because timeit disables garbage collection during the benchmarks. The problem is that it doesn't do any cleanup afterwards and this slows down subsequent runs because malloc has harder time finding chunks of memory when requested.

Given usually there's no such thing as "run code before each iteration, but don't benchmark it" in benchmark libraries, I wonder what's the proper way to fix this... Maybe we should just reduce panel sizes in panel benchmarks.

jreback · 2014-11-11T14:18:59Z

@immerrr yep that vbench is odd, never really had time to look, feel free to submit a pr to reduce it in size.

jreback · 2014-11-11T14:26:17Z

pandas/tseries/index.py

+            return (Timestamp(st, tz=self.tz),
+                    Timestamp(Timestamp(st + offsets.Second(),
+                                        tz=self.tz).value - 1))
+        elif reso == 'microsecond':


can you add a test for this (in the appropriate section where the partial string resos are tested)

jreback · 2014-11-11T14:28:41Z

@immerrr looks good. minor case that needs testing. Which might be an open issue in any event (e.g. supporting < second, e.g. microsecond reso in partial_strings for DatetimeIndex).

immerrr · 2014-11-18T04:17:53Z

I devoted some time to this at the weekend:

enabled second/microsecond-resolution partial based indexing.
partial string lookups are going to stay for now, they're used in get_loc in seemingly a bit inconsistent way and fixing it may alter the API, so the cleanup will be a separate PR.

jreback · 2014-11-18T11:14:53Z

doc/source/whatsnew/v0.15.2.txt

@@ -69,7 +69,7 @@ Bug Fixes
 - ``io.data.Options`` now raises ``RemoteDataError`` when no expiry dates are available from Yahoo (:issue:`8761`).
 - ``Timedelta`` kwargs may now be numpy ints and floats (:issue:`8757`).
 - ``sql_schema`` now generates dialect appropriate ``CREATE TABLE`` statements (:issue:`8697`)
-
+- Fix negative step support for label-based slices (:issue:`8753`)


can you give a previous (use code-block) and current behavior mini example here. otherwise looks good.

INT: make Index.slice_locs step-aware BUG: fix PeriodIndex.searchsorted to accept Periods INT: refactor time-related indices to use step-aware slice_locs INT: refactor MultiIndex to use step-aware slice_locs INT: enable second/microsecond partial string slicing

immerrr · 2014-11-19T12:31:10Z

ready & green

API: allow negative steps for label-based indexing

jreback · 2014-11-19T22:05:35Z

thanks!

keep-em coming!

shoyer · 2014-11-19T22:09:10Z

Thank you @immerrr !

shoyer reviewed Nov 8, 2014
View reviewed changes

immerrr mentioned this pull request Nov 8, 2014

API: label-based slicing with not-included labels #8613

Closed

immerrr force-pushed the refactor-slice-locs branch 2 times, most recently from 8b63e9d to 6085288 Compare November 8, 2014 17:50

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Nov 9, 2014

immerrr force-pushed the refactor-slice-locs branch 7 times, most recently from cbe858e to 30e7dc1 Compare November 10, 2014 07:54

jorisvandenbossche reviewed Nov 10, 2014
View reviewed changes

jreback added this to the 0.15.2 milestone Nov 11, 2014

jreback reviewed Nov 11, 2014
View reviewed changes

immerrr force-pushed the refactor-slice-locs branch 2 times, most recently from 5fdd168 to 4034a79 Compare November 18, 2014 04:18

immerrr force-pushed the refactor-slice-locs branch from 4034a79 to 1ff4720 Compare November 18, 2014 04:28

shoyer mentioned this pull request Nov 18, 2014

API: Index.get_nearest method #8845

Closed

immerrr force-pushed the refactor-slice-locs branch from 1ff4720 to 6a81185 Compare November 18, 2014 08:51

jreback reviewed Nov 18, 2014
View reviewed changes

immerrr force-pushed the refactor-slice-locs branch from 6a81185 to 2f4b61f Compare November 19, 2014 07:49

immerrr force-pushed the refactor-slice-locs branch from 2f4b61f to b735ffc Compare November 19, 2014 11:08

jreback added a commit that referenced this pull request Nov 19, 2014

Merge pull request #8753 from immerrr/refactor-slice-locs

3274f27

API: allow negative steps for label-based indexing

jreback merged commit 3274f27 into pandas-dev:master Nov 19, 2014

immerrr deleted the refactor-slice-locs branch November 19, 2014 22:15


		.. note::

		Value of `side` parameter should be validated in caller.

Uh oh!

API: allow negative steps for label-based indexing #8753

API: allow negative steps for label-based indexing #8753

Uh oh!

Conversation

immerrr commented Nov 8, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shoyer commented Nov 8, 2014

Uh oh!

immerrr commented Nov 8, 2014

Uh oh!

immerrr commented Nov 10, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shoyer commented Nov 10, 2014

Uh oh!

jorisvandenbossche commented Nov 10, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Nov 11, 2014

Uh oh!

immerrr commented Nov 11, 2014

Uh oh!

jreback commented Nov 11, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Nov 11, 2014

Uh oh!

immerrr commented Nov 18, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

immerrr commented Nov 19, 2014

Uh oh!

jreback commented Nov 19, 2014

Uh oh!

shoyer commented Nov 19, 2014

Uh oh!

Uh oh!