-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: add per axis, per level indexing with tuples using loc #6134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think accept as a side issue, we can support/eliminate xs by doing this:
|
@@ -725,6 +732,7 @@ def _getitem_lowerdim(self, tup): | |||
if len(new_key) == 1: | |||
new_key, = new_key | |||
|
|||
# This is an elided recursive call to loc (always?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, that's why name is their, you see this with any multi-axis indexing, e.g.df.iloc[3:5,1:2]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
|
||
return ranges | ||
|
||
def _spec_to_array_indices(ix, specs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a special case of core/index.py/MultiIndex/_get_loc_level
which is a big monster....
your routine nicely encapsulates things and is pretty...should maybe be a method of MultiIndex? (in the future maybe integrate with _get_loc_level
, but I think that has lots of special cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw some terrible things in the indexing code. It's practically unreadabele. I fear
that by plugging it in there, we'll remain stuck where we only get a soul brave enough
to improve indexing things once every 18 months. or you have to do it.
_get_loc_level returns a slice right? If I'm not wrong, it returns a slice relative to the
whole array, rather then per level, which I find easier to grok code wise,
and aides the lazy index generation later on. Not that we have a lazy consumer to plug it into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, how about a compromise. Maybe something like this:
class MultiIndexSlicer(object):
def __init__(self, labels):
self.labels = labels
def get_indexer(self, key):
# this basically calls your 2 functions
# but keeps the specs internal
# also, I think allows one to do _get_loc_level as a separate method call (when this gets refactored
indexer = MultiIndexSlicer(labels).get_indexer(key)
return self.obj.iloc[indexer]
(could also pas obj
and axis
so this MIS object would have more state...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks fine. It's a style thing, but I prefer self-contained functions that make
good building blocks over monster objects.
I know the None convention is used elsewhere, but it seems awful strange to specify None, |
actually, maybe you can just to and empty slice, e.g. |
You know it doesn't. Alternativelty. define something short in the pd namespace to be |
ok...I would be ok with maybe need to think of something good |
But a big 👍 on this....I think sorely needed. also FYI, setting shouldn't be too difficult once getting is done. |
cc @dragoljub as I know you are wanting this |
Glad to hear it. I know there's the copy problem, but I'm not sure how you actually get past that. |
Actually, once the indexer is found setting is a quite striaghtforward assignment of locations to array kind of similar to
which is already handled I agree the copy problem is tricky. |
YES! I do want this. This will make my multi-index DFs much more user friendly, I may be able to avoid having duplicate index and column data moving forward. 👍 Not to get sidetracked but why does the empty slice not work? 😕 |
the ':' syntax only works in the [ ] which is a problem of we need to have multi axis and multi levels the slice(None) syntax works fine to indicate give me a full slice for that level (iow don't exclude anything) I think that is kind of cumbersome and I stead maybe something pd.allslice to represent more of a syntax thing |
If you're asking why you can't do |
allslice? too spicy. |
After vectorizing it's pretty evenly matched with passing |
|
could work |
Just having the option to do Will this support indexing with a shorter tuple than the number of levels? That would be a nice feature to have.
maybe: |
You can omit trailing levels right now, it'll work. The example shows this, look at the last, 'D', level. |
This is really good, MI slicing atm is fiddly... FYI The only toplevel constant atm is pd.NaT. I think pd.Alls would work be better, and less confusing with all. I was going to (jokingly) suggest passing the function all (or maybe not) :s, but interestingly...
Users would have to be careful not to accidentally type in wrong thing here if we go with pd.something (I guess that's always the case). |
What about leveraging the fact that Python is case sensitive and using Maybe accepting a string argument(s) like 'all' in place of a direct alias of |
Let's leave that for now and pick it up in a subsequent PR. no clear winner. |
on the allslice,alls, ALL, I suggest we punt and use a non-official @jtratner has used this trick in several places. I like it. |
it moves a collection of functions into an object where they have a logical functionality, grouping them for ease of understanding. One of the functions is purely internal (even to the indexing routines) and thus just makes the indexing module even more complicated to understand. This breaks with the pattern all of the functionality grouped according to what it does in an object. Furthermore you may need to delegate functionality for different index types and/or setting which you might need to keep some state. I get your approach and what I am suggesting is almost a stylistic change, except to someoneone looking at the module will see a bunch of objects and then a bunch of not-short functions. |
how would you actually write the are we against just using |
That makes sense, basically we already agreed on this here. Grouping by object is fine (and true, consistent with the rest of pandas) and I like the idea of keeping the dispatch logic inside such a thing, that re how to write
I dislike None, yeah. And I don't think we need to make a call on the matter this way. We don't really have a |
The only issue with the slice object thing is that I don't want to hold this up for |
u can refactor would be great I think this warrants merging soon as their are a number of things to test/fix after (setting, doc changes, tests with all index types ) - maybe open a new issue with a checklist for this as a master issue let's open a new issue to defer the ALL change (or not change) |
I suggested refactroring later or waiting till you put it in place. Where did We can stick with |
ok np |
I'll add tests and a vbench, and ping you before merging. If you choose to |
@jreback as discussed, closing this for you to pick up and rework as you see fit. |
fine...go ahead and merge when you are ready with tests and such |
oh...thought you were going to add tests/vbench...i can pick up after its in master |
did you have any tests / vbench already done? |
@y-p, @jreback, I'm sorry for barging in without looking deeply into the back-story, but did you consider something like this? In [1]: class IndexMaker(object):
def __getitem__(self, *args):
return args
...:
In [2]: idx = IndexMaker()
In [3]: idx[:,:,'foo':'bar']
Out[3]:
((slice(None, None, None),
slice(None, None, None),
slice('foo', 'bar', None)),) |
@immerrr what does this buy you? It's not a problem creating the syntax at all |
@jreback, it seemed to me that the fact that colon-syntax wasn't directly supported in multilevel lookups forcing users to write something like Given how easy is the implementation, it surprised me that I hadn't seen that come up in the discussion and I suspected I'd missed something in the back-story which ruled out this option. Hence the question. |
@immerrr with a slight modification that would work (needs to return the [0] of the tuple, but that's just a detail. The reason it won't work directly is since multi-axis selection needs to work, the following
however rewriting
maybe do this as something like: e.g.
|
@cpcloud @jorisvandenbossche @dragoljub thoughts on this suggestion by @immerrr ? |
It does looks compelling as a convince function to generate tuples of slices. 👍 My most common use case would probably be to specify fixed values for a few levels and just return all the other levels. The key of Because multi-index is (for lack of a better word) a bunch of tuples under the hood, I like the idea of passing explicit tuples for each axis indexer. That being said if I need to start slicing and dicing my levels I could get used to @immerrr 's suggesion. 😄 Looking at them side by side I'm not sure which I prefer. Can we have both? 😬
|
already added....I am calling it |
@dragoljub FYI, I have put up some dev builds of master for windows....pretty easy now that its automated.....so http://pandas.pydata.org/pandas-build/dev/ |
I'm +1 on |
Wow all, really nice thread! I learned a lot reading it, thank you. I too am looking forward to such a feature for slicing into |
@jreback, sure, I was aiming for a drop-in replacement for tuple of slices, but if you need that self parameter outside you can have it as well. In fact your suggestion has sparkled an idea that I'm going to post as a separate issue for discussion. |
@dragoljub you are advocating for also allowing All or ALL as a null slice? easy to create a pd.IndexAll that u can then alias |
Yea that's fine. Let's stick with one good way to do it: pd.IndexSlice. I like that I can always build my one tuples if needed. |
Selection only. Did not check (nor care) about performance at this point.
Full Tests, vbenches and docs to follow in due course.
Please hold off on PEP8 comments or suggestions on how this could be made more complex.
I'd like to get something functional, solid and understandable in master fairly early in 0.14.
@jreback, I don't often wade this deep into the gore. Would welcome your review and suggestions.
Example of setting usage:
http://stackoverflow.com/questions/21539260/assign-to-multiindex-slice/21539479#21539479
xref: #5641, #3057, #4036, and others