CLN: implement xs in terms of loc #6249

Closed
jreback opened this issue Feb 4, 2014 · 50 comments

@jreback
Contributor

jreback commented Feb 4, 2014

After #6134, we can drop xs and implement it directly with .loc. Thus we should deprecate it.

.xs(key, level=n)

is roughly equivalent to this (n is the integer level number, counting from 0):

indexer = tuple([slice(None)] * n + [key])
axis_indexer = [slice(None)] * self.ndim
axis_indexer[axis] = indexer
self.loc[tuple(axis_indexer)]

Roughly, because it also needs to handle the full .xs argument set:

  • drop_level (need to surround the indexer tuple with a list or not)
  • n could be a level name
  • n = 1 (needs a test)
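The recipe above can be turned into a runnable sketch for a MultiIndex on either axis. The helper name xs_via_loc is hypothetical, and the sketch only covers the drop_level=False behavior. It pads the level indexer with slice(None) for every level (not only the levels before n), so that .loc treats the key as a per-level slicer and keeps all levels; a level name is resolved to its position through Index.names.

```python
import numpy as np
import pandas as pd


def xs_via_loc(df, key, level, axis=0):
    """Hypothetical helper: emulate df.xs(key, level=level, axis=axis, drop_level=False).

    Assumes the chosen axis of the DataFrame carries a MultiIndex.
    """
    labels = df.index if axis == 0 else df.columns
    # A level may be given by name or by integer position.
    n = labels.names.index(level) if isinstance(level, str) else level
    # One entry per level: keep everything except `key` at level n.
    level_indexer = [slice(None)] * labels.nlevels
    level_indexer[n] = key
    # One entry per axis: only the chosen axis is restricted.
    axis_indexer = [slice(None)] * df.ndim
    axis_indexer[axis] = tuple(level_indexer)
    return df.loc[tuple(axis_indexer)]


mi = pd.MultiIndex.from_product([['A0', 'A1', 'A2'], ['B0', 'B1']],
                                names=['outer', 'inner'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=mi, columns=list('AB'))
print(xs_via_loc(df, 'B1', level='inner'))
```

Handling drop_level=True would additionally require removing level n from the result's index, which is the "surround the indexer with a list or not" question in the first bullet.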
@cpcloud
Member

cpcloud commented Feb 4, 2014

I don't think we should deprecate it. Why not just implement xs with the above?

@jreback
Contributor Author

jreback commented Feb 4, 2014

Definitely should reimplement with the above...

Deprecate because then selection becomes less confusing: e.g. if you want to get a value by label, use .loc, and you don't have to think/worry about whether it's a cross-section or not.

see #5421

@cpcloud
Member

cpcloud commented Feb 4, 2014

Yeah, I just abhor having to type four lines instead of a call to xs. I don't think loc et al. can be overloaded any more before reaching critical mass.

@jreback
Contributor Author

jreback commented Feb 4, 2014

No, those four lines are the implementation. After #6134 you just do:

df.loc[key] for df.xs(key,level=0)
df.loc[(slice(None),key)] for df.xs(key,level=1)
df.loc[(slice(None),slice(None),key)] for df.xs(key,level=2)

etc. I suppose xs still has its uses, since it saves typing the empty levels (but that's about it).
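pandas later added pd.IndexSlice, which shortens the empty levels mentioned above. The sketch below, with made-up data on a 3-level index, compares the explicit slice(None) form for level=2 with the IndexSlice form and with xs(..., drop_level=False). It passes one row tuple plus a ':' column indexer, the "one tuple per axis" convention that comes up later in this thread.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([['A0', 'A1'], ['B0', 'B1'], ['C0', 'C1']])
df = pd.DataFrame(np.arange(16).reshape(8, 2), index=mi, columns=['x', 'y'])

idx = pd.IndexSlice  # idx[:, :, 'C1'] builds (slice(None), slice(None), 'C1')

by_slices = df.loc[(slice(None), slice(None), 'C1'), :]
by_indexslice = df.loc[idx[:, :, 'C1'], :]
by_xs = df.xs('C1', level=2, drop_level=False)
print(by_indexslice)
```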

@cpcloud
Member

cpcloud commented Feb 4, 2014

How will you disambiguate selecting a column by name with loc from this kind of selection? Python makes no distinction between a colon and slice(None) except at the syntactic level, so your example above is equivalent to df.loc[:, key], which already does column selection. There's no way to tell what you've written above apart from that example. Am I missing something?
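This point can be checked in plain Python: __getitem__ receives exactly the same key object for the explicit tuple and for the colon syntax. ShowKey is a throwaway class made up for this illustration; it just echoes the key it is given.

```python
class ShowKey:
    """Throwaway object that returns whatever key Python hands to __getitem__."""

    def __getitem__(self, key):
        return key


s = ShowKey()
explicit = s[(slice(None), 'B1')]  # the level=1 spelling from the comment above
sugar = s[:, 'B1']                 # the usual "all rows, column 'B1'" spelling
print(explicit)                    # (slice(None, None, None), 'B1')
print(explicit == sugar)           # True: an indexer cannot tell them apart
```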

@jreback
Contributor Author

jreback commented Feb 4, 2014

Hmm... this is a very good point; #6134 won't work as advertised then...

An easy way to 'fix' this is to allow a tuple/list for the key in xs, but then assignment is problematic.

how do we feel about

df.loc(level=1)[key]

or to disambiguate

df.loc(axis=0)[slice(None), key]

hmm....
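For reference, later pandas releases did adopt the second spelling: the MultiIndex slicing documentation describes passing axis to .loc so that the whole key is interpreted on a single axis. A sketch with made-up data, checked against xs(..., drop_level=False):

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([['A0', 'A1', 'A2'], ['B0', 'B1']])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=mi, columns=list('AB'))

# With axis=0 fixed up front, (slice(None), 'B1') is read as per-level row
# selectors rather than as (rows, columns).
rows_b1 = df.loc(axis=0)[:, 'B1']
print(rows_b1)
```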

@jreback
Contributor Author

jreback commented Feb 4, 2014

@y-p want to weigh in here? I didn't think about this with #6134; how do we disambiguate?

e.g.

In [1]: mi = pd.MultiIndex.from_product([['A0', 'A1', 'A2'],['B0', 'B1']])

In [2]: mi.get_values()
Out[2]: 
array([('A0', 'B0'), ('A0', 'B1'), ('A1', 'B0'), ('A1', 'B1'),
       ('A2', 'B0'), ('A2', 'B1')], dtype=object)

In [3]: df = pd.DataFrame(np.random.randn(6,2),index=mi,columns=list('AB'))

In [4]: df
Out[4]: 
              A         B
A0 B0 -0.195160  0.397622
   B1 -0.081161 -1.433219
A1 B0  0.443265  0.414330
   B1 -2.500851  0.458434
A2 B0 -1.358423  0.559703
   B1  0.365160  0.846487

This raises (even though we'd like it to work):

In [7]: df.loc[(slice(None),'B1')]

@ghost

ghost commented Feb 4, 2014

One tuple per axis.

df.loc[(slice(None),'B1'),:]

as advertised in the #6134 example.

@jreback
Contributor Author

jreback commented Feb 4, 2014

Doesn't work; it's misinterpreted because you cannot disambiguate.

@ghost

ghost commented Feb 4, 2014

Disambiguate with what? Give me an example of something that's indistinguishable
from df.loc[(slice(None),'B1'),:] but already means something different.

@cpcloud
Member

cpcloud commented Feb 4, 2014

df.loc[(slice(None), 'B1'), :] -> df.loc[:'B1', :], which, as you know, is rows up to and including 'B1' from the row index.

@ghost

ghost commented Feb 4, 2014

That's so broken. But it also doesn't seem to be true:

In [13]: class A(object):
    ...:     def __getitem__(self,*args):
    ...:         return args
    ...: df=A()
    ...: print(df[(slice(None), 'B1'), :])
    ...: print(df[:'B1', :])
(((slice(None, None, None), 'B1'), slice(None, None, None)),)
((slice(None, 'B1', None), slice(None, None, None)),)

Am I missing something?

@jreback
Contributor Author

jreback commented Feb 4, 2014

with #6134

In [4]: df.loc[(slice(None),'B1'),:]
Out[4]: 
Empty DataFrame
Columns: [A, B]
Index: []

[0 rows x 2 columns]

@cpcloud
Member

cpcloud commented Feb 4, 2014

The difference is that you're passing in a tuple of tuples, so I was wrong about the equivalence.

@ghost

ghost commented Feb 4, 2014

So you withdraw the claim that xs is still necessary, or are there other corner cases?

@cpcloud
Member

cpcloud commented Feb 4, 2014

There's no syntactic equivalent of nested tuples of slices, but I think allowing nested tuples as well as flat ones for indexing is going to be both confusing for users and a maintenance nightmare.

@ghost

ghost commented Feb 4, 2014

What are nested tuples? show me.

@cpcloud
Member

cpcloud commented Feb 4, 2014

tuples of tuples

@ghost

ghost commented Feb 4, 2014

Where did I pass one in?

@cpcloud
Member

cpcloud commented Feb 4, 2014

Look at your example

    ...: print(df[(slice(None), 'B1'), :])
    ...: print(df[:'B1', :])
(((slice(None, None, None), 'B1'), slice(None, None, None)),)

The first one is nested; the second is not.

@jreback
Contributor Author

jreback commented Feb 4, 2014

@y-p your example is a tuple of tuples (not passed explicitly, but the syntax generates one).

@jreback
Contributor Author

jreback commented Feb 4, 2014

@cpcloud actually I think this can work, but we need to be careful to figure out which part of the tuple belongs to which axis.

@cpcloud
Member

cpcloud commented Feb 4, 2014

I'm trying to find the Python docs on this... harder than I imagined.

@ghost

ghost commented Feb 4, 2014

I don't understand.
Haven't ix, xs, loc and iloc supported it[(tuple), something] since forever? Why is this
confusing, or novel, all of a sudden?

@cpcloud
Member

cpcloud commented Feb 4, 2014

@jreback I just think it's confusing for users (maybe it wouldn't be user-facing?) and seems like a huge burden for anyone to maintain. It seems like missing parens are going to cause all sorts of confusion.

@cpcloud
Member

cpcloud commented Feb 4, 2014

here's a decent ref

@jreback
Contributor Author

jreback commented Feb 4, 2014

df.loc[(slice(None),'B1'),:]

I think this is very natural, actually; if you have multiple levels, it is the only way to go.

e.g. imagine

df.loc[(slice(None),'B1'),('column_level_1','columns_level2')]
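A runnable version of that two-axis case, assuming a 2-level MultiIndex on both rows and columns (the labels and values are made up). Each axis gets its own tuple, so there is nothing to disambiguate: the row tuple selects every first-level label with 'B1' at the second level, and the fully specified column tuple picks a single column, so the result is a Series.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['A0', 'A1', 'A2'], ['B0', 'B1']])
cols = pd.MultiIndex.from_product([['c0', 'c1'], ['d0', 'd1']])
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=rows, columns=cols)

# Row tuple: any first level, second level 'B1'.
# Column tuple: the single column ('c1', 'd0').
result = df.loc[(slice(None), 'B1'), ('c1', 'd0')]
print(result)
```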

@ghost

ghost commented Feb 4, 2014

This discussion is ridiculous and that objection is vapid.
How is it confusing to users to have syntax that's been around since pandas 0.8.0 or something?

missing parens? what?

@cpcloud
Member

cpcloud commented Feb 4, 2014

@y-p I guess it does... never mind then.

@cpcloud
Member

cpcloud commented Feb 4, 2014

@y-p Sheesh. My mistake. Vapid is a bit much don't you think? Forget it.

@jreback
Contributor Author

jreback commented Feb 4, 2014

@y-p I think there is a bug in the interpretation of

df.loc[(slice(None),'B1'),:]

should this not return

In [16]: df.xs('B1',level=1,drop_level=False)
Out[16]: 
              A         B
A0 B1 -0.856515 -1.153493
A1 B1 -0.460835 -2.063314
A2 B1 -1.963562 -0.592727

[3 rows x 2 columns]
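In current pandas, the one-tuple-per-axis form does return this cross-section, with both index levels kept; that corresponds to xs(..., drop_level=False), while plain xs drops the selected level. A small check, with deterministic values standing in for np.random.randn:

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([['A0', 'A1', 'A2'], ['B0', 'B1']])
# Deterministic values stand in for np.random.randn(6, 2).
df = pd.DataFrame(np.arange(12.0).reshape(6, 2), index=mi, columns=list('AB'))

via_loc = df.loc[(slice(None), 'B1'), :]       # one tuple per axis
kept = df.xs('B1', level=1, drop_level=False)   # keeps both index levels
dropped = df.xs('B1', level=1)                  # default drop_level=True
print(via_loc)
```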

@cpcloud
Member

cpcloud commented Feb 4, 2014

And by missing parens I meant df.loc[(slice(None),'B1'),:] vs df.loc[slice(None),'B1',:].

@ghost

ghost commented Feb 4, 2014

I'm in a bad mood today, my apologies for snapping at you. Not ok.

@cpcloud
Member

cpcloud commented Feb 4, 2014

But as I said my objection was unwarranted. Carry on.

@ghost

ghost commented Feb 4, 2014

do you agree that indexing with tuples has been around for a long time?

@jreback
Contributor Author

jreback commented Feb 4, 2014

@cpcloud the missing parens thing will be caught (maybe could have a better message though)

@cpcloud
Member

cpcloud commented Feb 4, 2014

@y-p Yes, they have. I guess I just never used them. The availability heuristic taking over, it seems.

@ghost

ghost commented Feb 4, 2014

I really feel bad for getting carried away in the argument just then. I apologize again.
Let me present my reasoning: this is not new syntax, nor new functionality as such.
#6134 merely implements an existing case, which we used to raise on.

In fact, several users have complained that they expect this syntax to work,
since it's logically consistent with the rest of our indexing style. I did not add new
logic, I merely inserted a handler to an existing case which used to be ignored.

Since the tuple indexing case already exists, I don't find the "maintenance" objection
convincing, and since users have complained that this doesn't work when they expect
it to, I don't think the argument that it'll confuse users holds up either.
The rest of your objections I think turned out to be invalid in a technical sense.

It's frustrating for me to have to fight such a barrage, but I've been even more
difficult on other occasions. So I guess it serves me right.

@cpcloud
Member

cpcloud commented Feb 4, 2014

@y-p Don't worry about it. You're right. Statement(s) rescinded.

@ghost

ghost commented Feb 4, 2014

I'm going to step away from GitHub for a few days. Clearly I'm starting to
take things waaay too seriously.

@jreback
Contributor Author

jreback commented Feb 9, 2014

@cpcloud in #6301 I used the nested_tuple concept... which is what it actually is!

I think we can just leave .xs in. I put a comparison of the semantics in the docs of #6301, and for certain simpler cases xs is 'nicer'. So I'm going to change this issue to more of a cleanup: implementing xs via the new .loc MultiIndex slicers.

@jreback jreback self-assigned this Feb 9, 2014
@cpcloud
Member

cpcloud commented Feb 10, 2014

Cool. This is great.

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 25, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@tgarc

tgarc commented Jul 17, 2015

@jreback Forgive me if this is the wrong place to put this but I've been trying to implement xs using .loc and wanted to get some feedback.

This is what I currently have (inside of core/generic.py)

    def xs(self, key, axis=0, level=None, copy=None, drop_level=True):
        axis = self._get_axis_number(axis)
        axis_indexer = [ slice(None) ] * self.ndim

        indexer = self._get_axis(axis)

        if not isinstance(indexer,MultiIndex) or (level is None and drop_level):
            # .loc works directly if the key is already ordered and you don't
            # mind dropping levels (or if it's a regular index)
            axis_indexer[axis] = key
            return self.loc[tuple(axis_indexer)]

        if level is None:
            # fill in with slice(None)s to keep from dropping levels
            slicer = [ slice(None) ] * indexer.nlevels
            # Q: is it ok to always expect tuple for a multi-index key?
            for i,k in enumerate(key if type(key) is tuple else (key,)):
                slicer[i] = k
            return self.loc[tuple(slicer)]

        # let get_loc_level handle cases with both key and level
        slicer, new_indexer = indexer.get_loc_level(key, level=level,
                                                    drop_level=drop_level)
        axis_indexer[axis] = slicer
        result = self.loc[tuple(axis_indexer)]

        # apply the new index to the result to drop levels as necessary
        setattr(result,result._get_axis_name(axis), new_indexer)

        # this could be a view
        # but only in a single-dtyped view slicable case
        result._set_is_copy(self, copy=not result._is_view)

        return result

When I run the indexing tests, there are many failures, but they all seem to be related to a loop created by calling .loc inside of xs. Here's an example and a partial traceback:

$ nosetests pandas/tests/test_indexing.py:TestIndexing.test_iloc_getitem_multiindex

  File "/home/tdos/git/pandas/pandas/tests/test_indexing.py", line 1494, in test_iloc_getitem_multiindex
    xp = mi_int.ix[4].ix[8]
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 70, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 920, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 86, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/home/tdos/git/pandas/pandas/core/generic.py", line 1455, in xs
    return self.loc[tuple(axis_indexer)]
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 1187, in __getitem__
    return self._getitem_tuple(key)
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 700, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 808, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 880, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 1334, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 86, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/home/tdos/git/pandas/pandas/core/generic.py", line 1455, in xs
    return self.loc[tuple(axis_indexer)]
...
  File "/home/tdos/git/pandas/pandas/core/indexing.py", line 1305, in _getitem_axis
    elif is_bool_indexer(key):
  File "/home/tdos/git/pandas/pandas/core/common.py", line 2126, in is_bool_indexer
    if isinstance(key, (ABCSeries, np.ndarray)):
  File "/home/tdos/git/pandas/pandas/core/common.py", line 72, in _check
    return getattr(inst, attr, '_typ') in comp
RuntimeError: maximum recursion depth exceeded in cmp

So it seems that the feedback loop follows the chain .loc -> _get_label -> _xs -> .loc

Any ideas on next steps? Or is there anything that sticks out in my code as possibly causing the problem?
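The cycle described here can be reproduced without pandas. The two toy classes below are hypothetical stand-ins, not pandas internals: the indexer's _get_label delegates to xs, and xs is rebuilt on top of .loc, so each call re-enters the other until Python gives up. On Python 3 the error is RecursionError (a subclass of RuntimeError); the message in the traceback above is the Python 2 wording.

```python
class CyclicLoc:
    """Toy label indexer mirroring the reported call chain (not pandas code)."""

    def __init__(self, frame):
        self.frame = frame

    def __getitem__(self, key):
        return self._get_label(key)

    def _get_label(self, key):
        # As in the traceback: label lookup delegates to the frame's xs...
        return self.frame.xs(key)


class CyclicFrame:
    @property
    def loc(self):
        return CyclicLoc(self)

    def xs(self, key):
        # ...while xs is rebuilt on top of .loc, which closes the loop.
        return self.loc[key]


try:
    CyclicFrame().xs('B1')
    outcome = 'returned normally'
except RecursionError as err:
    outcome = type(err).__name__
print(outcome)  # RecursionError
```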

@jreback
Contributor Author

jreback commented Jul 17, 2015

I think this should collapse to just a small amount of code, but I suppose you can iterate on that.

Can you point to your branch for this?

@tgarc

tgarc commented Jul 17, 2015

@jreback
Contributor Author

jreback commented Jul 17, 2015

So when you have only an axis=0 indexer (and nothing else), you can collapse it and get the same result back, e.g.

obj.loc[('a',slice(None))]

is equiv to

obj.loc['a']
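A quick check of that equivalence on a DataFrame with made-up data. On a DataFrame the 2-tuple is read as (row key, column key), so ('a', slice(None)) means row label 'a' plus every column, which is the same as obj.loc['a']; both drop the matched outer level.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([['a', 'b'], ['x', 'y']])
obj = pd.DataFrame(np.arange(8).reshape(4, 2), index=mi, columns=['c0', 'c1'])

collapsed = obj.loc['a']
# On a DataFrame this 2-tuple is (row key 'a', column key slice(None)).
explicit = obj.loc[('a', slice(None))]
print(collapsed)
```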

@tgarc

tgarc commented Jul 19, 2015

@jreback yeah, I just wanted to get something that works and then simplify it. It seems, though, that xs is more deeply ingrained in the library than I thought. I figured it was just a user-facing function, but _get_label (defined in the _NDFrameIndexer base class) uses it, which causes unwanted recursion because _get_label is used by (I think) all of the label-based indexers.

I think that _get_label will have to be modified in order to deprecate xs. Thoughts?
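One way to break the cycle, sketched with hypothetical toy classes rather than the real _NDFrameIndexer: let the indexer's label lookup use low-level primitives (Index.get_loc plus iloc) instead of the public xs, so that xs can become a thin wrapper over .loc with no path back to itself. This toy only handles single labels; the real indexers cover far more cases.

```python
import pandas as pd


class LabelIndexer:
    """Hypothetical toy label indexer; not pandas' actual _NDFrameIndexer."""

    def __init__(self, frame):
        self.frame = frame

    def __getitem__(self, key):
        return self._get_label(key)

    def _get_label(self, key):
        # Resolve the label with low-level primitives (Index.get_loc + iloc)
        # instead of calling back into the public xs method.
        pos = self.frame.data.index.get_loc(key)
        return self.frame.data.iloc[pos]


class ToyFrame:
    def __init__(self, data):
        self.data = data  # the wrapped pandas DataFrame

    @property
    def loc(self):
        return LabelIndexer(self)

    def xs(self, key):
        # xs is now a thin wrapper over .loc; the call graph
        # xs -> loc -> _get_label -> get_loc/iloc has no edge back to xs.
        return self.loc[key]


frame = ToyFrame(pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z']))
print(frame.xs('y'))
```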

@jreback
Contributor Author

jreback commented Jul 20, 2015

Yeah, probably a bit of recursion going on. It probably needs some simplifying.

@jreback jreback mentioned this issue Nov 13, 2017
@mroeschke
Member

Since xs is out the door in the near future and loc is here to stay, closing.


4 participants