Proposal: Deprecating support of incomplete indexing on MultiIndexes

First, here's the example DataFrame:

``` python
                    0  1  2  3  4  5  6  7
first second third                        
bar   one    three  4  9  7  8  5  0  7  8
             four   6  8  1  5  9  9  1  7
      two    three  7  5  2  6  7  8  5  9
             four   0  8  8  5  3  5  8  3
baz   one    three  6  0  8  0  0  9  8  8
             four   9  3  2  0  2  7  4  9
      two    three  0  3  7  7  4  3  7  0
             four   6  3  2  8  3  9  7  8
foo   one    three  6  7  3  7  3  0  3  6
             four   5  8  0  8  1  5  1  5
      two    three  2  0  8  2  8  1  8  3
             four   9  0  2  7  0  6  8  3
qux   one    three  1  2  5  5  0  7  0  1
             four   6  6  7  0  0  4  5  3
      two    three  1  8  2  8  7  5  7  5
             four   1  1  3  8  8  6  0  3
```
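To reproduce this, here is a minimal sketch that rebuilds the frame. The values are copied from the printout above; the variable names `index`, `values` and `df` are my own choice, and `MultiIndex.from_product` is just one convenient way to get the three row levels.

``` python
import pandas as pd

# Three row levels, in the same order as the printout above.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"], ["three", "four"]],
    names=["first", "second", "third"],
)

# Values copied row by row from the printout.
values = [
    [4, 9, 7, 8, 5, 0, 7, 8],
    [6, 8, 1, 5, 9, 9, 1, 7],
    [7, 5, 2, 6, 7, 8, 5, 9],
    [0, 8, 8, 5, 3, 5, 8, 3],
    [6, 0, 8, 0, 0, 9, 8, 8],
    [9, 3, 2, 0, 2, 7, 4, 9],
    [0, 3, 7, 7, 4, 3, 7, 0],
    [6, 3, 2, 8, 3, 9, 7, 8],
    [6, 7, 3, 7, 3, 0, 3, 6],
    [5, 8, 0, 8, 1, 5, 1, 5],
    [2, 0, 8, 2, 8, 1, 8, 3],
    [9, 0, 2, 7, 0, 6, 8, 3],
    [1, 2, 5, 5, 0, 7, 0, 1],
    [6, 6, 7, 0, 0, 4, 5, 3],
    [1, 8, 2, 8, 7, 5, 7, 5],
    [1, 1, 3, 8, 8, 6, 0, 3],
]

# Columns default to the integers 0..7, matching the header row.
df = pd.DataFrame(values, index=index)
print(df)
```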

For `DataFrame`s with `MultiIndex`ed rows, pandas allows this type of indexing:

`df.loc[('foo','bar'), ('one','two'), ('three','four')]`

to be taken to mean

`df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]`

But this type of indexing is ambiguous when there are exactly two indexing tuples, since

`df.loc[('foo','bar'), ('one','two')]`

could mean incomplete indexing as in

`df.loc[(('foo','bar'), ('one','two')),:]`

or row,column indexing as in

`df.loc[(('foo','bar'),), (('one','two'),)]`

I appreciate that there is already a warning for this in the [documentation](http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers), but I wonder if the functionality is worth the complications it adds to the code/docs.

Personally, I would suggest leaving the responsibility for complete indexing on a MultiIndex DataFrame with the user (obviously this doesn't apply to `Series`, since they are one-dimensional). This would take away the _minor_ syntactical convenience of not specifying the column index, but it simplifies the code and gives the user only one way to index on a MultiIndex DataFrame (which makes usage less confusing).

The consequence to the user in the specific case of selecting multiple levels of a row-MultiIndex on a DataFrame is that instead of writing

`df.loc['foo','one']`

they would have to write

`df.loc[('foo','one'), :]`

And, in the syntactically worst case, instead of writing

`df.loc[('foo','bar'), ('one','two'), ('three','four')]`

they would have to write

`df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]`
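As a rough illustration of the two spellings, here is a small sketch. It assumes a stand-in frame with the same three row levels but placeholder values (`np.arange`), with the labels given in sorted order so the index is lexsorted. For selecting several labels per level it uses lists, as the pandas docs do, rather than the tuple shorthand written above; that substitution is mine.

``` python
import numpy as np
import pandas as pd

# Stand-in frame: same level names as the example, placeholder values.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"], ["four", "three"]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(16 * 8).reshape(16, 8), index=index)

# Incomplete indexing: both labels are taken as row levels.
implicit = df.loc["foo", "one"]

# Complete indexing: the row key is one tuple, the column key is explicit.
explicit = df.loc[("foo", "one"), :]

# Several labels per level, with rows and columns both spelled out.
multi = df.loc[(["bar", "foo"], ["one", "two"], ["four", "three"]), :]

print(implicit.equals(explicit))
print(multi.shape)
```

The explicit forms cost a pair of parentheses and a `, :`, but the row key and the column key can no longer be confused.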

I'm fairly new to pandas (don't think I started using it until v0.16), so I realize I may be missing the bigger picture. If so, enlighten me!


Proposal: Deprecating support of incomplete indexing on MultiIndexes #10574
