Description
First, here's the example DataFrame:
                     0  1  2  3  4  5  6  7
first second third
bar   one    three   4  9  7  8  5  0  7  8
             four    6  8  1  5  9  9  1  7
      two    three   7  5  2  6  7  8  5  9
             four    0  8  8  5  3  5  8  3
baz   one    three   6  0  8  0  0  9  8  8
             four    9  3  2  0  2  7  4  9
      two    three   0  3  7  7  4  3  7  0
             four    6  3  2  8  3  9  7  8
foo   one    three   6  7  3  7  3  0  3  6
             four    5  8  0  8  1  5  1  5
      two    three   2  0  8  2  8  1  8  3
             four    9  0  2  7  0  6  8  3
qux   one    three   1  2  5  5  0  7  0  1
             four    6  6  7  0  0  4  5  3
      two    three   1  8  2  8  7  5  7  5
             four    1  1  3  8  8  6  0  3
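For anyone who wants to reproduce this frame, here is a minimal sketch that rebuilds it from the printed values. It assumes the row index is the full product of the three level lists, which matches the printout but is only a guess at how the frame was originally generated.

```python
import pandas as pd

# Assumption: the index is the full product of the three level lists,
# in the order they appear in the printout.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"], ["three", "four"]],
    names=["first", "second", "third"],
)

# Values copied row by row from the printed frame.
values = [
    [4, 9, 7, 8, 5, 0, 7, 8],
    [6, 8, 1, 5, 9, 9, 1, 7],
    [7, 5, 2, 6, 7, 8, 5, 9],
    [0, 8, 8, 5, 3, 5, 8, 3],
    [6, 0, 8, 0, 0, 9, 8, 8],
    [9, 3, 2, 0, 2, 7, 4, 9],
    [0, 3, 7, 7, 4, 3, 7, 0],
    [6, 3, 2, 8, 3, 9, 7, 8],
    [6, 7, 3, 7, 3, 0, 3, 6],
    [5, 8, 0, 8, 1, 5, 1, 5],
    [2, 0, 8, 2, 8, 1, 8, 3],
    [9, 0, 2, 7, 0, 6, 8, 3],
    [1, 2, 5, 5, 0, 7, 0, 1],
    [6, 6, 7, 0, 0, 4, 5, 3],
    [1, 8, 2, 8, 7, 5, 7, 5],
    [1, 1, 3, 8, 8, 6, 0, 3],
]

# Columns default to the integer labels 0..7, as in the printout.
df = pd.DataFrame(values, index=index)
print(df)
```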
For DataFrames with MultiIndexed rows, pandas allows this type of indexing
df.loc[('foo','bar'), ('one','two'), ('three','four')]
to be taken to mean
df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]
But this type of indexing is ambiguous when exactly two indexing tuples are given, since
df.loc[('foo','bar'), ('one','two')]
could mean incomplete indexing as in
df.loc[(('foo','bar'), ('one','two')),:]
or row,column indexing as in
df.loc[(('foo','bar'),), (('one','two'),)]
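To make the two readings concrete, here is a small sketch on a hypothetical frame (`small`, not the example frame above) whose column labels happen to coincide with the second index level, so that both readings are valid and give different results. I write the label sets as lists, following the convention in the pandas documentation that lists select several labels within one level while tuples spell out a key across levels. Only the two explicit spellings are run; what the bare two-key form does is exactly the ambiguity in question, so it is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the column labels deliberately repeat the labels
# of the second index level, so both readings below are valid.
labels = ["one", "three", "two"]
rows = pd.MultiIndex.from_product(
    [["bar", "foo"], labels], names=["first", "second"]
)
small = pd.DataFrame(np.arange(18).reshape(6, 3), index=rows, columns=labels)

# Reading 1: both label sets index the row MultiIndex (incomplete indexing).
by_rows = small.loc[(["bar", "foo"], ["one", "two"]), :]

# Reading 2: first label set selects rows, second selects columns.
by_rows_and_cols = small.loc[(["bar", "foo"], slice(None)), ["one", "two"]]

print(by_rows.shape)           # 4 rows, all 3 columns
print(by_rows_and_cols.shape)  # all 6 rows, 2 columns
```

The same pair of label sets yields a 4x3 result under one reading and a 6x2 result under the other, which is why the explicit spelling matters.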
I appreciate that there is already a warning for this in the documentation, but I wonder if the functionality is worth the complications it adds to the code/docs.
Personally, I would suggest offloading the responsibility of complete indexing on a MultiIndex DataFrame to the user (obviously this doesn't apply to Series, as they are 1d, so to speak). This would take away the minor syntactical convenience of not specifying the column index, but it simplifies the code and gives the user only one way to index on a MultiIndex DataFrame (which makes usage less confusing).
The consequence to the user in the specific case of selecting multiple levels of a row-MultiIndex on a DataFrame is that instead of writing
df.loc['foo','one']
they would have to write
df.loc[('foo','one'), :]
And, in the syntactically worst case, instead of writing
df.loc[('foo','bar'), ('one','two'), ('three','four')]
they would have to write
df.loc[(('foo','bar'), ('one','two'), ('three','four')), :]
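As a rough illustration of what the explicit spellings look like in practice, the sketch below builds a frame with the same index layout as the example (the values are placeholders from `np.arange`, not the numbers shown above) and compares the shorthand with the fully specified form. The multi-label selection is written with lists rather than tuples, which is the spelling the pandas documentation uses for several labels within a level.

```python
import numpy as np
import pandas as pd

# Same index layout as the example frame; the values are placeholders.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"], ["three", "four"]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(128).reshape(16, 8), index=index)

# Shorthand versus explicit row key plus column slice.
short = df.loc["foo", "one"]
explicit = df.loc[("foo", "one"), :]
print(short.equals(explicit))

# Explicit multi-label selection across all three levels.
wide = df.loc[(["foo", "bar"], ["one", "two"], ["three", "four"]), :]
print(wide.shape)
```

The explicit forms cost a pair of parentheses and a trailing `:`, but they leave no doubt about which axis each key addresses.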
I'm fairly new to pandas (don't think I started using it until v0.16), so I realize I may be missing the bigger picture. If so, enlighten me!