BUG: Pandas loc can result in ambiguous result not caught by an exception #42603

Open
2 of 3 tasks
komodovaran opened this issue Jul 19, 2021 · 3 comments
Labels
Bug · Indexing (Related to indexing on series/frames, not to indexes themselves) · Needs Discussion (Requires discussion from core team before further action)

Comments

@komodovaran

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

Tested on Pandas 1.3.0.

import pandas as pd
import numpy as np
import pytest

sample_name = ["foo", "foo", "foo"]
sample_number = [3, 3, 3]

y_pred = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]).reshape(-1, 2)

# a df with integer column labels 0 and 1
df = pd.DataFrame(y_pred, index=[sample_name, sample_number])

# Returns a df, as expected
c1 = df.loc["foo", 3]
print(c1)
assert hasattr(c1, "columns")

# Key error, as expected, because there's no entry 2
with pytest.raises(KeyError):
    _c2 = df.loc["foo", 2]

# Now it's a Series of (foo, column 0) instead of a DataFrame!
c3 = df.loc["foo", 0]
print(c3)
# This assertion fails, because c3 is a Series and has no "columns" attribute
assert hasattr(c3, "columns")

Problem description

When converting arrays to DataFrames, the columns will automatically be enumerated. Obviously this problem is somewhat circumventable by not doing all these implicit conversions. However, it doesn't warn you of this, and the case I describe above can return a Series or a DataFrame, depending on which index you query for, which, at least for me, created a decent amount of confusion.

Expected Output

I'd imagine it should raise an ambiguity error, similar to #21080, like

 ValueError: '0' is both an index and a column label.
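
To make the kind of check I have in mind concrete, here is a rough sketch. `raise_if_ambiguous` is a hypothetical helper written for this illustration, not an existing pandas function, and `df_clash` is an assumed variant of the frame above whose second index level also contains 0, so the label really does exist in both places:

```python
import numpy as np
import pandas as pd


def raise_if_ambiguous(df, label):
    """Hypothetical helper: raise if `label` is both an index-level value and a column label."""
    index = df.index
    # Check every index level, so this works for plain and multi-level indexes.
    in_index = any(label in index.get_level_values(i) for i in range(index.nlevels))
    if in_index and label in df.columns:
        raise ValueError(f"'{label}' is both an index and a column label.")


values = np.array([[0.1, 0.1], [0.1, 0.9], [0.9, 0.9]])
# Assumed variant of the frame above: the second index level now contains 0.
df_clash = pd.DataFrame(values, index=[["foo", "foo", "foo"], [0, 3, 3]])

try:
    raise_if_ambiguous(df_clash, 0)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

This only covers a label that is present in both places. In the frame from the code sample, 0 is a column label only, so a check like this would not fire there.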

komodovaran added the Bug and Needs Triage labels on Jul 19, 2021

@phofl
Member

phofl commented Jul 20, 2021

This is a "feature" I think, but I'm not happy with it either. It is not really ambiguous, but you could avoid this easily with something like this:

df.loc[("foo", 0), :]

if you would like to select the index or

df.loc[("foo", slice(None)), 0]

if you would like to select the column.

This is much better to read imo.
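
A minimal sketch of the two explicit forms on the frame from the report. Two things here are assumptions made for the illustration: the values array is written out directly instead of being reshaped, and the row key is ("foo", 3), because 0 is not a value in this index:

```python
import numpy as np
import pandas as pd

# Same data the reshape in the report produces: 3 rows, 2 columns.
values = np.array([[0.1, 0.1], [0.1, 0.9], [0.9, 0.9]])
index = pd.MultiIndex.from_arrays([["foo", "foo", "foo"], [3, 3, 3]])
df = pd.DataFrame(values, index=index)

# Row selection: the whole tuple is the index key, ":" keeps every column.
rows = df.loc[("foo", 3), :]

# Column selection: slice(None) keeps every value of the second index level,
# and 0 is looked up in the columns only.
col = df.loc[("foo", slice(None)), 0]

print(type(rows).__name__)
print(type(col).__name__)
```

With both axes spelled out, the second element of the key can no longer switch between an index level and a column label.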

@rhshadrach
Member

I think the issue is this: when a tuple is provided to .loc, is it specifying levels of a MultiIndex or an index-column pair? This appears to be value-dependent behavior to me; I'm not sure I would describe it as a feature.
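
The value dependence can be made visible by running the same key shape with different second elements. A small sketch on the frame from the report follows. `outcome` is a helper name made up for this illustration, and the results reported in the issue (DataFrame, KeyError, Series) were observed on pandas 1.3.0, so other versions may differ:

```python
import numpy as np
import pandas as pd

values = np.array([[0.1, 0.1], [0.1, 0.9], [0.9, 0.9]])
df = pd.DataFrame(values, index=[["foo", "foo", "foo"], [3, 3, 3]])


def outcome(frame, second):
    """Illustrative helper: report what frame.loc["foo", second] produces."""
    try:
        return type(frame.loc["foo", second]).__name__
    except KeyError:
        return "KeyError"


# 3 is an index-level value, 2 is in neither axis, 0 is a column label.
outcomes = {second: outcome(df, second) for second in (3, 2, 0)}
print(outcomes)
```

The key has the same shape in all three calls; only the value of its second element decides which axis it is resolved against.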

@phofl
Member

phofl commented Jul 23, 2021

Based on the ambiguity error shown above, I think this is intended. I don't think this is good either.

phofl added the Indexing and Needs Triage labels and removed the Needs Triage label on Jul 25, 2021

mroeschke added the Needs Discussion label and removed the Needs Triage label on Aug 21, 2021