DOC: get_indexer returns non-matching with -1 positional #38482

Closed
chrisjdixon opened this issue Dec 15, 2020 · 12 comments · Fixed by #43738
Labels: Docs, good first issue, Indexing

@chrisjdixon

An answer on my recent SO post used get_indexer, which worked great, but I soon learned if the indexer doesn't find a match it seems to use the last entry without raising an error:

import pandas as pd

df = pd.DataFrame({"ID": [6, 2, 4],
                   "to ignore": ["foo", "whatever", "idk"],
                   "value": ["A", "B", "asdf"],
                   })

df2 = pd.DataFrame({"ID_number": [1, 2, 3, 4, 5, 6],
                    "A": [0.91, 0.42, 0.85, 0.84, 0.81, 0.88],
                    "B": [0.11, 0.22, 0.45, 0.38, 0.01, 0.18],
                    })

df2 = df2.set_index('ID_number')
# One lookup per row of df: row position from "ID", column position from "value".
df['new_col'] = df2.values[df2.index.get_indexer(df['ID']), df2.columns.get_indexer(df['value'])]

I presumed the row with "asdf" would have raised an error, not returned the value for "B". This was problematic because I unwittingly processed data incorrectly and I enjoy being employed.
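
A minimal sketch of the mechanism, reusing the column labels and the ID 4 row from the example above (the variable names `columns`, `positions`, and `row` are illustrative only). `get_indexer` reports the unmatched label as -1, and NumPy then treats -1 as "last element", which is why the value from column "B" comes back:

```python
import numpy as np
import pandas as pd

columns = pd.Index(["A", "B"])

# The unmatched label "asdf" is reported as position -1.
positions = columns.get_indexer(["A", "B", "asdf"])
print(positions)  # [ 0  1 -1]

# The A and B values for ID_number 4 in the example above.
row = np.array([0.84, 0.38])

# NumPy reads -1 as "last element", so this silently returns the B value.
print(row[positions[-1]])  # 0.38
```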

My interpretation of the documentation was that if method was not supplied it would be "default: exact matches only". Supplying method = 'default' and tolerance = 0 was also not accepted when not using another method.
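
For comparison, a small sketch of how the `method` argument changes the result, using a made-up numeric index (the values are illustrative, not from the report). With no `method`, only exact matches get a position and everything else becomes -1; `tolerance` only applies together with an inexact method such as "nearest":

```python
import pandas as pd

index = pd.Index([10, 20, 30])
target = [10, 24, 99]

# Default (method=None): exact matches only, everything else is -1.
exact = index.get_indexer(target)
print(exact)  # [ 0 -1 -1]

# "pad" falls back to the previous index value when there is no exact match.
padded = index.get_indexer(target, method="pad")
print(padded)  # [0 1 2]

# "nearest" with a tolerance: 24 is within 5 of 20, but 99 is too far from 30.
nearest = index.get_indexer(target, method="nearest", tolerance=5)
print(nearest)  # [ 0  1 -1]
```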

Also, I'm no serious programmer so I have likely misunderstood something and am only trying to help. Please feel free to correct / tell me to go away

@chrisjdixon added the Bug and Needs Triage labels Dec 15, 2020
@jreback
Contributor

jreback commented Dec 15, 2020

-1 marks index positions that are not found. The user of this (more or less internal) routine must take care to filter these out if they are not wanted.
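
One way to do that filtering, as a hedged sketch rather than an official recipe (the names `values`, `found`, and `result` are illustrative): build a boolean mask of the positions that are not -1, and only index with those.

```python
import numpy as np
import pandas as pd

index = pd.Index(["A", "B"])
values = np.array([0.84, 0.38])
target = ["A", "asdf", "B"]

positions = index.get_indexer(target)

# True where a label was found, False where get_indexer returned -1.
found = positions != -1

# Start from NaN and fill in only the entries that had a match.
result = np.full(len(target), np.nan)
result[found] = values[positions[found]]
print(result)  # [0.84  nan 0.38]
```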

It's in the example, but I suppose it could be more prominent in the docstring. Changing this issue to a documentation one.

@chrisjdixon if you'd like to do a pull request, that would be great.

@jreback added the Docs and Indexing labels and removed the Bug and Needs Triage labels Dec 15, 2020
@jreback changed the title from "BUG: get_indexer appears to accept inexact matches by default, don't know how to force exact matches only" to "DOC: get_indexer returns non-matching with -1 positional" Dec 15, 2020
@chrisjdixon
Author

@jreback I'm sorry mate I'd be happy to make a pull request but I'm not very good at this and I don't know why -1 index positions would ever be desired or how they could be filtered out, so I wouldn't know what to ask for in the documentation / pull request...

Could you please elaborate or point me in the right direction?

@phofl
Member

phofl commented Jan 23, 2021

The docs say

> Integers from 0 to n - 1 indicating that the index at these positions matches the corresponding target values. Missing values in the target are marked by -1.

under the Returns section. So I think we can close this one?

@phofl added the Closing Candidate label Jan 23, 2021
@chrisjdixon
Author

@phofl @jreback I agree that existing documentation accurately explains things but I don't find the explanation beginner friendly.

Despite this potentially being too verbose for guys like you, as a layman I would've benefited from a more prominent warning explicitly describing the outcome, perhaps something like: 'Note that unmatched entries return -1, which when applied as an index returns the series' last value without raising an error. If this is undesired, it's important that you filter out the -1s.'
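
A sketch of the kind of guard such a note would point users towards. `lookup_exact` is a hypothetical helper written for illustration, not a pandas function; it assumes unique row and column labels and raises instead of letting -1 wrap around to the last entry.

```python
import numpy as np
import pandas as pd


def lookup_exact(table, row_labels, col_labels):
    """Hypothetical helper: positional lookup that fails on unmatched labels."""
    rows = table.index.get_indexer(row_labels)
    cols = table.columns.get_indexer(col_labels)
    # Any -1 means the label was not found in the index or the columns.
    missing = (rows == -1) | (cols == -1)
    if missing.any():
        bad = np.flatnonzero(missing).tolist()
        raise ValueError(f"no exact match at positions {bad}")
    return table.to_numpy()[rows, cols]


df2 = pd.DataFrame(
    {"A": [0.91, 0.42, 0.85, 0.84, 0.81, 0.88],
     "B": [0.11, 0.22, 0.45, 0.38, 0.01, 0.18]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="ID_number"),
)

print(lookup_exact(df2, [6, 2], ["A", "B"]))  # [0.88 0.22]

try:
    lookup_exact(df2, [6, 2, 4], ["A", "B", "asdf"])
except ValueError as err:
    print(err)  # no exact match at positions [2]
```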

I'd like to help but I don't know anything about software development and couldn't figure out how to make a suitable pull request. If a more elaborate warning like this is appropriate and if someone could please point me in the right direction I am happy to try.

@MarcoGorelli
Member

Thanks @chrisjdixon for the report/suggestion, but wouldn't such a warning be disruptive to users intentionally using the function knowing it returns -1 if no match is found?

If you want to clarify the docstring, then here's the contributing guide, feel free to ask if anything's unclear

@chrisjdixon
Author

@MarcoGorelli you'd know better than me but my expectation was the warning would show in the documentation only like it does for pd.Index.values, not when running the code.

I trust, but don't understand, that -1 values might sometimes be desired. But if they would typically not be desired or expected, or if users unwittingly processing data incorrectly would be particularly problematic (as in my case), I'd think the warning is justifiable. I'm new to pandas and programming, though, so others would know best.

I've had a look at the contributing guide but there's a huge amount of basic stuff / git that I still don't understand. I don't want to mess you guys around but I am very busy with other things right now and don't have time to quickly learn it all, so I might need a while. In the meantime if anyone more knowledgeable can be bothered quickly making the change I'd appreciate it.

@MarcoGorelli
Member

> my expectation was the warning would show in the documentation only like it does for pd.Index.values, not when running the code.

Ah yes, no objection to that. A pull request to add such a warning would be welcome if you (or anyone else following along) wanted to submit one. If you comment "take", the issue will be assigned to you.

@MarcoGorelli removed the Closing Candidate label Feb 1, 2021
@shuaggar-sys

take

@phofl
Member

phofl commented Apr 17, 2021

Removed the milestone since the PR is closed and stale.

@chrisjdixon
Author

Are we making progress?

@shuaggar-sys created this pull request and it looked like that made progress but @MarcoGorelli closed it and I don't understand why. Looks like @shuaggar-sys has disappeared but I don't know what else needed to be done. Is there anything I can do?

@jreback
Contributor

jreback commented Apr 25, 2021

@chrisjdixon or others are welcome to take over that PR

@DhruvBShetty
Contributor

take

7 participants