Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API/BUG: Series.argmin/max with all-NaN data returns -1 ? #33941

Closed
jorisvandenbossche opened this issue May 2, 2020 · 6 comments · Fixed by #54170
Closed

API/BUG: Series.argmin/max with all-NaN data returns -1 ? #33941

jorisvandenbossche opened this issue May 2, 2020 · 6 comments · Fixed by #54170
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented May 2, 2020

Ran into this while looking at #27801.

Both numpy and pandas raise an error for empty arrays/Series in argmin and argmax:

>>> pd.Series([], dtype=float).argmin()   
...
ValueError: attempt to get argmin of an empty sequence

>>> np.array([], dtype=float).argmin()  
...
ValueError: attempt to get argmin of an empty sequence

However, when having all NaN data, we see a different behaviour:

>>> pd.Series([np.nan, np.nan], dtype=float).argmin() 
-1

>>> np.array([np.nan, np.nan], dtype=float).argmin() 
0

Does somebody have an explanation of why this would return -1 ?

In principle, in pandas, argmin has a skipna=True keyword, so for pandas I would expect that an all-NaN Series behaves the same as an empty Series.

@jorisvandenbossche jorisvandenbossche added API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels May 2, 2020
@jorisvandenbossche
Copy link
Member Author

So this is quite explicit in the code of nanops.nanargmin/max:

pandas/pandas/core/nanops.py

Lines 1171 to 1176 in 07402f6

if skipna:
if mask.all():
result = -1
else:
if mask.any():
result = -1

and those lines haven't been touched in 8 years .. c1307b6

This behaviour is used in idxmin/max:

pandas/pandas/core/series.py

Lines 2005 to 2008 in 07402f6

i = nanops.nanargmin(self._values, skipna=skipna)
if i == -1:
return np.nan
return self.index[i]

The logic behind "-1" might be that in general in pandas indexers, -1 is used for missing values (and argmin kind of returns an indexer). But still, that doesn't make much sense to me for a user-facing argmin/max.

@jreback
Copy link
Contributor

jreback commented May 2, 2020

if you recall we separated out argmax from idxmax a while back

since argmax is not that useful in pandas (you almost always want idxmax) it’s likely understated

not averse to changing to more common behavior to raise for argmax

however min/max imho should just work in empty (no raise) as that is a common pandas pattern

@jorisvandenbossche
Copy link
Member Author

if you recall we separated out argmax from idxmax a while back

Yes, I know. But, which actually means that the new user-facing behaviour of argmin/argmax is rather recent (only from pandas 1.0), which is maybe a good reason that we can still fix this as a bug instead of deprecating.

Now, the actual behaviour of nanops.nanargmin/max is from long before this idx/arg split. It even stems from before pandas 0.13, when Series stopped being an ndarray subclass and changed the behaviour of argmin/max. Now, I suppose before 1.0, the nanops version was only used internally (eg in the idxmin/max implementation, as shown above)

@jorisvandenbossche
Copy link
Member Author

Now, I suppose before 1.0, the nanops version was only used internally (eg in the idxmin/max implementation, as shown above)

On second thought, that's not fully true, since the Index classes already have been using the nanops version in their argmin/max methods (and does already returns -1 for a long time in this case).

@rhshadrach
Copy link
Member

if you recall we separated out argmax from idxmax a while back

since argmax is not that useful in pandas (you almost always want idxmax) it’s likely understated

...

however min/max imho should just work in empty (no raise) as that is a common pandas pattern

Should idxmax also work on empty Series - both truly empty Series and Series that are entirely null? Both Series.idxmax and DataFrame.idxmax currently raise, but their groupby equivalents do not (except for the special case of a categorical grouper with observed=False and missing categories).

@jreback - I'm curious as to what common pattern you refer to above. Is it the idea that producing summary statistics (e.g. DataFrame.describe) on any data shouldn't fail?

@jbrockmendel
Copy link
Member

Discussed on last week's dev call, agreed these should be deprecated to raise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants