BUG: Series.median() treats NaT as INT_MIN #8617
Nothing to do with numpy, this is an error here: https://github.com/pydata/pandas/blob/master/pandas/core/nanops.py#L281 (just take out the mask and it should work); the mask is getting overwritten |
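To illustrate the failure mode named in the title: datetime-like values are stored as 64-bit integers, and NaT is the minimum int64 value, so a reduction that loses its mask sees NaT as a very large negative number. The sketch below uses plain Python rather than the actual nanops code; `I_NAT`, `naive_median`, and `masked_median` are hypothetical names used for illustration only.

```python
import statistics

# NaT is stored as the smallest signed 64-bit integer.
I_NAT = -2**63

def naive_median(values):
    """Median over the raw integers with no mask: NaT counts as INT_MIN."""
    return statistics.median(values)

def masked_median(values):
    """Median that drops NaT entries first (skipna-style); NaT if nothing is left."""
    valid = [v for v in values if v != I_NAT]
    return statistics.median(valid) if valid else I_NAT

raw = [0, I_NAT]  # integer view of a series like [0, NaT]
print(naive_median(raw))   # a huge negative number, dragged toward INT_MIN
print(masked_median(raw))  # 0
```

The point is only that the mask has to survive until the reduction runs; how nanops carries it through reshaping is not shown here.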
Yeah, it's not the fix that's the problem (though you do have to carefully propagate the reshaping of the input to the mask; you can't just close it into |
@ischwabacher I fixed all of this for 0.15.0 (aside from this one), which I guess was just not completely tested. You can fix upstream if you want, but these tests don't rely on numpy very much. |
It looks like
In [81]: pd.Series([0, pd.tslib.iNaT], dtype='M8[ns]').max(skipna=False)
Out[81]: Timestamp('1970-01-01 00:00:00')
In [82]: pd.Series([0, pd.tslib.iNaT], dtype='m8[ns]').max(skipna=False)
Out[82]: Timedelta('0 days 00:00:00')
In [83]: pd.Series([0, pd.tslib.iNaT], dtype='M8[ns]').min(skipna=False)
Out[83]: NaT
In [84]: pd.Series([0, pd.tslib.iNaT], dtype='m8[ns]').min(skipna=False)
Out[84]: NaT |
For these dtypes, skipna=False is not well defined: by definition you are not using a mask, so the values are the values. |
Really? Why aren't we treating Also, if |
The min/max for pandas is NOT numpy's, so the treatment can be different, though not without a good reason (for example, numpy doesn't handle anything NaT-related correctly, although it is usually pretty good with NaN). Don't test directly against numpy. Write a function that returns the correct result. |
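One way to read the suggestion above is a small reference implementation that the tests compare against instead of numpy. The sketch below rests on one assumption about what "correct" means, namely that NaT propagates when skipna=False; the thread does not settle that question. `I_NAT` and `reference_reduce` are hypothetical names, not pandas API.

```python
I_NAT = -2**63  # integer sentinel used for NaT

def reference_reduce(values, op, skipna=True):
    """Apply op (e.g. min or max) to int64-style values, handling NaT explicitly."""
    has_nat = any(v == I_NAT for v in values)
    if has_nat and not skipna:
        # Assumption: a missing value makes the whole result missing.
        return I_NAT
    valid = [v for v in values if v != I_NAT]
    if not valid:
        # Nothing left to reduce over.
        return I_NAT
    return op(valid)
```

Under this assumption, min and max agree with each other for skipna=False, unlike the asymmetric outputs shown earlier in the thread.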
And then merge it into numpy. :) I really think that getting the |
well you can touch numpy if you would like :). We are going to bypass numpy to support integer NA, using |
Welp, I guess that's a thing. |
@jreback It seems that the bug has been fixed.
In [9]: s = pd.Series([0, pd.tslib.iNaT], dtype='m8[ns]')
In [10]: s.median()
Out[10]: Timedelta('0 days 00:00:00')
In [11]: s.min()
Out[11]: Timedelta('0 days 00:00:00')
In [12]: s.max()
Out[12]: Timedelta('0 days 00:00:00')
In [13]: s = pd.Series([0, pd.NaT], dtype='m8[ns]')
In [14]: s.median()
Out[14]: Timedelta('0 days 00:00:00')
In [15]: s.min()
Out[15]: Timedelta('0 days 00:00:00')
In [16]: s.max()
Out[16]: Timedelta('0 days 00:00:00') |
More tests may be needed. |
xref #12992. Can you do a PR for some tests? |
Fixing this will be easier once numpy/numpy#5222 is fixed.