Support Decimal("NaN") in pandas.isna #23530

Closed
TomAugspurger opened this issue Nov 6, 2018 · 2 comments · Fixed by #39409
Labels
Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

@TomAugspurger
Contributor

Should we do this? See #23284 (comment) for some timings. Personally, I don't think it's worth the cost.

| kind | master | PR | ratio |
| --- | --- | --- | --- |
| scalar | 821 ns | 926 ns | 1.12 (different result) |
| object array | 1.0 ms | 6.6 ms | 6.6 |
| decimal array | 1.0 ms | 2.0 ms | 2.0 (different result) |

Here's a patch implementing support.

diff --git a/doc/source/whatsnew/v0.24.0.txt b/doc/source/whatsnew/v0.24.0.txt
index f449ca532..c8c5db611 100644
--- a/doc/source/whatsnew/v0.24.0.txt
+++ b/doc/source/whatsnew/v0.24.0.txt
@@ -1227,6 +1227,7 @@ Missing
 - Bug in :func:`Series.hasnans` that could be incorrectly cached and return incorrect answers if null elements are introduced after an initial call (:issue:`19700`)
 - :func:`Series.isin` now treats all NaN-floats as equal also for `np.object`-dtype. This behavior is consistent with the behavior for float64 (:issue:`22119`)
 - :func:`unique` no longer mangles NaN-floats and the ``NaT``-object for `np.object`-dtype, i.e. ``NaT`` is no longer coerced to a NaN-value and is treated as a different entity. (:issue:`22295`)
+- :meth:`isna` now considers ``decimal.Decimal('NaN')`` a missing value (:issue:`23284`).
 
 
 MultiIndex
diff --git a/pandas/_libs/missing.pyx b/pandas/_libs/missing.pyx
index b87913592..4fa96f652 100644
--- a/pandas/_libs/missing.pyx
+++ b/pandas/_libs/missing.pyx
@@ -1,6 +1,7 @@
 # -*- coding: utf-8 -*-
 
 import cython
+import decimal
 from cython import Py_ssize_t
 
 import numpy as np
@@ -33,6 +34,8 @@ cdef inline bint _check_all_nulls(object val):
         res = get_datetime64_value(val) == NPY_NAT
     elif util.is_timedelta64_object(val):
         res = get_timedelta64_value(val) == NPY_NAT
+    elif isinstance(val, decimal.Decimal):
+        return val.is_nan()
     else:
         res = 0
     return res
@@ -71,6 +74,8 @@ cpdef bint checknull(object val):
         return get_timedelta64_value(val) == NPY_NAT
     elif util.is_array(val):
         return False
+    elif isinstance(val, decimal.Decimal):
+        return val.is_nan()
     else:
         return val is None or util.is_nan(val)
 
diff --git a/pandas/tests/dtypes/test_missing.py b/pandas/tests/dtypes/test_missing.py
index 8f82db69a..0fa738893 100644
--- a/pandas/tests/dtypes/test_missing.py
+++ b/pandas/tests/dtypes/test_missing.py
@@ -1,5 +1,6 @@
 # -*- coding: utf-8 -*-
 
+import decimal
 import pytest
 from warnings import catch_warnings, simplefilter
 import numpy as np
@@ -248,6 +249,43 @@ class TestIsNA(object):
         tm.assert_series_equal(isna(s), exp)
         tm.assert_series_equal(notna(s), ~exp)
 
+    def test_decimal(self):
+        # scalars
+        a = decimal.Decimal(1.0)
+        assert pd.isna(a) is False
+        assert pd.notna(a) is True
+
+        b = decimal.Decimal('NaN')
+        assert pd.isna(b) is True
+        assert pd.notna(b) is False
+
+        # array
+        arr = np.array([a, b])
+        expected = np.array([False, True])
+        result = pd.isna(arr)
+        tm.assert_numpy_array_equal(result, expected)
+
+        result = pd.notna(arr)
+        tm.assert_numpy_array_equal(result, ~expected)
+
+        # series
+        ser = pd.Series(arr)
+        expected = pd.Series(expected)
+        result = pd.isna(ser)
+        tm.assert_series_equal(result, expected)
+
+        result = pd.notna(ser)
+        tm.assert_series_equal(result, ~expected)
+
+        # index
+        idx = pd.Index(arr)
+        expected = np.array([False, True])
+        result = pd.isna(idx)
+        tm.assert_numpy_array_equal(result, expected)
+
+        result = pd.notna(idx)
+        tm.assert_numpy_array_equal(result, ~expected)
+
 
 def test_array_equivalent():
     assert array_equivalent(np.array([np.nan, np.nan]),
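
For readers who don't want to parse the Cython diff, here is a rough pure-Python sketch of the scalar check the patch adds. The function name `checknull_sketch` is hypothetical and the branches are simplified (no datetime64/timedelta64 handling), so treat it as an illustration of the idea rather than the actual pandas implementation.

```python
import decimal
import math


def checknull_sketch(val):
    """Hypothetical stand-in for the patched scalar null check."""
    if val is None:
        return True
    # The branch the patch adds: ask the Decimal itself whether it is NaN.
    if isinstance(val, decimal.Decimal):
        return val.is_nan()
    # Simplified version of the existing float NaN check.
    if isinstance(val, float):
        return math.isnan(val)
    return False


values = [decimal.Decimal("1.0"), decimal.Decimal("NaN"), None, float("nan"), "a"]
mask = [checknull_sketch(v) for v in values]
print(mask)
```

The extra `isinstance` test runs once per element, which is one plausible reason the object-array timing above gets slower with the patch.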
@TomAugspurger TomAugspurger added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Dtype Conversions Unexpected or buggy dtype conversions labels Nov 6, 2018
@manoelpqueiroz

I think this issue is worth revisiting. Decimal('NaN') is a null object, but pandas currently returns False for pd.isna(Decimal('NaN')). That is wrong operation-wise, and it causes problems when performing mathematical operations involving two or more Decimal series.
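
A small standard-library sketch of the behaviour in question, using plain lists as stand-ins for two Decimal series (the variable names are made up for illustration): a quiet `Decimal('NaN')` propagates through arithmetic the same way a float NaN does, and `Decimal.is_nan()` is what identifies it.

```python
from decimal import Decimal

# Plain lists standing in for two object-dtype Series of Decimals.
left = [Decimal("1.0"), Decimal("NaN")]
right = [Decimal("2.0"), Decimal("3.0")]

# Element-wise addition: the quiet NaN propagates into the result.
summed = [a + b for a, b in zip(left, right)]

# Decimal.is_nan() flags the propagated value as missing.
missing = [x.is_nan() for x in summed]
print(summed, missing)

# As with float NaN, a Decimal NaN does not compare equal to itself.
nan = Decimal("NaN")
print(nan == nan)
```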

@TomAugspurger
Contributor Author

I think we're more likely to add a proper Decimal extension type based on Arrow's decimal type rather than change this.
