Fix for comparisons of categorical and a scalar not in categories #9864

Closed
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.16.1.txt
@@ -123,3 +123,5 @@ Bug Fixes
 - Bug in which ``SparseDataFrame`` could not take `nan` as a column name (:issue:`8822`)
 
 - Bug in unequal comparisons between a ``Series`` of dtype `"category"` and a scalar (e.g. ``Series(Categorical(list("abc"), categories=list("cba"), ordered=True)) > "b"``), which used lexicographical order instead of the order of the categories. (:issue:`9848`)
+
+- Bug in unequal comparisons between categorical data and a scalar that is not in the categories (e.g. ``Series(Categorical(list("abc"), ordered=True)) > "d"``). This returned ``False`` for all elements, but now raises a ``TypeError``. Equality comparisons with such a scalar now return ``False`` for ``==`` and ``True`` for ``!=``. (:issue:`9848`)
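
The behaviour described in this release note can be checked with a short script. This is a minimal sketch, assuming a pandas version that includes this fix; the helper name `compare_with_unknown_scalar` is hypothetical and not part of the PR.

```python
import pandas as pd


def compare_with_unknown_scalar():
    """Compare an ordered categorical Series with "d", which is not a category."""
    s = pd.Series(pd.Categorical(list("abc"), ordered=True))

    # Equality comparisons are still allowed for an unknown scalar.
    eq = (s == "d").tolist()
    ne = (s != "d").tolist()

    # Ordering comparisons are expected to raise instead of returning all False.
    try:
        s > "d"
        raised = False
    except TypeError:
        raised = True

    return eq, ne, raised


if __name__ == "__main__":
    print(compare_with_unknown_scalar())
```

The ordering comparison is wrapped in `try`/`except` so the script reports whether the error was raised rather than stopping at it.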
9 changes: 8 additions & 1 deletion pandas/core/categorical.py
@@ -61,7 +61,14 @@ def f(self, other):
                 i = self.categories.get_loc(other)
                 return getattr(self._codes, op)(i)
             else:
-                return np.repeat(False, len(self))
+                if op == '__eq__':
+                    return np.repeat(False, len(self))
+                elif op == '__ne__':
+                    return np.repeat(True, len(self))
+                else:
+                    msg = "Cannot compare a Categorical for op {op} with a scalar, " \
+                          "which is not a category."
+                    raise TypeError(msg.format(op=op))
         else:

             # allow categorical vs object dtype array comparisons for equality
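
The scalar branch changed in this hunk can be illustrated without pandas. The following is a simplified sketch, not the library's implementation: `compare_with_scalar` is a hypothetical helper that uses plain lists in place of `self._codes` and `self.categories`, and it assumes every value is one of the categories.

```python
import operator

# Map dunder names to the corresponding functions in the operator module.
_OPS = {
    "__eq__": operator.eq,
    "__ne__": operator.ne,
    "__lt__": operator.lt,
    "__gt__": operator.gt,
    "__le__": operator.le,
    "__ge__": operator.ge,
}


def compare_with_scalar(values, categories, other, op):
    """Compare categorical values with a scalar by category position."""
    # Integer codes: the position of each value in the category order.
    codes = [categories.index(v) for v in values]

    if other in categories:
        # Known scalar: compare codes, so the category order decides.
        i = categories.index(other)
        return [_OPS[op](code, i) for code in codes]

    # Unknown scalar: equality is well defined, ordering is not.
    if op == "__eq__":
        return [False] * len(values)
    if op == "__ne__":
        return [True] * len(values)
    msg = ("Cannot compare a Categorical for op {op} with a scalar, "
           "which is not a category.")
    raise TypeError(msg.format(op=op))


if __name__ == "__main__":
    # Categories ordered c < b < a, so only "a" is greater than "b".
    print(compare_with_scalar(list("abc"), list("cba"), "b", "__gt__"))
    print(compare_with_scalar(list("abc"), list("abc"), "d", "__ne__"))
```

Raising for ordering operators avoids silently returning all `False` for a scalar that has no position in the category order.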
27 changes: 27 additions & 0 deletions pandas/tests/test_categorical.py
@@ -1087,6 +1087,20 @@ def test_reflected_comparison_with_scalars(self):
         self.assert_numpy_array_equal(cat > cat[0], [False, True, True])
         self.assert_numpy_array_equal(cat[0] < cat, [False, True, True])

+    def test_comparison_with_unknown_scalars(self):
+        # https://github.com/pydata/pandas/issues/9836#issuecomment-92123057 and following
+        # comparisons with scalars not in categories should raise for unequal comps, but not for
+        # equal/not equal
+        cat = pd.Categorical([1, 2, 3], ordered=True)
+
+        self.assertRaises(TypeError, lambda: cat < 4)
+        self.assertRaises(TypeError, lambda: cat > 4)
+        self.assertRaises(TypeError, lambda: 4 < cat)
+        self.assertRaises(TypeError, lambda: 4 > cat)
+
+        self.assert_numpy_array_equal(cat == 4, [False, False, False])
+        self.assert_numpy_array_equal(cat != 4, [True, True, True])
+

 class TestCategoricalAsBlock(tm.TestCase):
     _multiprocess_can_split_ = True
@@ -2440,6 +2454,19 @@ def f():
             cat > "b"
         self.assertRaises(TypeError, f)

+        # https://github.com/pydata/pandas/issues/9836#issuecomment-92123057 and following
+        # comparisons with scalars not in categories should raise for unequal comps, but not for
+        # equal/not equal
+        cat = Series(Categorical(list("abc"), ordered=True))
+
+        self.assertRaises(TypeError, lambda: cat < "d")
+        self.assertRaises(TypeError, lambda: cat > "d")
+        self.assertRaises(TypeError, lambda: "d" < cat)
+        self.assertRaises(TypeError, lambda: "d" > cat)
+
+        self.assert_series_equal(cat == "d", Series([False, False, False]))
+        self.assert_series_equal(cat != "d", Series([True, True, True]))
+

         # And test NaN handling...
         cat = Series(Categorical(["a","b","c", np.nan]))