Enable many complex number tests #54761

Open: wants to merge 43 commits into base: main

Diff shown: changes from 9 commits.
Commits (all by MichaelTiemannOSC):
- 6125494 Enable many complex number tests (Aug 25, 2023)
- e7a285a Update v2.1.0.rst (Aug 25, 2023)
- 02719d9 Merge branch 'main' into test_numpy_complex2 (Aug 29, 2023)
- f9bfeb9 Fix merge error in test_decimal.py (Aug 29, 2023)
- 077213f Simplify test_fillna_no_op_returns_copy (Aug 29, 2023)
- 9fda0ef Merge remote-tracking branch 'upstream/main' into test_numpy_complex2 (Sep 8, 2023)
- d25baa2 changes from review (Sep 8, 2023)
- ad841bf Merge remote-tracking branch 'upstream/main' into test_numpy_complex2 (Sep 8, 2023)
- 7535374 Merge branch 'main' into test_numpy_complex2 (Sep 22, 2023)
- 7ef6052 Use LSP parameter style for request (Sep 22, 2023)
- f1139f5 Merge branch 'main' into test_numpy_complex2 (Oct 10, 2023)
- 19d3127 Handle complex128 EA in _ensure_data (Oct 11, 2023)
- 67e2dbc Fix mypy pre-commit problems (Oct 12, 2023)
- 909ced4 Remove some LSP sigs for _get_expected_exception (Oct 13, 2023)
- 48cb330 Merge branch 'main' into test_numpy_complex2 (Oct 13, 2023)
- bc96021 Additional `requests` removed; indentation fix (Oct 13, 2023)
- d98e6f0 Merge branch 'main' into test_numpy_complex2 (Oct 14, 2023)
- dabaf6f Keep rval refs alive in StringHashTable._unique (Oct 15, 2023)
- 61c9b32 Merge branch 'main' into test_numpy_complex2 (Nov 4, 2023)
- 6ed24ad Code review changes (Nov 4, 2023)
- e923878 Fix incomplete removal of `keep_rval_refs` (Nov 4, 2023)
- 5efad33 Merge remote-tracking branch 'upstream/main' into test_numpy_complex2 (Dec 9, 2023)
- 51450c8 Merge branch 'main' into test_numpy_complex2 (Dec 9, 2023)
- c31b213 Merge remote-tracking branch 'upstream/main' into test_numpy_complex2 (Jan 5, 2024)
- 9473130 Update io.py (Jan 5, 2024)
- a86c896 Update test_numpy.py (Jan 5, 2024)
- de56177 Update test_numpy.py (Jan 5, 2024)
- 198a16d Merge branch 'main' into test_numpy_complex2 (Jan 5, 2024)
- 554a5c3 Update ops.py (Jan 6, 2024)
- 6ddb7f7 Update test_decimal.py (Jan 6, 2024)
- c4a17a7 Further simplifications due to upstream (Jan 6, 2024)
- 040c98b Update test_arrow.py (Jan 6, 2024)
- 3a58f5a Update test_arrow.py (Jan 6, 2024)
- 29aa747 Update test_arrow.py (Jan 6, 2024)
- 5210c8b setitem exceptions for complex raise ValueError (Jan 9, 2024)
- 9f4bea5 Merge branch 'main' into test_numpy_complex2 (Jan 16, 2024)
- be1f02b Merge branch 'main' into test_numpy_complex2 (Jan 23, 2024)
- b3edefa Update _mixins.py (Jan 23, 2024)
- 89ea60b Incorporate feedback (Jan 31, 2024)
- 4dc3bea Merge branch 'main' into test_numpy_complex2 (Mar 22, 2024)
- 4e273fa Update test_sparse.py (Mar 22, 2024)
- abfdedb Merge branch 'main' into test_numpy_complex2 (Mar 29, 2024)
- 59b50c9 Update algorithms.py (Mar 29, 2024)

1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -351,6 +351,7 @@ Styler

Other
^^^^^
- Add complex128 to the types of numerical data we test across the test suite (:issue:`54761`)
- Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)

.. ***DO NOT USE THIS SECTION***
5 changes: 5 additions & 0 deletions pandas/core/dtypes/astype.py
@@ -100,6 +100,11 @@ def _astype_nansafe(
elif np.issubdtype(arr.dtype, np.floating) and dtype.kind in "iu":
return _astype_float_to_int_nansafe(arr, dtype, copy)

elif np.issubdtype(arr.dtype, np.complexfloating) and is_object_dtype(dtype):
res = arr.astype(dtype, copy=copy)
res[np.isnan(arr)] = np.nan
return res

elif arr.dtype == object:
# if we have a datetime/timedelta array of objects
# then coerce to datetime64[ns] and use DatetimeArray.astype
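
The new `_astype_nansafe` branch casts a complex array to object and then overwrites the NaN positions with a plain `np.nan`, so missing entries come out as the float NaN sentinel rather than as boxed `nan+0j` complex objects. A minimal NumPy-only sketch of that step, outside pandas; the helper name `complex_to_object_nansafe` is hypothetical and not a pandas API:

```python
import numpy as np


def complex_to_object_nansafe(arr: np.ndarray, copy: bool = True) -> np.ndarray:
    """Cast a complex ndarray to object dtype, using float NaN for missing values.

    Hypothetical helper mirroring the complex -> object branch added to
    pandas' internal _astype_nansafe; not a public pandas API.
    """
    if not np.issubdtype(arr.dtype, np.complexfloating):
        raise TypeError(f"expected a complex dtype, got {arr.dtype}")
    # astype(object) boxes every element as a Python complex, NaN entries included
    res = arr.astype(object, copy=copy)
    # np.isnan is True for a complex value whose real or imaginary part is NaN
    res[np.isnan(arr)] = np.nan
    return res


values = np.array([1 + 2j, np.nan, 3j])
out = complex_to_object_nansafe(values)
print([type(v).__name__ for v in out])
```

Only the missing slot changes type; the other elements stay complex objects.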
20 changes: 17 additions & 3 deletions pandas/core/nanops.py
@@ -1003,11 +1003,25 @@ def nanvar(
# cancellation errors and relatively accurate for small numbers of
# observations.
#
# See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
avg = _ensure_numeric(values.sum(axis=axis, dtype=np.float64)) / count
# See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance...
if values.dtype.kind == "c":
avg = _ensure_numeric(values.sum(axis=axis, dtype=values.dtype)) / count
else:
avg = _ensure_numeric(values.sum(axis=axis, dtype=np.float64)) / count
if axis is not None:
avg = np.expand_dims(avg, axis)
sqr = _ensure_numeric((avg - values) ** 2)
# ...but also,
# see https://numpy.org/doc/stable/reference/generated/numpy.nanvar.html#numpy-nanvar
# which explains why computing the variance of complex numbers
# requires first normalizing the complex differences to magnitudes
if values.dtype.kind == "c":
deltas = _ensure_numeric(avg - values)
avg_re = np.real(deltas)
avg_im = np.imag(deltas)
sqr = avg_re**2 + avg_im**2
else:
sqr = _ensure_numeric((avg - values) ** 2)

if mask is not None:
np.putmask(sqr, mask, 0)
result = sqr.sum(axis=axis, dtype=np.float64) / d
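
For complex input the patch squares the magnitude of each deviation, |x - mean|^2 = re^2 + im^2, instead of squaring the complex difference itself, which yields a complex number rather than a variance. NumPy's `var` uses the same magnitude-based definition for complex data. A standalone sketch of the computation with NaN skipping; `complex_nanvar` is a hypothetical name, and unlike pandas' `nanvar` it handles only 1-D input and no mask or axis arguments:

```python
import numpy as np


def complex_nanvar(values: np.ndarray, ddof: int = 1) -> float:
    """Variance of a 1-D complex array, ignoring NaNs (hypothetical helper)."""
    valid = values[~np.isnan(values)]
    count = valid.size
    if count - ddof <= 0:
        return float("nan")
    # accumulate in the complex dtype so the mean keeps its imaginary part
    avg = valid.sum(dtype=valid.dtype) / count
    deltas = avg - valid
    # |delta|**2 = re**2 + im**2, always real and non-negative
    sqr = np.real(deltas) ** 2 + np.imag(deltas) ** 2
    return float(sqr.sum(dtype=np.float64) / (count - ddof))


x = np.array([1 + 1j, 2 - 1j, np.nan, 3 + 0j])
print(complex_nanvar(x))  # 2.0

# Squaring the raw complex deviations gives a complex number, not a variance
valid = x[~np.isnan(x)]
raw = valid - valid.mean()
naive = (raw * raw).sum() / (valid.size - 1)
print(naive)
```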
62 changes: 46 additions & 16 deletions pandas/tests/arithmetic/test_numeric.py
@@ -977,7 +977,7 @@ def test_frame_operators_none_to_nan(self):
df = pd.DataFrame({"a": ["a", None, "b"]})
tm.assert_frame_equal(df + df, pd.DataFrame({"a": ["aa", np.nan, "bb"]}))

@pytest.mark.parametrize("dtype", ("float", "int64"))
@pytest.mark.parametrize("dtype", ("float", "int64", "complex128"))
def test_frame_operators_empty_like(self, dtype):
# Test for issue #10181
frames = [
@@ -1059,7 +1059,7 @@ def test_series_divmod_zero(self):
class TestUFuncCompat:
# TODO: add more dtypes
@pytest.mark.parametrize("holder", [Index, RangeIndex, Series])
@pytest.mark.parametrize("dtype", [np.int64, np.uint64, np.float64])
@pytest.mark.parametrize("dtype", [np.int64, np.uint64, np.float64, np.complex128])
def test_ufunc_compat(self, holder, dtype):
box = Series if holder is Series else Index

@@ -1075,45 +1075,75 @@ def test_ufunc_compat(self, holder, dtype):

# TODO: add more dtypes
@pytest.mark.parametrize("holder", [Index, Series])
@pytest.mark.parametrize("dtype", [np.int64, np.uint64, np.float64])
@pytest.mark.parametrize("dtype", [np.int64, np.uint64, np.float64, np.complex128])
def test_ufunc_coercions(self, holder, dtype):
idx = holder([1, 2, 3, 4, 5], dtype=dtype, name="x")
box = Series if holder is Series else Index

result = np.sqrt(idx)
assert result.dtype == "f8" and isinstance(result, box)
exp = Index(np.sqrt(np.array([1, 2, 3, 4, 5], dtype=np.float64)), name="x")
if result.dtype.kind == "c":
Review comment (Member):
@jbrockmendel is there something in our type conversion / introspection functions that lets us cast to the nearest inexact data type? If not, that might be something we want to do here or in a follow-up PR to better handle this.

Reply (Member):
> that lets us cast to the nearest inexact data type

I don't think so, no. I expected maybe_promote to do that, but it looks like it always gives float64.

assert result.dtype == dtype and isinstance(result, box)
exp_dtype = dtype
else:
assert result.dtype == "f8" and isinstance(result, box)
exp_dtype = np.float64
exp = Index(np.sqrt(np.array([1, 2, 3, 4, 5], dtype=exp_dtype)), name="x")
exp = tm.box_expected(exp, box)
tm.assert_equal(result, exp)

result = np.divide(idx, 2.0)
assert result.dtype == "f8" and isinstance(result, box)
exp = Index([0.5, 1.0, 1.5, 2.0, 2.5], dtype=np.float64, name="x")
if result.dtype.kind == "c":
assert result.dtype == dtype and isinstance(result, box)
exp_dtype = dtype
else:
assert result.dtype == "f8" and isinstance(result, box)
exp_dtype = np.float64
exp = Index([0.5, 1.0, 1.5, 2.0, 2.5], dtype=exp_dtype, name="x")
exp = tm.box_expected(exp, box)
tm.assert_equal(result, exp)

# _evaluate_numeric_binop
result = idx + 2.0
assert result.dtype == "f8" and isinstance(result, box)
exp = Index([3.0, 4.0, 5.0, 6.0, 7.0], dtype=np.float64, name="x")
if result.dtype.kind == "c":
assert result.dtype == dtype and isinstance(result, box)
exp_dtype = dtype
else:
assert result.dtype == "f8" and isinstance(result, box)
exp_dtype = np.float64
exp = Index([3.0, 4.0, 5.0, 6.0, 7.0], dtype=exp_dtype, name="x")
exp = tm.box_expected(exp, box)
tm.assert_equal(result, exp)

result = idx - 2.0
assert result.dtype == "f8" and isinstance(result, box)
exp = Index([-1.0, 0.0, 1.0, 2.0, 3.0], dtype=np.float64, name="x")
if result.dtype.kind == "c":
assert result.dtype == dtype and isinstance(result, box)
exp_dtype = dtype
else:
assert result.dtype == "f8" and isinstance(result, box)
exp_dtype = np.float64
exp = Index([-1.0, 0.0, 1.0, 2.0, 3.0], dtype=exp_dtype, name="x")
exp = tm.box_expected(exp, box)
tm.assert_equal(result, exp)

result = idx * 1.0
assert result.dtype == "f8" and isinstance(result, box)
exp = Index([1.0, 2.0, 3.0, 4.0, 5.0], dtype=np.float64, name="x")
if result.dtype.kind == "c":
assert result.dtype == dtype and isinstance(result, box)
exp_dtype = dtype
else:
assert result.dtype == "f8" and isinstance(result, box)
exp_dtype = np.float64
exp = Index([1.0, 2.0, 3.0, 4.0, 5.0], dtype=exp_dtype, name="x")
exp = tm.box_expected(exp, box)
tm.assert_equal(result, exp)

result = idx / 2.0
assert result.dtype == "f8" and isinstance(result, box)
exp = Index([0.5, 1.0, 1.5, 2.0, 2.5], dtype=np.float64, name="x")
if result.dtype.kind == "c":
assert result.dtype == dtype and isinstance(result, box)
exp_dtype = dtype
else:
assert result.dtype == "f8" and isinstance(result, box)
exp_dtype = np.float64
exp = Index([0.5, 1.0, 1.5, 2.0, 2.5], dtype=exp_dtype, name="x")
exp = tm.box_expected(exp, box)
tm.assert_equal(result, exp)
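
The review thread notes that pandas seems to lack a helper for casting to the nearest inexact dtype, which is why the test branches on `result.dtype.kind == "c"`. As a sketch of the rule the test encodes for its four parametrized dtypes, NumPy's `np.result_type` against float64 maps the integer dtypes and float64 to float64 and keeps complex128 as complex128. `nearest_inexact_dtype` is a hypothetical name, not a pandas or NumPy function, and because it widens narrower types such as float32 to double precision it is not a general "nearest inexact" cast:

```python
import numpy as np


def nearest_inexact_dtype(dtype) -> np.dtype:
    """Hypothetical helper: the dtype test_ufunc_coercions expects after sqrt/divide.

    Integer and float64 inputs map to float64; complex128 stays complex128.
    """
    return np.result_type(np.dtype(dtype), np.float64)


for dt in (np.int64, np.uint64, np.float64, np.complex128):
    arr = np.array([1, 2, 3, 4, 5], dtype=dt)
    # np.sqrt and true division both land on the same inexact dtype
    assert np.sqrt(arr).dtype == nearest_inexact_dtype(dt)
    assert (arr / 2.0).dtype == nearest_inexact_dtype(dt)
    print(np.dtype(dt).name, "->", nearest_inexact_dtype(dt).name)
```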

@@ -1367,7 +1397,7 @@ def test_numeric_compat2_floordiv(self, idx, div, expected):
# __floordiv__
tm.assert_index_equal(idx // div, expected, exact=True)

@pytest.mark.parametrize("dtype", [np.int64, np.float64])
@pytest.mark.parametrize("dtype", [np.int64, np.float64, np.complex128])
@pytest.mark.parametrize("delta", [1, 0, -1])
def test_addsub_arithmetic(self, dtype, delta):
# GH#8142
4 changes: 4 additions & 0 deletions pandas/tests/extension/base/dim2.py
@@ -8,6 +8,7 @@

from pandas.core.dtypes.common import (
is_bool_dtype,
is_complex_dtype,
is_integer_dtype,
)

@@ -272,6 +273,9 @@ def get_reduction_result_dtype(dtype):
data = data.astype("Float64")
if method == "mean":
tm.assert_extension_array_equal(result, data)
elif is_complex_dtype(data) and method in ["std", "var"]:
# std and var produce real-only results
tm.assert_extension_array_equal(result, data - data, check_dtype=False)
else:
tm.assert_extension_array_equal(result, data - data)

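
The `check_dtype=False` in the complex branch is needed because NumPy's `var` and `std` return a real dtype for complex input, while `data - data` stays complex. A quick NumPy illustration of that dtype mismatch, using arbitrary constant data so every reduction is zero:

```python
import numpy as np

data = np.array([1 + 2j, 1 + 2j, 1 + 2j])

zeros = data - data        # complex128 zeros
variance = np.var(data)    # magnitude-based, so the result is real
spread = np.std(data)

print(zeros.dtype, np.asarray(variance).dtype, np.asarray(spread).dtype)
# The values agree even though the dtypes differ
assert np.allclose(variance, zeros[0]) and np.allclose(spread, zeros[0])
```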
8 changes: 7 additions & 1 deletion pandas/tests/extension/base/io.py
@@ -9,7 +9,13 @@

class BaseParsingTests:
@pytest.mark.parametrize("engine", ["c", "python"])
def test_EA_types(self, engine, data):
def test_EA_types(self, engine, data, request):
if engine == "c" and data.dtype.kind == "c":
request.node.add_marker(
pytest.mark.xfail(
reason=f"engine '{engine}' cannot parse the dtype {data.dtype.name}"
)
)
df = pd.DataFrame({"with_dtype": pd.Series(data, dtype=str(data.dtype))})
csv_output = df.to_csv(index=False, na_rep=np.nan)
result = pd.read_csv(
42 changes: 21 additions & 21 deletions pandas/tests/extension/base/ops.py
@@ -19,7 +19,7 @@ class BaseOpsUtil:
divmod_exc: type[Exception] | None = TypeError

def _get_expected_exception(
self, op_name: str, obj, other
self, op_name: str, obj, other, request
) -> type[Exception] | None:
# Find the Exception, if any we expect to raise calling
# obj.__op_name__(other)
@@ -54,8 +54,8 @@ def get_op_from_name(self, op_name: str):
# case that still requires overriding _check_op or _combine, please let
# us know at github.com/pandas-dev/pandas/issues
@final
def check_opname(self, ser: pd.Series, op_name: str, other):
exc = self._get_expected_exception(op_name, ser, other)
def check_opname(self, ser: pd.Series, op_name: str, other, request):
exc = self._get_expected_exception(op_name, ser, other, request)
op = self.get_op_from_name(op_name)

self._check_op(ser, op, other, op_name, exc)
@@ -91,12 +91,12 @@ def _check_op(

# see comment on check_opname
@final
def _check_divmod_op(self, ser: pd.Series, op, other):
def _check_divmod_op(self, ser: pd.Series, op, other, request):
# check that divmod behavior matches behavior of floordiv+mod
if op is divmod:
exc = self._get_expected_exception("__divmod__", ser, other)
exc = self._get_expected_exception("__divmod__", ser, other, request)
else:
exc = self._get_expected_exception("__rdivmod__", ser, other)
exc = self._get_expected_exception("__rdivmod__", ser, other, request)
if exc is None:
result_div, result_mod = op(ser, other)
if op is divmod:
@@ -128,53 +128,53 @@ class BaseArithmeticOpsTests(BaseOpsUtil):
series_array_exc: type[Exception] | None = TypeError
divmod_exc: type[Exception] | None = TypeError

def test_arith_series_with_scalar(self, data, all_arithmetic_operators):
def test_arith_series_with_scalar(self, data, all_arithmetic_operators, request):
# series & scalar
if all_arithmetic_operators == "__rmod__" and is_string_dtype(data.dtype):
pytest.skip("Skip testing Python string formatting")

op_name = all_arithmetic_operators
ser = pd.Series(data)
self.check_opname(ser, op_name, ser.iloc[0])
self.check_opname(ser, op_name, ser.iloc[0], request)

def test_arith_frame_with_scalar(self, data, all_arithmetic_operators):
def test_arith_frame_with_scalar(self, data, all_arithmetic_operators, request):
# frame & scalar
if all_arithmetic_operators == "__rmod__" and is_string_dtype(data.dtype):
pytest.skip("Skip testing Python string formatting")

op_name = all_arithmetic_operators
df = pd.DataFrame({"A": data})
self.check_opname(df, op_name, data[0])
self.check_opname(df, op_name, data[0], request)

def test_arith_series_with_array(self, data, all_arithmetic_operators):
def test_arith_series_with_array(self, data, all_arithmetic_operators, request):
# ndarray & other series
op_name = all_arithmetic_operators
ser = pd.Series(data)
self.check_opname(ser, op_name, pd.Series([ser.iloc[0]] * len(ser)))
self.check_opname(ser, op_name, pd.Series([ser.iloc[0]] * len(ser)), request)

def test_divmod(self, data):
def test_divmod(self, data, request):
ser = pd.Series(data)
self._check_divmod_op(ser, divmod, 1)
self._check_divmod_op(1, ops.rdivmod, ser)
self._check_divmod_op(ser, divmod, 1, request)
self._check_divmod_op(1, ops.rdivmod, ser, request)

def test_divmod_series_array(self, data, data_for_twos):
def test_divmod_series_array(self, data, data_for_twos, request):
ser = pd.Series(data)
self._check_divmod_op(ser, divmod, data)
self._check_divmod_op(ser, divmod, data, request)

other = data_for_twos
self._check_divmod_op(other, ops.rdivmod, ser)
self._check_divmod_op(other, ops.rdivmod, ser, request)

other = pd.Series(other)
self._check_divmod_op(other, ops.rdivmod, ser)
self._check_divmod_op(other, ops.rdivmod, ser, request)

def test_add_series_with_extension_array(self, data):
def test_add_series_with_extension_array(self, data, request):
# Check adding an ExtensionArray to a Series of the same dtype matches
# the behavior of adding the arrays directly and then wrapping in a
# Series.

ser = pd.Series(data)

exc = self._get_expected_exception("__add__", ser, data)
exc = self._get_expected_exception("__add__", ser, data, request)
if exc is not None:
with pytest.raises(exc):
ser + data
5 changes: 3 additions & 2 deletions pandas/tests/extension/base/setitem.py
@@ -337,7 +337,8 @@ def test_setitem_slice_array(self, data):

def test_setitem_scalar_key_sequence_raise(self, data):
arr = data[:5].copy()
with pytest.raises(ValueError):
# complex128 data raises TypeError; other numeric types raise ValueError
Review comment (Member):
Do you know where the ValueError is being thrown? I think the type of error should stay consistent.

Reply (Contributor Author):
When running pandas/tests/extension/test_numpy.py with data as a float64 array, we see:

ValueError: setting an array element with a sequence.

When data is complex128, we see:

TypeError: must be real number, not NumpyExtensionArray

The ValueError seems to come from NumPy and the TypeError from Python; both are raised from __setitem__ in class NDArrayBackedExtensionArray in pandas/core/arrays/_mixins.py.

Reply (Member):
Could we catch the Python TypeError and re-raise it as a ValueError?

Reply (Contributor Author):
I've amended the PR to attempt this. Please let me know what you think.

with pytest.raises((ValueError, TypeError)):
arr[0] = arr[[0, 1]]

def test_setitem_preserves_views(self, data):
@@ -438,7 +439,7 @@ def test_setitem_invalid(self, data, invalid_scalar):
data[:] = invalid_scalar

def test_setitem_2d_values(self, data):
# GH50085
# GH54445
original = data.copy()
df = pd.DataFrame({"a": data, "b": data})
df.loc[[0, 1], :] = df.loc[[1, 0], :].values
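
The reviewer's suggestion in the setitem thread, catching the TypeError and re-raising it as a ValueError so callers see one exception type, can be sketched on a plain NumPy array. This is not the code that landed in `pandas/core/arrays/_mixins.py`: the wrapper `setitem_consistent` is hypothetical, and `object()` is only a stand-in for a value NumPy cannot convert to the element type:

```python
import numpy as np


def setitem_consistent(arr: np.ndarray, key, value) -> None:
    """Hypothetical wrapper: assign arr[key] = value, raising ValueError on bad values.

    NumPy may raise ValueError (e.g. a sequence into a scalar slot) or
    TypeError (e.g. an object it cannot convert to the element type);
    both are normalized to ValueError here.
    """
    try:
        arr[key] = value
    except TypeError as err:
        # keep the original error attached as __cause__ for debugging
        raise ValueError(str(err)) from err


outcomes = {}
for dtype in (np.float64, np.complex128):
    for bad in (np.array([1.0, 2.0]), object()):
        case = (np.dtype(dtype).name, type(bad).__name__)
        try:
            setitem_consistent(np.zeros(5, dtype=dtype), 0, bad)
            outcomes[case] = None
        except ValueError as exc:
            outcomes[case] = type(exc).__name__
print(outcomes)
```

With the wrapper in place, a test like `test_setitem_scalar_key_sequence_raise` can expect a single exception type regardless of dtype.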