Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable many complex number tests #54761

Open
wants to merge 43 commits into
base: main
Choose a base branch
from

Conversation

MichaelTiemannOSC
Copy link
Contributor

@MichaelTiemannOSC MichaelTiemannOSC commented Aug 25, 2023

These changes put complex128 on an even footing with float64 and int64 as far as numerical testing is concerned. These changes have been tested both against the Pandas test suite as well as the Pint-Pandas testsuite (using complex magnitudes).

These changes are a simpler version of a previous pull-request that was destroyed by GitHub's fork synchronize behavior.

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

These changes put complex128 on an even footing with float64 and int64 as far as numerical testing is concerned.  These changes have been tested both against the Pandas test suite as well as the Pint-Pandas testsuite (using complex magnitudes).

These changes are a simpler version of a previous pull-request that was destroyed by GitHub's fork synchronize behavior.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Add description of this PR to what's new file.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC MichaelTiemannOSC marked this pull request as ready for review August 25, 2023 18:48
@MichaelTiemannOSC MichaelTiemannOSC marked this pull request as draft August 26, 2023 08:07
@MichaelTiemannOSC
Copy link
Contributor Author

MichaelTiemannOSC commented Aug 26, 2023

False alarm: these simplified changes DO work with uncertainties in Pint...when used with the version of Pint that also supports uncertainties. 😊

@MichaelTiemannOSC MichaelTiemannOSC marked this pull request as ready for review August 26, 2023 08:32
@MichaelTiemannOSC
Copy link
Contributor Author

Tests with uncertainties all pass (with similar request-related changes). I cannot commit those changes to Pint-Pandas until these changes are accepted. It is very nice that the underlying changes made to Pandas 2.1 really simplifies the Pint-Pandas uses of EAs.

@andrewgsavage @hgrecco

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
xfail complex tests, but otherwise defer to parent object to implement test case.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Contributor Author

@mroeschke can you have a look at this and let me know if any additional changes are needed?

@@ -100,6 +100,11 @@ def _astype_nansafe(
elif np.issubdtype(arr.dtype, np.floating) and dtype.kind in "iu":
return _astype_float_to_int_nansafe(arr, dtype, copy)

elif np.issubdtype(arr.dtype, np.complexfloating) and is_object_dtype(dtype):
if np.isnan(arr).any():
res = np.asarray([np.nan if np.isnan(x) else x for x in arr], dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just do res[np.isnan(arr)] = np.nan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

res has to be set to something first (it's the "result"). But I can look at using copy to decide if we really need something fresh before we do something that may have a side-effect.

@mroeschke mroeschke added the Complex Complex Numbers label Sep 8, 2023
Attempt to resolve all comments from @mroeschke

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Replace `request` parameter with `*args, **kwargs` in many places.  This allows us to avoid needlessly passing request parameters just to satisfy method signatures.

Also remove whatsnew entry as this enhancement to test cases is not really user-visible.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Contributor Author

I strongly suspect the mypy failure was due to unrelated issues in CI/CD. In the intervening weeks, things have moved forward. Is it preferred practice to update via merge or rebase to catch up to latest after sitting idle for 3 weeks?

@mroeschke
Copy link
Member

Merging in main is sufficient. History is squashed during merging into main

The _duplicate functions expect complex128 data in ndarrays, not EAs.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
The command `pre-comment run -a` doesn't do all the things that Pandas CI/CD does.  This commit fixes two problems found in the mypy section:

* Adding some comments to ignore some errors in _ensure_data for complex dtypes
* Fix the type signature of _get_expected_exception to match LSP stylings

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Clean up random merge conflict.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Instead of overriding _get_expected_exception, set expected exception types in various test cases.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Fix indentation errors.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Don't need `*args, **kwargs` changes anymore due to refactoring upstream.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Re-harmonize with upstream.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
It turns out that `data` in `test_sparse.py` never contains complex numbers, and furthermore, that it's quite a lot of extra work to make the sparse tests complex-friendly.  So we leave complex number testing out of test_sparse.  Contributions welcome, as usual.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Arrow doesn't support complex numbers, so no need to special case tests as if it does.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Restore function accidentally deleted.  Intention was to only delete unneeded complex code handling, not the whole function.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Delete `request` parameter no longer needed for `test_fillna_no_op_returns_copy`.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC MichaelTiemannOSC marked this pull request as ready for review January 6, 2024 03:44
@MichaelTiemannOSC
Copy link
Contributor Author

Thanks to some upstream changes, the changes required to support many complex test cases is greatly simplified. Please let me know your feedback on these changes.

In certain cases where Python throws a TypeError for complex values where it would throw ValueError for real values, transform exception to ValueError.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Contributor Author

Can somebody remove the stale tag from this PR?

@MichaelTiemannOSC
Copy link
Contributor Author

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

I have updated this PR.

Make ruff happy by raising from `exc`.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Contributor Author

Note that the only failure now is Numpy Dev (and most likely "not my fault").

# Don't let Python's handling of `complex` make extra complexity for Pandas
if self._ndarray.dtype.kind == "c":
raise ValueError(
*(x.replace("real", "complex") for x in exc.args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we tinkering with the exception message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is actually a NumPy bug. When value is an object, NumPy appears to try constructing a complex number by passing the object in as the real component, but that, too fails with the message TypeError: must be a real number, not object. But that's a wrong message, because what we really want is to report the failure to construct a complex number. Here's a test case of sorts:

import numpy as np
xx = object()
np.asarray(xx).astype('float64')
# *** TypeError: float() argument must be a string or a real number, not 'object'
np.asarray(xx).astype('complex128')
# *** TypeError: must be real number, not object

That's what I'm trying to fix. So I could file an issue against NumPy, remove the above special case, and Pandas will fix itself when NumPy fixes itself. Is that the best approach?

# error: Item "ExtensionDtype" of "Union[Any, ExtensionDtype]"
# has no attribute "itemsize"
if values.dtype.itemsize == 16: # type: ignore[union-attr]
# We have support for complex128
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not complex64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to add complex64 to the test suite, but thought it better to do one case (complex128) at a time. I didn't want to add something (values.dtype.itemsize == 8) that wasn't tested by the test suite.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a deal-breaker, but id prefer to do them both in this pass. or at least leave a comment explaining why one is excluded

Updated code as per code review suggestions.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
`test_fillna_no_op_returns_copy` now works for all cases in Pandas 3.0.0 without special casing.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
@MichaelTiemannOSC
Copy link
Contributor Author

The new 3.0.0 build rules make these changes happy. Previous build rules across a variety of merge points in the 2.2.x created problems with inconsistent parameters unrelated to these changes. Perhaps this is now ready to merge?

# Note: when `self._ndarray.dtype.kind == "c"`, Numpy incorrectly complains
# that `must be real number, not ...` when in reality
# a complex argument is more likely what's expected
raise ValueError(exc.args) from exc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment here is helpful, thanks. it suggests to me that we may want to only catch-and-re-raise in a subset of cases? otherwise we'll be re-raising as ValueError more often than we really want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously @mroeschke questioned when ever this would be a TypeError in the first place...the above assignment was expected only ever raise a ValueError. So, instead of trying to also handle TypeError in higher-level code, we translate this "impossible" case into the canonical form that Pandas expects. The comment helps the user understand a completely unhelpful error message that comes from Python. Previously I attempted to edit the error message to something more reasonable, but that was challenged. At the end of the day the question is: how much steering do we want to do for this case vs. just letting the exception raise in the expected way and let users decipher what was wrong with their code in the first place.

Allow all defined numpy complex dtypes to work as well as complex128.  Note that by default the test suite only tests the complex128 case, but users can add or alter that default by modifying the test suite source code to test other cases.

Signed-off-by: Michael Tiemann <72577720+MichaelTiemannOSC@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Complex Complex Numbers Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants