Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clib.converison._to_numpy: Add tests for pandas.Series with pandas numeric dtypes #3584

Merged
merged 46 commits into from
Dec 12, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Nov 5, 2024

This PR adds tests for pandas.Series with pandas/pyarrow numeric dtypes (in 7222db2).

Tests pass with pandas 2.2 but fail with pandas 2.1. Check https://github.com/GenericMappingTools/pygmt/actions/runs/11714111582/job/32628172796?pr=3584 for details.

The failures are due to upstream pandas v2.2 changes. Previously, all pandas nullable dtypes are converted to np.object_ dtype. Since pandas v2.2, the new rules are (source: https://pandas.pydata.org/docs/whatsnew/v2.2.0.html#to-numpy-for-numpy-nullable-and-arrow-types-converts-to-suitable-numpy-dtype):

  • float dtypes are cast to NumPy floats
  • integer dtypes without missing values are cast to NumPy integer dtypes
  • integer dtypes with missing values are cast to NumPy float dtypes and NaN is used as missing value indicator
  • boolean dtypes without missing values are cast to NumPy bool dtype
  • boolean dtypes with missing values keep object dtype
  • datetime and timedelta types are cast to Numpy datetime64 and timedelta64 types respectively and NaT is used as missing value indicator

Following the first three points, we add the workaround for pandas<2.2 to explicitly specify how to map pandas dtypes into numpy dtypes (a4f15cd). It is an improved version of PR #3505.

Here is the minimal example to understand the behavior change:

With pandas 2.1

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: x = pd.Series([1, 2, 3], dtype=pd.Int32Dtype())

In [4]: x.dtype
Out[4]: Int32Dtype()

In [5]: np.ascontiguousarray(x)
Out[5]: array([1, 2, 3], dtype=object)

In [6]: x = pd.Series([1, pd.NA, 3], dtype=pd.Int32Dtype())

In [7]: np.ascontiguousarray(x)
Out[7]: array([1, <NA>, 3], dtype=object)

With pandas 2.2

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: x = pd.Series([1, 2, 3], dtype=pd.Int32Dtype())

In [4]: np.ascontiguousarray(x)
Out[4]: array([1, 2, 3], dtype=int32)

In [5]: x = pd.Series([1, pd.NA, 3], dtype=pd.Int32Dtype())

In [6]: np.ascontiguousarray(x)
Out[6]: array([ 1., nan,  3.])

Related to #3581, #2848, #3513, #3583.

Wait for #3583.

@seisman seisman force-pushed the to_numpy/pandas_numeric branch from a7054b9 to c3471cb Compare November 5, 2024 11:03
@weiji14 weiji14 changed the title WIP: clib.converison._to_numpy: Add tests for panda.Series with pandas numeric dtypes WIP: clib.converison._to_numpy: Add tests for pandas.Series with pandas numeric dtypes Nov 6, 2024
@seisman seisman force-pushed the to_numpy/numpy_numeric branch from 8489ebe to 6f966db Compare November 6, 2024 06:16
@seisman seisman force-pushed the to_numpy/pandas_numeric branch from 176f511 to eceff7f Compare November 6, 2024 14:18
seisman and others added 4 commits November 7, 2024 07:08
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
@seisman seisman force-pushed the to_numpy/pandas_numeric branch from eceff7f to 6b77f42 Compare November 7, 2024 00:13
@seisman seisman force-pushed the to_numpy/pandas_numeric branch from 6b77f42 to 7222db2 Compare November 7, 2024 00:17
@seisman
Copy link
Member Author

seisman commented Nov 7, 2024

As explained in the top post, currently this PR does two things:

  • Add tests for pandas dtypes
  • Improve the pandas->numpy dtype conversion for pd.Series with NA values (pandas<2.2)

After finishing this PR, we have two options:

  1. Change the PR title to something like "Improve the conversion of pandas dtypes with missing values for pandas<=2.1" and mark it as a "bug".
  2. Cherry-pick the changes in a4f15cd, and open a separate PR so that we'll have two commits in the main branch, one for the new workaround, and one for the newly added tests.

I'm inclined to option 2.

@weiji14
Copy link
Member

weiji14 commented Nov 7, 2024

2. Cherry-pick the changes in a4f15cd, and open a separate PR so that we'll have two commits in the main branch, one for the new workaround, and one for the newly added tests.

I'm inclined to option 2.

Yeah, agree to option 2. Keep the unit tests here, and open a separate PR with the code changes from a4f15cd

Base automatically changed from to_numpy/numpy_numeric to main November 7, 2024 10:38
@seisman seisman changed the title WIP: clib.converison._to_numpy: Add tests for pandas.Series with pandas numeric dtypes clib.converison._to_numpy: Add tests for pandas.Series with pandas numeric dtypes Nov 7, 2024
@seisman seisman marked this pull request as ready for review November 7, 2024 10:41
pygmt/clib/conversion.py Outdated Show resolved Hide resolved
pygmt/clib/conversion.py Outdated Show resolved Hide resolved
@seisman seisman marked this pull request as ready for review November 15, 2024 05:27
@seisman seisman added the needs review This PR has higher priority and needs review. label Nov 15, 2024
@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Dec 3, 2024
@seisman
Copy link
Member Author

seisman commented Dec 4, 2024

@weiji14 I'm wondering if you have time to give this PR a final review.

@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Dec 12, 2024
@seisman seisman merged commit 351247d into main Dec 12, 2024
15 checks passed
@seisman seisman deleted the to_numpy/pandas_numeric branch December 12, 2024 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs skip-changelog Skip adding Pull Request to changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants