API: should to_numpy() by default return the corresponding type, and raise otherwise? #48891

@MarcoGorelli

Currently, to_numpy() from a nullable type always returns dtype object, regardless of whether the Series contains missing values.
E.g.

In [3]: pd.Series([1,2,3], dtype='Int64').to_numpy()
Out[3]: array([1, 2, 3], dtype=object)

This is inconvenient for users and downstream libraries, which then receive object arrays where they expect numeric ones. See e.g. matplotlib/matplotlib#23991 (comment)

Users can pass a dtype to to_numpy(), but if they want to mirror what pandas currently does for the NumPy-backed dtypes, namely:

In [4]: pd.Series([1,2,3], dtype='int64').to_numpy()
Out[4]: array([1, 2, 3])

In [5]: pd.Series([1,2,3], dtype='float64').to_numpy()
Out[5]: array([1., 2., 3.])

then they would need to add extra logic to their code.
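For illustration, the "extra logic" could look roughly like the sketch below. The helper name `to_numpy_matching_dtype` is hypothetical, and the sketch assumes the nullable dtype exposes a `numpy_dtype` attribute (as the masked dtypes such as `Int64` and `Float64` do); anything without that attribute is passed straight through to `to_numpy()`.

```python
import numpy as np
import pandas as pd


def to_numpy_matching_dtype(ser: pd.Series) -> np.ndarray:
    """Hypothetical helper: convert to the NumPy dtype that matches ser.dtype."""
    # Masked nullable dtypes (Int64, Float64, boolean, ...) expose numpy_dtype;
    # plain NumPy dtypes do not, so they fall through to the default path.
    numpy_dtype = getattr(ser.dtype, "numpy_dtype", None)
    if numpy_dtype is None:
        return ser.to_numpy()

    if not ser.isna().any():
        # No missing values: the matching NumPy dtype can hold everything.
        return ser.to_numpy(dtype=numpy_dtype)

    if numpy_dtype.kind == "f":
        # Floats can represent missing values as nan.
        return ser.to_numpy(dtype=numpy_dtype, na_value=np.nan)

    raise ValueError(
        f"cannot convert to {numpy_dtype} NumPy array with missing values"
    )


print(to_numpy_matching_dtype(pd.Series([1, 2, 3], dtype="Int64")).dtype)
print(to_numpy_matching_dtype(pd.Series([1, 2, 3], dtype="Float64")).dtype)
```

Every downstream library that wants numeric arrays from nullable Series would need some variant of this branching, which is the burden the proposal aims to remove.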

I'd suggest that instead, to_numpy() just always convert to the corresponding numpy type. Then:

  • if a Series has no NA, then to_numpy() will just convert to the corresponding numpy type:

In [4]: pd.Series([1,2,3], dtype='Int64').to_numpy()
Out[4]: array([1, 2, 3])

In [5]: pd.Series([1,2,3], dtype='Float64').to_numpy()
Out[5]: array([1., 2., 3.])

  • if a Series does have NA, then unless it's being converted to float (which can handle nan), it'll raise:

In [8]: pd.Series([1,2,pd.NA], dtype='Int64').to_numpy(int)
---------------------------------------------------------------------------
ValueError: cannot convert to '<class 'int'>'-dtype NumPy array with missing values.
Please either:
- convert to 'float'
- convert to 'object'
- specify an appropriate 'na_value' for this dtype
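The three ways out listed in that message can each be spelled with the existing `dtype` and `na_value` parameters of `to_numpy()`. A minimal sketch follows; the sentinel `-1` is an arbitrary choice made for illustration only.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, pd.NA], dtype="Int64")

# Option 1: convert to float, representing the missing value as nan.
as_float = ser.to_numpy(dtype="float64", na_value=np.nan)

# Option 2: convert to object, keeping pd.NA as-is.
as_object = ser.to_numpy(dtype=object)

# Option 3: keep an integer dtype by choosing a sentinel for missing values
# (-1 is an arbitrary placeholder here).
as_int = ser.to_numpy(dtype="int64", na_value=-1)

print(as_float.dtype, as_object.dtype, as_int.dtype)
```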

This would achieve:
