Closed
Labels: API Design; NA - MaskedArrays (Related to pd.NA and nullable extension arrays)
Description
Currently, to_numpy() from a nullable type always returns dtype object, regardless of whether the Series contains missing values.
E.g.
In [3]: pd.Series([1,2,3], dtype='Int64').to_numpy()
Out[3]: array([1, 2, 3], dtype=object)
This isn't great for users or downstream libraries; see e.g. matplotlib/matplotlib#23991 (comment).
Users can specify a dtype in to_numpy(), but if they want to mirror the current behaviour of pandas for numpy-backed dtypes, whereby one has:
In [4]: pd.Series([1,2,3], dtype='int64').to_numpy()
Out[4]: array([1, 2, 3])
In [5]: pd.Series([1,2,3], dtype='float64').to_numpy()
Out[5]: array([1., 2., 3.])
then they would need to add extra logic to their code.
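As a rough illustration of the kind of extra logic meant here, a minimal sketch follows. The helper name `to_numpy_native` is hypothetical, and it assumes the nullable dtype exposes a `numpy_dtype` attribute (as the masked dtypes such as `Int64` and `Float64` do); the fallback to `object` when missing values are present is just one possible choice.

```python
import numpy as np
import pandas as pd


def to_numpy_native(series):
    """Hypothetical helper: convert to the matching numpy dtype when possible."""
    # Assumption: masked nullable dtypes (Int64, Float64, ...) expose `numpy_dtype`.
    target = getattr(series.dtype, "numpy_dtype", None)
    if target is None:
        # Not a nullable dtype we know how to map; defer to pandas.
        return series.to_numpy()
    if series.isna().any():
        # Missing values present: keep them as pd.NA in an object array.
        return series.to_numpy(dtype=object)
    # No missing values: the plain numpy dtype can hold every element.
    return series.to_numpy(dtype=target)


ints = to_numpy_native(pd.Series([1, 2, 3], dtype="Int64"))
floats = to_numpy_native(pd.Series([1, 2, 3], dtype="Float64"))
with_na = to_numpy_native(pd.Series([1, 2, pd.NA], dtype="Int64"))
print(ints.dtype, floats.dtype, with_na.dtype)
```

Note that the result dtype of this workaround depends on the values, which is the value-dependent behaviour the proposal is trying to avoid.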
I'd suggest that instead, to_numpy() just always convert to the corresponding numpy type. Then:
- if a Series has no NA, then to_numpy() will just convert to the corresponding numpy type:
In [4]: pd.Series([1,2,3], dtype='Int64').to_numpy()
Out[4]: array([1, 2, 3])
In [5]: pd.Series([1,2,3], dtype='Float64').to_numpy()
Out[5]: array([1., 2., 3.])
- if a Series does have NA, then unless it's being converted to float (which can handle nan), it'll raise:
In [8]: pd.Series([1,2,pd.NA], dtype='Int64').to_numpy(int)
---------------------------------------------------------------------------
ValueError: cannot convert to '<class 'int'>'-dtype NumPy array with missing values.
Please either:
- convert to 'float'
- convert to 'object'
- specify an appropriate 'na_value' for this dtype
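The two rules above could be emulated in user code roughly as follows. This is only a sketch of the proposed semantics, not pandas' implementation: `proposed_to_numpy` is a hypothetical name, the error message is made up for illustration, and the code again assumes the nullable dtype exposes `numpy_dtype`.

```python
import numpy as np
import pandas as pd


def proposed_to_numpy(series):
    """Hypothetical emulation of the proposed default for nullable dtypes."""
    # Assumption: masked nullable dtypes expose their numpy counterpart here.
    target = series.dtype.numpy_dtype
    if not series.isna().any():
        # No NA: always convert to the corresponding numpy dtype.
        return series.to_numpy(dtype=target)
    if target.kind == "f":
        # Float targets can represent missing values as nan.
        return series.to_numpy(dtype=target, na_value=np.nan)
    # Any other target cannot hold NA, so raise instead of changing dtype.
    raise ValueError(
        f"cannot convert to '{target}'-dtype NumPy array with missing values"
    )


print(proposed_to_numpy(pd.Series([1, 2, 3], dtype="Int64")).dtype)
print(proposed_to_numpy(pd.Series([1.5, pd.NA], dtype="Float64")))
```

With this rule the output dtype is determined by the Series dtype alone; the values only decide whether the conversion succeeds.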
This would achieve:
- better experience for users
- continue to avoid value-dependent behavior, where the metadata (shape, dtype, etc.) depends on the values of the array