BUG?: using `None` as replacement value in `replace()` typically upcasts to object dtype #60284
Labels: API - Consistency (Internal Consistency of API/Behavior); Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); replace (replace method)
I noticed that in certain cases, when replacing a value with `None`, we always cast to object dtype, regardless of whether the dtype of the calling Series can actually hold `None` (at least when considering `None` just as a generic "missing value" indicator).

For example, a float Series can hold `None` in the sense of holding missing values, which is how `None` is treated in setitem.

However, when using `replace()` to change the value 2.0 to `None`, the result depends on the exact way the to_replace/value combination is specified, but typically it will upcast to object.

In all the above cases, when using `np.nan` instead of `None` as the replacement value, it of course just results in a float Series with NaN.

The reason for this is two-fold. First, in `Block._replace_coerce` there is a check specifically for `value is None`, and in that case we always cast to object dtype (pandas/pandas/core/internals/blocks.py, lines 906 to 910 in 5f23ace).
The above is used when replacing with a list of values. But for the scalar case, we also cast to object dtype, because in this case we check `if self._can_hold_element(value)` to do the replacement with a simple setitem (and if not, cast to object dtype first before trying again). But it seems that `can_hold_element(np.array([], dtype=float), None)` gives False.

Everything is tested with current main (3.0.0.dev), but I see the same behaviour on older releases (2.0 and 1.5).
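A minimal sketch to reproduce the behaviour described in this report. These are not the original examples: the Series values and the three ways of spelling the replacement are illustrative assumptions. `can_hold_element` is imported here from the private module `pandas.core.dtypes.cast`, which may move or change between versions. The dtypes for the `None` replacements are printed rather than asserted, since they are exactly what is in question and may vary by pandas version.

```python
import numpy as np
import pandas as pd

# Private helper; its import location is an assumption and may differ by version.
from pandas.core.dtypes.cast import can_hold_element


def make_series():
    # Illustrative float Series (values are an assumption, not from the report).
    return pd.Series([1.0, 2.0, 3.0])


# setitem: None is treated as a missing value, so the dtype stays float64.
ser = make_series()
ser.iloc[1] = None
setitem_dtype = ser.dtype

# replace() with np.nan as the replacement value also stays float64.
nan_dtype = make_series().replace(2.0, np.nan).dtype

# replace() with None, spelled three different ways.
# The report says these typically upcast to object.
replace_dtypes = {
    "scalar": make_series().replace(2.0, None).dtype,
    "list": make_series().replace([2.0], [None]).dtype,
    "dict": make_series().replace({2.0: None}).dtype,
}

print("setitem with None:   ", setitem_dtype)
print("replace with np.nan: ", nan_dtype)
for spelling, dtype in replace_dtypes.items():
    print(f"replace with None ({spelling}):", dtype)

# The check that the scalar path relies on, as quoted in the report.
print("can_hold_element:", can_hold_element(np.array([], dtype=float), None))
```

If the three `replace` results print `object` while the setitem result prints `float64`, that is the inconsistency this issue is about.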
Somewhat related issue: