BUG: dtype not being preserved for replace on a CategoricalDtype #46672
Thanks for the report @galipremsagar! Within the BlockManager, there is special logic to handle None (pandas/pandas/core/internals/blocks.py, lines 782 to 792 in 32999a1), but I'm not sure it was ever meant to handle Categorical blocks. If the code were to take the …
Thanks @galipremsagar for the report. I noticed this while fixing #46634. The cause of this regression is #46404. From #46636 (comment):
We need a fix that preserves all the bug fixes in #44940 that were included in 1.4 and also restores the 1.3.5 behavior when None is the replacement value (#45601 and #45836), while at the same time making these None cases consistent with other replacement values with respect to copying and splitting blocks. This work is ongoing and is being discussed in #46636, #46443 and #46393.

The underlying issue is that the bug fixes introduced in 1.4 (#44940) were achieved by dispatching to Block.replace, which appears to handle …

Ideally we would fix this on main in Block.replace, but in my opinion this may be difficult to backport (due to other changes that we would not normally backport for a patch release). However, if we continue with small patches like #46634, we would also need to revert the special-case handling of Categorical to what it was before #44940, and that would still not solve all the issues around copying and splitting. There is also an extra issue on main, #45725, which needs to be fixed before we can dispatch to Block.replace.
…this was done elsewhere.
Looking some more, this code sample is actually a duplicate of #46634, since the replacement is a no-op? The regression for categorical comes when a replacement is actually performed.

In 1.3.5:

```python
>>> new_df = pdf.replace(to_replace=[".", "one"], value=["_", None])
>>> new_df
       a      b
0   None   None
1    two    NaN
2    NaN    two
3  three  three
>>> new_df.dtypes
a    object
b    object
dtype: object
>>> new_df["a"]
0     None
1      two
2      NaN
3    three
Name: a, dtype: object
>>> new_df.dtypes.a
dtype('O')
>>> pdf.dtypes.a
CategoricalDtype(categories=['one', 'three', 'two'], ordered=False)
```
In 1.4.0:

```python
>>> new_df = pdf.replace(to_replace=[".", "one"], value=["_", None])
>>> new_df
       a      b
0    NaN    NaN
1    two    NaN
2    NaN    two
3  three  three
>>> new_df.dtypes
a    category
b    category
dtype: object
>>> new_df["a"]
0      NaN
1      two
2      NaN
3    three
Name: a, dtype: category
Categories (2, object): ['three', 'two']
>>> new_df.dtypes.a
CategoricalDtype(categories=['three', 'two'], ordered=False)
>>> pdf.dtypes.a
CategoricalDtype(categories=['one', 'three', 'two'], ordered=False)
```
In 1.4.2:

```python
>>> new_df = pdf.replace(to_replace=[".", "one"], value=["_", None])
>>> new_df
       a      b
0   None   None
1    two    NaN
2    NaN    two
3  three  three
>>> new_df.dtypes
a    object
b    object
dtype: object
>>> new_df["a"]
0     None
1      two
2      NaN
3    three
Name: a, dtype: object
>>> new_df.dtypes.a
dtype('O')
>>> pdf.dtypes.a
CategoricalDtype(categories=['one', 'three', 'two'], ordered=False)
```
So in 1.4.2 we have reverted to the 1.3.5 behavior and convert to object when replacing a value. In 1.4.0 we retain the categorical dtype, but with a different CategoricalDtype (and convert the …
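For readers who need the same result on any of these versions, one option is to avoid calling replace on the categorical at all. This is a hypothetical workaround, not the fix being discussed for pandas. It assumes a frame `pdf` reconstructed from the printed outputs above (the original reproducible example is not shown on this page), casts to object, performs the replacement, and then re-casts to a CategoricalDtype whose categories are chosen explicitly.

```python
import pandas as pd

# Assumption: a frame consistent with the outputs shown above; the
# original reproducer is not included here.
pdf = pd.DataFrame(
    {"a": ["one", "two", None, "three"], "b": ["one", None, "two", "three"]},
    dtype="category",
)

# Do the replacement on object dtype so the result does not depend on how
# a particular pandas version handles replace() on a CategoricalDtype.
replaced = pdf.astype(object).replace(to_replace=[".", "one"], value=["_", None])

# Re-cast explicitly; None and NaN both become missing in the categorical.
target = pd.CategoricalDtype(categories=["three", "two"], ordered=False)
new_df = replaced.astype(target)

print(new_df.dtypes)
print(new_df["a"].tolist())
```

Choosing the categories explicitly makes the dropped "one" category a deliberate decision rather than a side effect of whichever replace path a given release takes.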
Haven't looked super-closely, but I think the …
Moving to 1.4.4.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
When the series is of category dtype, calling replace with the above inputs does not preserve the dtype.

Expected Behavior
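A minimal sketch of two checks that can help when triaging reports like this one: whether the replacement changes any values at all (the no-op question raised in the thread), and which columns changed dtype. The helper names `replace_is_noop` and `changed_dtypes` are hypothetical, not pandas API, and the frame `pdf` is an assumption reconstructed from the outputs in the thread.

```python
import pandas as pd


def replace_is_noop(df: pd.DataFrame, to_replace: list) -> bool:
    """Return True if none of the to_replace values occur in df.

    Hypothetical helper: it checks exact matches via DataFrame.isin,
    so regex-based replacements are not covered.
    """
    return not df.isin(to_replace).to_numpy().any()


def changed_dtypes(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Map each shared column whose dtype differs to (old_dtype, new_dtype)."""
    return {
        col: (before[col].dtype, after[col].dtype)
        for col in before.columns.intersection(after.columns)
        if before[col].dtype != after[col].dtype
    }


# Assumed example frame with categorical columns.
pdf = pd.DataFrame(
    {"a": ["one", "two", None, "three"], "b": ["one", None, "two", "three"]},
    dtype="category",
)

print(replace_is_noop(pdf, ["."]))          # "." never occurs in pdf
print(replace_is_noop(pdf, [".", "one"]))   # "one" does occur
print(changed_dtypes(pdf, pdf.astype(object)))
```

Running `changed_dtypes(pdf, pdf.replace(...))` on different pandas releases is one way to see the behavior differences described in the comments above.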
Installed Versions