Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Series.replace does not preserve dtype of original Series #37512

Closed
wants to merge 3 commits into from

Conversation

arw2019
Copy link
Member

@arw2019 arw2019 commented Oct 30, 2020

Reheating #33622 (stale)

@arw2019 arw2019 added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. replace replace method Strings String extension data type and string data labels Oct 30, 2020
@arw2019 arw2019 self-assigned this Oct 30, 2020
@@ -1174,6 +1177,9 @@ def coerce_to_target_dtype(self, other):
f"possible recursion in coerce_to_target_dtype: {self} {other}"
)

if is_categorical_dtype(dtype) or self.is_datetime:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the is_datetime condition is already handled above, if for some reason you actually need to do this then do it above (and I am not really sure why you are trying to do this)

@@ -1130,7 +1130,10 @@ def coerce_to_target_dtype(self, other):
if is_dtype_equal(self.dtype, dtype):
return self

if self.is_bool or is_object_dtype(dtype) or is_bool_dtype(dtype):
if is_extension_array_dtype(self.dtype) and not is_categorical_dtype(dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this if/then is already very complicated, what is the intent here?

@arw2019 arw2019 marked this pull request as draft November 7, 2020 04:13
@arw2019
Copy link
Member Author

arw2019 commented Nov 7, 2020

Coverting to draft while I debug

@jbrockmendel
Copy link
Member

@arw2019 the underlying problem (or one of them at least) is that ExtensionBlock._can_hold_element and ExtensionBlock.putmask are completely FUBAR.

I suggest you try adding the two following methods to CategoricalBlock (taken from a branch I have trying to get rid of CategoricalBlock.replace and _replace_list)

    def _can_hold_element(self, element: Any) -> bool:
        try:
            self.values._validate_setitem_value(element)
            return True
        except (TypeError, ValueError):
            return False

    def putmask(
        self, mask, new, inplace: bool = False, axis: int = 0, transpose: bool = False
    ) -> List["Block"]:
        if self._can_hold_element(new):
            return super().putmask(mask, new, inplace, axis, transpose)

        # TODO: should this be inplace?
        # TODO: use coerce_to_target_dtype?
        cat = self.values.add_categories(new)
        nb = self.make_block(cat)
        assert nb._can_hold_element(new)
        return nb.putmask(mask, new, inplace, axis, transpose)

and in coerce_to_target_dtype just after the is_dtype_equal(self.dtype, dtype) check:

        if self._can_hold_element(other):
            return self
        elif self.is_categorical:
            # Note: this will be wrong if we ever have a tuple category
            cat = self.values.add_categories(other)
            return self.make_block(cat)

This changes the behavior for a handful of tests causing Series/DataFrame.replace to return CategoricalDtype (with added categories) instead of object-dtype, so may not be quite what we want. But I'm pretty sure its in the right ballpark.

@arw2019
Copy link
Member Author

arw2019 commented Nov 11, 2020

if self._can_hold_element(other):
return self
elif self.is_categorical:
# Note: this will be wrong if we ever have a tuple category
cat = self.values.add_categories(other)
return self.make_block(cat)

Thanks!!! Pushed this, will see what fails

Update: there are 7 test failures:

FAILED pandas/tests/arrays/categorical/test_replace.py::test_replace[to_replace10-value10-expected10-True]
FAILED pandas/tests/frame/indexing/test_categorical.py::TestDataFrameIndexingCategorical::test_assigning_ops
FAILED pandas/tests/frame/methods/test_replace.py::TestDataFrameReplace::test_replace_intervals
FAILED pandas/tests/series/methods/test_replace.py::TestSeriesReplace::test_replace_mixed_types
FAILED pandas/tests/series/methods/test_replace.py::TestSeriesReplace::test_replace_categorical[categorical0-numeric0]
FAILED pandas/tests/series/methods/test_replace.py::TestSeriesReplace::test_replace_categorical[categorical1-numeric1]
FAILED pandas/tests/series/methods/test_replace.py::TestSeriesReplace::test_replace_categorical[categorical2-numeric2]

Unfortunately they're undesirable behavioral changes (e.g. replacing a categorical with non-identical categories no longer raises, etc.)

xref #36226, November 2020 dev call

Mothballing this to clear the queue

@arw2019 arw2019 added the Mothballed Temporarily-closed PR the author plans to return to label Nov 11, 2020
@arw2019 arw2019 closed this Nov 11, 2020
@jbrockmendel
Copy link
Member

cc @jorisvandenbossche the latest commit here is pretty much exactly the edits i was referring to re CategoricalBlock.replace. Only difference is I would also remove CategoricalBlock.replace and _replace_list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Mothballed Temporarily-closed PR the author plans to return to replace replace method Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.Series.replace(to_replace=...) does not preserve the original dtype of the series
3 participants