BUG: inconsistency in dtype of replace() #44897

shubham11941140 · 2021-12-15T12:07:40Z

closes BUG: inconsistency in dtype of replace() #44864
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Prevented the early return in the regex=True path so that the dtype conversion can still happen.
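For readers following along, here is a minimal sketch of how the reported inconsistency can be checked. It only uses the public Series.replace call that appears in the tests below. The object-dtype input is an assumption made for portability, and the printed dtypes depend on the installed pandas version, so no output is claimed here.

```python
import pandas as pd

# Object-dtype input is chosen explicitly so the example does not depend on
# the default string dtype of the installed pandas version (an assumption).
strings = pd.Series(["0"], dtype=object)

with_regex = strings.replace(to_replace="0", value=1, regex=True)
without_regex = strings.replace(to_replace="0", value=1, regex=False)

# The replaced values should agree; the report is about whether the dtypes do.
print("regex=True :", with_regex.tolist(), with_regex.dtype)
print("regex=False:", without_regex.tolist(), without_regex.dtype)
print("same dtype :", with_regex.dtype == without_regex.dtype)
```

If the last line prints False, the two code paths disagree on dtype, which is the behaviour the linked issue describes.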

phofl

This broke lots of tests

phofl · 2021-12-15T14:27:49Z

pandas/core/internals/blocks.py

@@ -663,7 +663,9 @@ def replace(
        regex = should_use_regex(regex, to_replace)

        if regex:
-            return self._replace_regex(to_replace, value, inplace=inplace)
+            self.values = np.asarray(


I do not think that this is the right place. Please try doing this deeper when operating with actual arrays

phofl

Please ensure that CI passes before requesting reviews. And please do this inside replace regex

jbrockmendel · 2021-12-16T19:26:20Z

pandas/core/internals/blocks.py

@@ -739,7 +739,10 @@ def _replace_regex(
        replace_regex(new_values, rx, value, mask)

        block = self.make_block(new_values)
-        return [block]
+        if self.ndim == 1 or self.shape[0] == 1:


when we use this pattern the 'else' path usually involves split_and_operate

For the dimensions checked here, split_and_operate does not apply, because it requires

assert self.ndim == 2 and self.shape[0] != 1, which does not hold in this branch.

Since we have a single element here, we should not split and upcast, right?

not sure i understand. im saying its weird to see this check on L734 and not see a split_and_operate on L737.

e.g. in the test you implemented, what happens if you use pd.DataFrame({"A": ["0"], "B": ["0"]})?

In replace_regex, we have already applied the replacement; the only remaining issue is the type cast. So, if I apply split_and_operate again, it won't change the result apart from converting the dtype, and it breaks a lot of tests. The replaced values are correct; only the dtype is wrong.

pd.DataFrame({"A": ["0"], "B": ["0"]}) is giving a dtype assertion error

is giving a dtype assertion error

can you paste it?

I rectified it, the testcase has been added.
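The point about splitting in the thread above can be illustrated with a toy pure-Python model. This is not pandas' internal code: the function names (infer_kind, replace_whole_block, replace_split) and the "kind" labels are hypothetical, and a block is modelled as a plain list of columns. The sketch only shows why a multi-column block may need per-column handling once a replacement changes the type of some columns but not others.

```python
from typing import Any, List, Tuple

Column = List[Any]


def infer_kind(values: Column) -> str:
    """Return a coarse dtype-like label for a toy column (hypothetical helper)."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "int64"
    return "object"


def replace_whole_block(block: List[Column], old: Any, new: Any) -> Tuple[List[Column], List[str]]:
    """Replace in every column, then assign ONE kind to the whole block."""
    replaced = [[new if v == old else v for v in col] for col in block]
    flat = [v for col in replaced for v in col]
    return replaced, [infer_kind(flat)] * len(replaced)


def replace_split(block: List[Column], old: Any, new: Any) -> Tuple[List[Column], List[str]]:
    """Replace in every column, then infer a kind for each column separately."""
    replaced = [[new if v == old else v for v in col] for col in block]
    return replaced, [infer_kind(col) for col in replaced]


block = [["0"], ["a"]]  # two single-row columns, think "A" and "B"
_, whole_kinds = replace_whole_block(block, "0", 1)
_, split_kinds = replace_split(block, "0", 1)
print(whole_kinds)  # ['object', 'object']
print(split_kinds)  # ['int64', 'object']
```

In this toy model, treating the block as a unit leaves both columns as "object", while splitting lets the column that became all-integer get its own kind.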

jreback · 2021-12-16T20:48:03Z

pandas/tests/series/methods/test_replace.py

+    def test_replace_regex_dtype(self):
+        # GH-48644
+        s = pd.Series(["0"])
+        exp = s.replace(to_replace="0", value=1, regex=False).dtype


explicitly create the expected and use tm.assert_series_equal

No, you should create expected = Series(...) not using replace. And please don't use one letter variable names
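A sketch of the test shape requested in this review: the expected Series is written out literally rather than derived from another replace call, the comparison uses assert_series_equal (which also compares dtype by default), and the names are descriptive. It uses the public pandas.testing module rather than the internal tm alias. As an assumption for portability, the sketch replaces a string with a string, because the dtype produced by an integer replacement differs between pandas versions. The function name check_replace_matches_expected is hypothetical.

```python
import pandas as pd


def check_replace_matches_expected(regex: bool) -> None:
    """Compare replace() output against an explicitly constructed Series."""
    original = pd.Series(["0", "x"])
    result = original.replace(to_replace="0", value="1", regex=regex)

    # Built by hand, not via replace(), so the check is independent of the
    # code under test. assert_series_equal compares values and dtype.
    expected = pd.Series(["1", "x"])
    pd.testing.assert_series_equal(result, expected)


# Run the same check for both code paths, as a parametrized test would.
for regex in (True, False):
    check_replace_matches_expected(regex)
```

In a pytest suite the loop would typically be a pytest.mark.parametrize decorator on the test function instead.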

jreback

comments

pandas/tests/series/methods/test_replace.py

jreback · 2021-12-17T22:24:57Z

looks like this is a more limited form of #44940 cc @jbrockmendel

jbrockmendel · 2021-12-18T03:59:21Z

looks like this is a more limited form of #44940 cc @jbrockmendel

Related, but that focuses on the non-regex paths whereas this is only-regex paths.

jbrockmendel · 2021-12-18T17:27:04Z

pandas/tests/frame/methods/test_replace.py

+        result_df1 = df1.replace(to_replace="0", value=1, regex=regex)
+        tm.assert_frame_equal(result_df1, expected_df1)
+
+        df2 = DataFrame({"A": ["0"], "B": [np.NaN]})


i was unclear in the request: the "B" column should also be strings, just not "0"
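A minimal sketch of the frame case described here, where both columns hold strings but only "A" contains the value being replaced. The dtype=object argument is an assumption so that the example behaves the same regardless of the installed version's default string dtype. The resulting dtypes are printed rather than asserted because they vary by pandas version.

```python
import pandas as pd

# Both columns are strings; only "A" holds the target value "0".
frame = pd.DataFrame({"A": ["0"], "B": ["1"]}, dtype=object)

results = {}
for regex in (True, False):
    results[regex] = frame.replace(to_replace="0", value=1, regex=regex)
    # Only "A" should change value; "B" should still hold the string "1".
    print(f"regex={regex}:", results[regex]["A"].tolist(), results[regex]["B"].tolist())
    print(results[regex].dtypes.to_dict())
```

Comparing the two printed dtype mappings shows whether the regex and non-regex paths agree column by column.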

jbrockmendel · 2021-12-18T23:25:28Z

Looks good. Could use a whatsnew note for the bugfix.

shubham11941140 · 2021-12-19T05:04:07Z

Whatsnew note is added @jbrockmendel

jreback · 2021-12-19T23:31:53Z

@shubham11941140 can you merge master (conflicts in the tests) and ping on green

shubham11941140 · 2021-12-20T04:34:54Z

Merged master @jreback

shubham11941140 · 2021-12-20T06:04:49Z

I am getting very weird errors which are not related to the bug fix. I request you to merge so I can move ahead with the next bug fix.

shubham11941140 · 2021-12-21T04:27:52Z

@jbrockmendel @jbrockmendel any update?

shubham11941140 · 2021-12-21T13:00:37Z

@jreback , any update?

jbrockmendel

LGTM

jreback

pls rebase as well, ping on green.

jreback · 2021-12-22T02:42:48Z

doc/source/whatsnew/v1.4.0.rst

@@ -824,7 +824,7 @@ Reshaping
 - Bug in :meth:`DataFrame.stack` with ``ExtensionDtype`` columns incorrectly raising (:issue:`43561`)
 - Bug in :meth:`Series.unstack` with object doing unwanted type inference on resulting columns (:issue:`44595`)
 - Bug in :class:`MultiIndex` failing join operations with overlapping ``IntervalIndex`` levels (:issue:`44096`)
-
+- Bug in :func:`replace` results is different ``dtype`` based on ``regex`` parameter (:issue:`44864`)


use :meth:`DataFrame.replace` and for Series as well

shubham11941140 · 2021-12-22T08:26:54Z

@jreback it is green, request you to merge.

jreback · 2021-12-22T15:18:54Z

thanks @shubham11941140

shubham11941140 added 2 commits December 15, 2021 17:29

Dtype is same

d0565b1

pre-commit hook solved

2fc7c96

phofl requested changes Dec 15, 2021

Update blocks.py

cb4dd4f

shubham11941140 requested a review from phofl December 15, 2021 16:35

phofl requested changes Dec 15, 2021

shubham11941140 added 2 commits December 15, 2021 22:52

changed replace regex

1129c9a

precommit solved

1a1c378

shubham11941140 requested a review from phofl December 15, 2021 18:10

jbrockmendel reviewed Dec 16, 2021

shubham11941140 requested a review from jbrockmendel December 16, 2021 19:45

jreback reviewed Dec 16, 2021

jreback requested changes Dec 16, 2021

jreback added the Dtype Conversions (Unexpected or buggy dtype conversions) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Dec 16, 2021

Changed test to assert series equal

eb06d43

shubham11941140 requested a review from jreback December 17, 2021 10:08

Changes test

fdbd5f5

phofl requested changes Dec 17, 2021

Added parameterise

087aad2

shubham11941140 requested a review from phofl December 17, 2021 16:35

shubham11941140 added 3 commits December 17, 2021 23:12

Merge branch 'master' into b6

d0f3c7f

changed

54f5676

Update blocks.py

c83c2b6

jbrockmendel reviewed Dec 17, 2021

shubham11941140 requested a review from jbrockmendel December 18, 2021 05:19

shubham11941140 added 2 commits December 18, 2021 11:01

Added test

a438762

precommit

9617165

jbrockmendel reviewed Dec 18, 2021

Changed to string

f04d05d

shubham11941140 requested a review from jbrockmendel December 18, 2021 17:35

shubham11941140 added 3 commits December 19, 2021 10:21

whatsnew note added - 1.4

ddbb113

whatsnew fixed

75e3bd8

whatsnew-fix

af0585c

shubham11941140 added 2 commits December 20, 2021 10:30

fix1

2a80b69

fix2

e281264

shubham11941140 force-pushed the b6 branch from 5a2d126 to e281264 Compare December 20, 2021 05:28

Moved testcase below

d00915b

jbrockmendel approved these changes Dec 21, 2021

jreback requested changes Dec 22, 2021

jreback added this to the 1.4 milestone Dec 22, 2021

shubham11941140 added 2 commits December 22, 2021 10:34

Merge branch 'master' of https://github.com/pandas-dev/pandas into b6

2fd474c

Changed Whatsnew note

0d93bdd

shubham11941140 requested a review from jreback December 22, 2021 05:50

jreback approved these changes Dec 22, 2021

jreback merged commit b1a2f48 into pandas-dev:master Dec 22, 2021

shubham11941140 deleted the b6 branch December 22, 2021 15:30

simonjayhawkins mentioned this pull request Mar 16, 2022

REGR: only convert at end for Block.replace_list #46393

Closed

simonjayhawkins mentioned this pull request Aug 25, 2022

BUG: Series.replace converts np.nan into pd.NaT implicitly #48034

Open

3 tasks
