TST: test for inconsistency due to dtype=string #46512 #47793

Shadimrad · 2022-07-19T23:49:48Z

closes BUG: Inconsistency in DataFrame.where between inplace and not inplace with na like value for StringArray #46512
Tests added and passed if fixing a bug or adding a new feature
Added an entry in the latest pandas/core/generic.py file if fixing a bug or adding a new feature.

pep8speaks · 2022-07-20T00:08:58Z

Hello @Shadimrad! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-08-18 23:51:54 UTC

phofl · 2022-07-20T00:35:13Z

pandas/core/generic.py

            self._check_inplace_setting(other)
            new_data = self._mgr.putmask(mask=cond, new=other, align=align)
            result = self._constructor(new_data)
+            for i in range(len(result.dtypes)):


This is the wrong place for this. We cannot special-case this here.

Would you mind elaborating on what you mean? I believe it is not a special case, since it only affects the type when inplace is True. Do you mean I should put it inside putmask? @phofl

Probably, but this looks already correct on main, so no need to fix I think.

1.5.0.dev0+1180.g8c3a2f2ba7
<StringArray>
['1', <NA>, '3']
Length: 3, dtype: string
<class 'pandas._libs.missing.NAType'>
      A
0     1
1  <NA>
2     3
<StringArray>
['1', <NA>, '3']
Length: 3, dtype: string
<class 'pandas._libs.missing.NAType'>
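Editor's sketch: the script that produced the output above is not shown in the thread. A check along the following lines would print comparable information. The column name A and the values are read off the output; the helper name masked_values is hypothetical, and the empty-string mask is borrowed from the test proposed later in this PR.

```python
import numpy as np
import pandas as pd


def masked_values():
    """Return the masked entry from the copy path and from the in-place path."""
    data = {"A": ["1", "", "3"]}  # assumed input, inferred from the output above

    # Not in place: where() returns a new frame and leaves df untouched.
    df = pd.DataFrame(data, dtype="string")
    out = df.where(df != "", np.nan)
    not_inplace = out["A"].array[1]

    # In place: the same call mutates df2 and returns None.
    df2 = pd.DataFrame(data, dtype="string")
    df2.where(df2 != "", np.nan, inplace=True)
    inplace = df2["A"].array[1]

    return not_inplace, inplace


print(pd.__version__)
for value in masked_values():
    # Both paths should report the same missing-value type.
    print(type(value))
```

If the two printed types differ, the inconsistency from #46512 is present in the installed pandas version.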

Could you simply add a test?

cc @simonjayhawkins

Probably, but this looks already correct on main, so no need to fix I think.

see #46512 (comment)

I'll confirm the commit where the fix occurred and if we agree that this is the correct behavior, then indeed we just need a test.

I think this is essentially the same as #47628

yes from #47628 (comment)

But the strange thing is that doing this at the Series level doesn't end up calling StringArray.__setitem__; it seems to go through Series._where, then BlockManager.putmask, and finally ExtensionArray._putmask, and that last one is not correctly implemented for StringArray.

But we should probably also have tests for DataFrame.where, in case the implementation of __setitem__ changes so that it no longer goes through Series._where.
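Editor's sketch: the Series-level operation discussed here is a boolean-mask assignment of np.nan into a string-dtype Series. The example data below is an assumption, not taken from #47628. It only checks the observable result, not which internal method handles the assignment.

```python
import numpy as np
import pandas as pd

# Assumed example data: one empty string to be masked out.
ser = pd.Series(["1", "", "3"], dtype="string")

# Mask assignment at the Series level.
ser[ser == ""] = np.nan

# The stored value should be the dtype's missing-value marker,
# and the Series should keep its string dtype.
masked = ser.array[1]
print(type(masked), ser.dtype)
```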

Just out of curiosity, how would it be implemented if it should not be special-cased here? Is there a link to the change that fixed it on main, by any chance? @phofl

Shadimrad · 2022-07-26T15:41:42Z

take

mroeschke · 2022-08-17T17:00:53Z

pandas/tests/arrays/string_/test_string.py

+def test_consitency_inplace():
+    df = pd.DataFrame({"M": [""]}, dtype="string")
+    df2 = pd.DataFrame({"M": [""]}, dtype="string")
+    df2.where(df2 != "", np.nan, inplace=True)


Would be good to compare df and df2 to a separately created DataFrame e.g.

expected = pd.DataFrame(...)
tm.assert_frame_equal(df, expected)
tm.assert_frame_equal(df2, expected)

Shadimrad · 2022-08-17T17:06:08Z

Sure. I'll fix it.


mroeschke · 2022-08-20T00:13:31Z

pandas/tests/arrays/string_/test_string.py

+    df = pd.DataFrame({"M": [""]}, dtype="string")
+    df.where(df != "", np.nan, inplace=True)
+    expected = expected.where(expected != "", np.nan)
+    tm.assert_frame_equal(expected, df)


In my prior comment, I meant that there should be two tm.assert_frame_equal calls so that inplace=True and inplace=False are tested separately, e.g.

expected = pd.DataFrame({"M": [""]}, dtype="string")
df_inplace = ...
tm.assert_frame_equal(df_inplace, expected)
df_not_inplace = ...
tm.assert_frame_equal(df_not_inplace, expected)
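Editor's sketch: one way the suggested structure could look once filled in. The test name is hypothetical, and the expected frame assumes the agreed behavior is that the masked cell becomes pd.NA (the placeholder above uses an empty string). The public pd.testing.assert_frame_equal is used here in place of the internal tm alias so the snippet runs outside the pandas test suite.

```python
import numpy as np
import pandas as pd


def test_where_string_inplace_consistency():
    # Assumption: masking the only cell should leave a single <NA>.
    expected = pd.DataFrame({"M": [pd.NA]}, dtype="string")

    # inplace=True: the frame is modified and where() returns None.
    df_inplace = pd.DataFrame({"M": [""]}, dtype="string")
    df_inplace.where(df_inplace != "", np.nan, inplace=True)
    pd.testing.assert_frame_equal(df_inplace, expected)

    # inplace=False (the default): a new frame is returned.
    df = pd.DataFrame({"M": [""]}, dtype="string")
    df_not_inplace = df.where(df != "", np.nan)
    pd.testing.assert_frame_equal(df_not_inplace, expected)
```

Building expected independently means each path is compared against a fixed target, so the test still fails if both paths are wrong in the same way.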

mroeschke · 2022-08-23T18:29:10Z

I don't think this test is entirely required for 1.5 so removing that milestone. Once ready, we can scope for the 1.6 branch.

github-actions · 2022-09-23T00:06:33Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2022-10-04T18:30:57Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

Shadimrad added 14 commits July 13, 2022 21:17

TST

5b39d02

TST

e0a27b2

TST

bd32b97

updated strign formatting

48a27f8

updated formatting

dceccf0

updating the test place within files

1904094

removing an additional parentheses

7270b34

removing the buggy file that was pushed

7fb362d

removing the buggy file that was pushed

e48a63d

fixing initial file

15be2f5

fixing order of import

204affb

fixing space

22843cc

fixing space

d1d0601

fixing space

c795d5d

Shadimrad changed the title ~~Issue2~~ BUG Inconsistency due to dtype=string #46512 Jul 19, 2022

Shadimrad added 2 commits July 19, 2022 17:07

issue2

da73d0e

Merge branch 'main' into issue2

91979a4

Update test_arithmetic.py

d002ce0

phofl reviewed Jul 20, 2022


test

a8146e9

Shadimrad changed the title ~~BUG Inconsistency due to dtype=string #46512~~ TST test for inconsistency due to dtype=string #46512 Jul 20, 2022

Shadimrad changed the title ~~TST test for inconsistency due to dtype=string #46512~~ TST: test for inconsistency due to dtype=string #46512 Jul 20, 2022

Shadimrad added 7 commits July 19, 2022 19:16

Update test_where.py

1d3e914

Update test_where.py

e92bb7c

Update generic.py

6d446f9

Update generic.py

8d60097

Update generic.py

007612f

Update test_where.py

c655774

Merge branch 'main' into issue2

6df1d30

mroeschke added the Strings String extension data type and string data label Jul 22, 2022

github-actions bot assigned Shadimrad Jul 26, 2022

Shadimrad marked this pull request as draft August 3, 2022 17:55

Shadimrad added 9 commits August 4, 2022 00:21

Update test_where.py

3452783

Update test_where.py

bcdff67

Merge branch 'issue2' of https://github.com/Shadimrad/pandas into issue2

39e53f6

Update test_where.py

ca7a705

do

4e9b374

Update test_where.py

b78896f

branch

775f979

fix

f4dec29

relocate

86b6b9a

Shadimrad marked this pull request as ready for review August 4, 2022 08:03

Merge branch 'main' into issue2

e6a73a8

Shadimrad requested review from phofl and simonjayhawkins August 4, 2022 14:29

Shadimrad added 2 commits August 16, 2022 12:16

Merge branch 'main' into issue2

cfeb393

Merge branch 'main' into issue2

acffd23

mroeschke reviewed Aug 17, 2022


Shadimrad added 2 commits August 18, 2022 16:34

Update test_string.py

d4ec9b5

Merge branch 'main' into issue2

b2a8c1f

mroeschke reviewed Aug 20, 2022


mroeschke removed this from the 1.5 milestone Aug 23, 2022

github-actions bot added the Stale label Sep 23, 2022

mroeschke closed this Oct 4, 2022


5 participants
