TST: test for inconsistency due to dtype=string #46512 #47793
Hello @Shadimrad! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-08-18 23:51:54 UTC |
pandas/core/generic.py
Outdated
@@ -9500,6 +9500,11 @@ def _where(
self._check_inplace_setting(other)
new_data = self._mgr.putmask(mask=cond, new=other, align=align)
result = self._constructor(new_data)
for i in range(len(result.dtypes)):
This is the wrong place for this. We cannot special-case this here.
Would you mind elaborating on what you mean? I believe it is not a special case, since it only affects the dtype when inplace is True. Do you mean I should put it inside putmask? @phofl
Probably, but this looks already correct on main, so no need to fix I think.
1.5.0.dev0+1180.g8c3a2f2ba7
<StringArray>
['1', <NA>, '3']
Length: 3, dtype: string
<class 'pandas._libs.missing.NAType'>
A
0 1
1 <NA>
2 3
<StringArray>
['1', <NA>, '3']
Length: 3, dtype: string
<class 'pandas._libs.missing.NAType'>
Could you simply add a test?
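For reference, a minimal sketch of a script that could produce output of the shape shown above. The exact script is not in the thread, so the input values and the column name "A" are assumptions inferred from the printed result:

```python
import numpy as np
import pandas as pd

print(pd.__version__)

# Series level: replace empty strings with a missing value, in place.
# The input values are an assumption inferred from the output above.
ser = pd.Series(["1", "", "3"], dtype="string")
ser.where(ser != "", np.nan, inplace=True)
print(ser.array)
print(type(ser.array[1]))

# DataFrame level: same operation, same dtype.
df = pd.DataFrame({"A": ["1", "", "3"]}, dtype="string")
df.where(df != "", np.nan, inplace=True)
print(df)
print(df["A"].array)
print(type(df["A"].array[1]))
```

If the behavior is consistent, both paths keep the string dtype and store the missing entry as the same NA scalar.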
oh! sure.
Probably, but this looks already correct on main, so no need to fix I think.
see #46512 (comment)
I'll confirm the commit where the fix occurred and if we agree that this is the correct behavior, then indeed we just need a test.
I think this is essentially the same as #47628
yes from #47628 (comment)
But the strange thing is that doing this on the Series level doesn't end up calling StringArray.__setitem__; it seems to go through Series._where and eventually BlockManager.putmask and ExtensionArray._putmask, and that last one is not correctly implemented for StringArray.
But we should probably have tests for DataFrame.where as well, in case the implementation of __setitem__ changes to no longer go through Series._where.
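A rough sketch of what covering both entry points could look like, so that a mask-based __setitem__ and DataFrame.where are each checked against the same expected result. The data and variable names here are hypothetical, and this only exercises the public API; it does not assert which internal path (Series._where, BlockManager.putmask) is taken:

```python
import pandas as pd
import pandas._testing as tm

# Expected result for both paths (hypothetical data).
expected = pd.DataFrame({"A": ["1", pd.NA, "3"]}, dtype="string")

# Path 1: boolean-mask assignment on a Series (goes through __setitem__).
ser_setitem = pd.Series(["1", "", "3"], dtype="string")
mask = ser_setitem == ""
ser_setitem[mask] = pd.NA
tm.assert_series_equal(ser_setitem, expected["A"], check_names=False)

# Path 2: DataFrame.where, checked independently of path 1.
df = pd.DataFrame({"A": ["1", "", "3"]}, dtype="string")
result = df.where(df != "", pd.NA)
tm.assert_frame_equal(result, expected)
```

Keeping the two checks separate means a later change to either implementation would still be caught.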
Just out of curiosity, how would it be implemented if it should not be special-cased? Is there a link to the change that fixed it on main, by any chance? @phofl
take |
def test_consitency_inplace():
    df = pd.DataFrame({"M": [""]}, dtype="string")
    df2 = pd.DataFrame({"M": [""]}, dtype="string")
    df2.where(df2 != "", np.nan, inplace=True)
Would be good to compare df and df2 to a separately created DataFrame, e.g.
expected = pd.DataFrame(...)
tm.assert_frame_equal(df, expected)
tm.assert_frame_equal(df2, expected)
Sure. I'll fix it.
In pandas/tests/arrays/string_/test_string.py:
@@ -611,3 +611,11 @@ def test_setitem_scalar_with_mask_validation(dtype):
msg = "Scalar must be NA or str"
with pytest.raises(ValueError, match=msg):
ser[mask] = 1
+
+
+def test_consitency_inplace():
+ df = pd.DataFrame({"M": [""]}, dtype="string")
+ df2 = pd.DataFrame({"M": [""]}, dtype="string")
+ df2.where(df2 != "", np.nan, inplace=True)
df = pd.DataFrame({"M": [""]}, dtype="string")
df.where(df != "", np.nan, inplace=True)
expected = expected.where(expected != "", np.nan)
tm.assert_frame_equal(expected, df)
In my prior comment, I meant that there should be two tm.assert_frame_equal calls so that inplace=True and inplace=False are tested separately, e.g.
expected = pd.DataFrame({"M": [""]}, dtype="string")
df_inplace = ...
tm.assert_frame_equal(df_inplace, expected)
df_not_inplace = ...
tm.assert_frame_equal(df_not_inplace, expected)
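Filled in, the suggested structure might look like the sketch below. The test name is hypothetical, and the expected frame assumes that the empty string should become a missing value while the column keeps its string dtype, which is the behavior reported on main earlier in this thread:

```python
import numpy as np
import pandas as pd
import pandas._testing as tm


def test_where_string_dtype_inplace_consistency():
    # Hypothetical test name. Assumption: "" is replaced by NA and the
    # column stays dtype "string" for both inplace=True and inplace=False.
    expected = pd.DataFrame({"M": [pd.NA]}, dtype="string")

    # inplace=True: the frame itself is modified.
    df_inplace = pd.DataFrame({"M": [""]}, dtype="string")
    df_inplace.where(df_inplace != "", np.nan, inplace=True)
    tm.assert_frame_equal(df_inplace, expected)

    # inplace=False: a new frame is returned.
    df = pd.DataFrame({"M": [""]}, dtype="string")
    df_not_inplace = df.where(df != "", np.nan)
    tm.assert_frame_equal(df_not_inplace, expected)
```

Building expected directly, rather than deriving it with where, keeps the expectation independent of the code under test.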
I don't think this test is entirely required for 1.5 so removing that milestone. Once ready, we can scope for the 1.6 branch.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.