REGR: DataFrame.replace when the replacement value was explicitly None #46404
cool, looks like a conflict
thanks @simonjayhawkins
@meeseeksdev backport 1.4.x
…ement value was explicitly None
…e was explicitly None (#46441) Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
@@ -777,6 +777,13 @@ def _replace_coerce(
                mask=mask,
            )
        else:
            if value is None:
why here instead of in 'replace'?
because on main the recursion error occurs and we normally fix backports by opening PR against main and then backporting rather than against the backport branch directly.
Also, we would split the blocks in Block.replace which we didn't do on 1.3.5 and the regression fix restores previous behavior for now, see #45601 (comment).
I think if we do move this to replace after the recursion is fixed, we could also backport that as a bug fix, if we think the block splitting is desirable, to be consistent for 1.4.x.
None handling is also slightly different in Block.replace than for a list-like, so I suspect we would need some other changes, which I'm happy to do as a follow-up on master.
This PR was a very targeted regression fix as a suitable backport for 1.4.x.
that makes sense, thanks.
@@ -661,6 +661,20 @@ def test_replace_simple_nested_dict_with_nonexistent_value(self):
        result = df.replace({"col": {-1: "-", 1: "a", 4: "b"}})
        tm.assert_frame_equal(expected, result)

    def test_replace_NA_with_None(self):
in all of the relevant examples, both the value being replaced and the replacement are NA. are these the only affected cases?
IIUC for a list-like to_replace, None is treated explicitly at the moment, whereas with a scalar None the behavior is different in some cases. My understanding is that users are therefore using a dictionary to get the explicit replacement behavior. To make these consistent, we would need to deprecate this?
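For reference, a minimal sketch of the dictionary form mentioned above, where `None` is requested explicitly as the replacement for `pd.NA`. The frame and column name are illustrative assumptions modeled on the linked issue (#45601), not copied from the PR's test. No output is shown because the result depends on the pandas version: on affected 1.4.x releases the call reportedly had no effect.

```python
import pandas as pd

# Illustrative frame (an assumption, not taken from the PR):
# a nullable integer column whose missing value is stored as pd.NA.
df = pd.DataFrame({"value": [42, None]}).astype({"value": "Int64"})

# Dictionary form: explicitly ask for pd.NA to be replaced by None.
result = df.replace({pd.NA: None})

# Inspect the values and dtype to see whether the replacement took effect.
print(result["value"].tolist())
print(result.dtypes)
```

Checking `result.dtypes` alongside the values is a quick way to tell whether the column was converted away from `Int64` or left unchanged.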
pd.NA by None has no effect #45601
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.