-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: RecursionError when attempting to replace np.nan values (#45725) #45749
Conversation
@github-actions pre-commit |
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -248,6 +248,7 @@ Conversion | |||
- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`) | |||
- Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`) | |||
- Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`) | |||
- Bug when attempting to replace ``numpy.nan`` values causing ``RecursionError`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i read the original issue correctly this then this was working in 1.4 so no actual change (its only a change in master), if so delete this note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right. I removed the note from the changelog.
@@ -661,6 +661,14 @@ def test_replace_simple_nested_dict_with_nonexistent_value(self): | |||
result = df.replace({"col": {-1: "-", 1: "a", 4: "b"}}) | |||
tm.assert_frame_equal(expected, result) | |||
|
|||
@pytest.mark.parametrize("to_replace, value", [(np.nan, pd.NA), (np.nan, None)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add np.nan
as a target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, but had to change the tests to explicitly use object
type even after replacing. Replacing np.nan
with np.nan
can cause an implicit type conversion to float64
. There is still discussion going on about this, but for now this is the same behavior in pandas 1.4.0 and 1.3.5 so it shouldn't be changed in this bug fix.
def test_replace_numpy_nan(self, to_replace, value): | ||
# GH#45725 ensure numpy.nan can be replaced with pandas.NA or None | ||
df = DataFrame({"A": [to_replace]}, dtype=object) | ||
result = df.replace({to_replace: value}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test usage with df.replace(to_replace, value)
or is this tested close by here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes done.
@@ -36,6 +36,15 @@ def test_replace_explicit_none(self): | |||
assert expected.iloc[-1] is None | |||
tm.assert_series_equal(result, expected) | |||
|
|||
@pytest.mark.parametrize("to_replace, value", [(np.nan, pd.NA), (np.nan, None)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments as above. consider just using the null_fixture which has all the nans
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nulls_fixture
was a good solution. Implemented.
thanks @roib20 very nice! keep em coming! |
The tests added in this PR don't seem to fail on master with Also before #45568, which maybe the cause of the regression, the values were replaced.
from bisect with following code sample
on master, same code sample gives
not sure yet, but I think we will need to remove the ``value = self._standardize_fill_value(value) # GH#45725` added in this PR to fix #45601 and #45836 might propose reverting this PR until the regressions in 1.4.0 are fixed |
but to be clear, the recursion error still exists when the fix added here is removed. Looks like need to revisit these tests. For the record the traceback is currently Traceback (most recent call last):
File "/home/simon/45725.py", line 15, in <module>
ser = ser.replace({np.nan: pd.NA})
File "/home/simon/pandas/pandas/core/series.py", line 5000, in replace
return super().replace(
File "/home/simon/pandas/pandas/core/generic.py", line 6681, in replace
return self.replace(
File "/home/simon/pandas/pandas/core/series.py", line 5000, in replace
return super().replace(
File "/home/simon/pandas/pandas/core/generic.py", line 6730, in replace
new_data = self._mgr.replace_list(
File "/home/simon/pandas/pandas/core/internals/managers.py", line 421, in replace_list
bm = self.apply(
File "/home/simon/pandas/pandas/core/internals/managers.py", line 304, in apply
applied = getattr(b, f)(**kwargs)
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 725, in replace_list
result = blk._replace_coerce(
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 780, in _replace_coerce
return self.replace(
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 605, in replace
return blk.replace(
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 605, in replace
return blk.replace(
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 605, in replace
return blk.replace(
[Previous line repeated 977 more times]
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 604, in replace
blk = self.coerce_to_target_dtype(value)
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 449, in coerce_to_target_dtype
return self.astype(new_dtype, copy=False)
File "/home/simon/pandas/pandas/core/internals/blocks.py", line 523, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/simon/pandas/pandas/core/dtypes/astype.py", line 283, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/simon/pandas/pandas/core/dtypes/astype.py", line 215, in astype_array
if is_datetime64tz_dtype(dtype) and is_datetime64_dtype(values.dtype):
File "/home/simon/pandas/pandas/core/dtypes/common.py", line 386, in is_datetime64tz_dtype
return DatetimeTZDtype.is_dtype(arr_or_dtype)
File "/home/simon/pandas/pandas/core/dtypes/base.py", line 312, in is_dtype
if isinstance(dtype, (ABCSeries, ABCIndex, ABCDataFrame, np.dtype)):
File "/home/simon/pandas/pandas/core/dtypes/generic.py", line 45, in _check
return getattr(inst, attr, "_typ") in comp
RecursionError: maximum recursion depth exceeded while calling a Python object |
Because the behavior no longer matches the pre-regression behavior and the fact that the tests do not actually catch the reported regression, I now propose to revert this PR and reopen #45725. Even though this fix prevents the recursion, I think it will be cleaner to undo this and redo properly rather than patching it up. |
np.nan
values under main branch #45725doc/source/whatsnew/v1.5.0.rst
file if fixing a bug or adding a new feature.