-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: df.replace over pd.Period columns (#34871) #36867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is my first PR, so I would really appreciate guidance on how to improve the commit, and/or feedback on any process I am not following properly. Thanks in advance |
You're doing well so far. A couple of comments, then the next thing to do will be to add a note in doc/source/whatsnew/1.2.0.rst giving a short description of the fixed bug |
Thanks for your feedback, addressing your comments now. Also, what is the cause of the build failures? They seem to be happening to everyone, is someone working on fixing them? I was trying to find documentation of the build issue, if it is in fact a widespread thing. |
Yes, don't worry about it (already fixed in fact) |
Pardon my ignorance, is there anything else I need to address/do to get this PR pulled? Thanks for all your help @jbrockmendel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u run some asvs (indexing / reshaping ones)
this could potentially cause some perf issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also there was a reference in the OP to #35268, if this fixes as well would be great to add the test (if not ok too)
doc/source/whatsnew/v1.2.0.rst
Outdated
Period | ||
^^^^^^ | ||
|
||
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` where :class:`Period` dtypes would be converted to object dytpes (:issue:34871) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo on dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dytpes
typo is still there. BTW in general let the person who made the original comment hit the "conversation resolved" button
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha thanks for the heads up, haven't used github at all really. Fixed
Is there a server to run these on? Having issues running these locally in my docker env |
No.
are you familiar with using ipython's |
can you merge master |
Will do, sorry I've taken so long on this, haven't had much time to try and get the ASVs working |
pandas/core/internals/blocks.py
Outdated
def _can_hold_element(self, element: Any) -> bool: | ||
if is_valid_nat_for_dtype(element, self.dtype): | ||
return True | ||
if element is NaT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel ok here?
@samc1213 are all of these branches actually hit? in particular why L2008 and L2010
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Added these due to unit tests. Specifically, L2010 is required for pandas/tests/extension/test_interval.py::TestSetitem::test_setitem_empty_indxer
to pass. L2008 may not be required, I just removed it and will see if the tests pass...
lgtm. @jbrockmendel if you'd look and merge if ok. |
cc @jbrockmendel ok here |
will take a look this afternoon |
doc/source/whatsnew/v1.2.0.rst
Outdated
@@ -434,6 +434,8 @@ Strings | |||
|
|||
Interval | |||
^^^^^^^^ | |||
|
|||
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` where :class:`Interval` dtypes would be converted to object dtypes (:issue:34871) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backticks around the issue number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed
Regression test for corrolary to GH#34871: if series.replace(1.0, 0.0) | ||
is called on a Period/Interval Series, the old, faulty behavior | ||
is to raise TypeError. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be a comment, not a docstring, just needs to see # GH#34871
, ditto above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you can share this test using the frame_or_series fixture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a comment, but I grepped for frame_or_series and not finding much in the codebase. I did find index_or_series
. Can you please elaborate more on how to use frame_or_series? There do seem to be a fair amount of duplication between these two files (series/test_replace and frame/test_replace). test_replace_with_compiled_regex
is basically copy-pasted between the two, for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate more on how to use frame_or_series?
It's a pytest fixture:
Line 296 in aa390ec
def frame_or_series(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @arw2019 - I needed to merge master in locally to find this
regex = re.compile("^a$") | ||
result = s.replace({regex: "z"}, regex=True) | ||
expected = pd.Series(["z", "b", "c"]) | ||
tm.assert_series_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests only belong in tests.generic.methods if all tests for that method are fully parametrized over DataFrame/Series, which these are not.
(i think this is confusing, so will likely just get rid of this directory at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, makes sense then. I'll just test frame and series in the same file then
@@ -1610,14 +1597,6 @@ def test_replace_dict_category_type(self, input_category_df, expected_category_d | |||
|
|||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_replace_with_compiled_regex(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unrelated to the current PR, right? if you want to make a separate PR to parametrize this test, go for it. better to just use frame_or_series in this file and remove the version in the series tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yeah its unrelated. Undone, and just used frame_or_series here, just trying to organize it properly (evidently was misguided :D )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
evidently was misguided
not at all, organizing/parametrizing/breaking-large-ancient-tests-into-specific-tests is something we very much want to encourage. just in dedicated PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, definitely makes sense - thanks
@pytest.mark.parametrize("value", [pd.Period("2020-01"), pd.Interval(0, 5)]) | ||
def test_replace_ea_ignore_float(self, frame_or_series, value): | ||
# GH#34871 | ||
df = frame_or_series([value] * 3) | ||
result = df.replace(1.0, 0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is now frame_or_series, can you call it obj
instead of df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, done
One last nitpick, otherwise LGTM, cc @jreback |
thanks @samc1213 |
@jreback @jbrockmendel Thanks so much for your patience and feedback, this was a cool experience for me. Hopefully I'll be back! Appreciate all the time you put into this great project |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This commit ensures that PeriodArrays return False for _can_hold_element for any element that is not a pd.Period. This prevents upstream code from casting the dtype to object. Also un-xfail test written in #34904