BUG: df.replace over pd.Period columns (#34871) #36867

Merged 23 commits on Nov 17, 2020

Conversation

samc1213
Contributor

samc1213 commented Oct 4, 2020

This commit ensures that PeriodArrays return False for _can_hold_element for any element that is not a pd.Period. This prevents upstream code from casting the dtype to object. It also un-xfails the test written in #34904.
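To illustrate the mechanism described above, here is a minimal sketch in plain Python. It is not the pandas implementation: `Month`, `TypedColumn`, and `can_hold_element` are hypothetical stand-ins for `pd.Period`, a block of values, and `_can_hold_element`. The point is only that when the column reports it cannot hold `to_replace`, `replace` can return early instead of falling back to an object column.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Month:
    """Hypothetical period-like scalar (a stand-in, not pd.Period)."""

    year: int
    month: int


class TypedColumn:
    """Toy column that carries a dtype label alongside its values."""

    def __init__(self, values: List[Any], dtype: str, scalar_type: type):
        self.values = list(values)
        self.dtype = dtype
        self.scalar_type = scalar_type

    def can_hold_element(self, element: Any) -> bool:
        # Only scalars of the column's own type (or None as a missing marker) fit.
        return element is None or isinstance(element, self.scalar_type)

    def replace(self, to_replace: Any, value: Any) -> "TypedColumn":
        if not self.can_hold_element(to_replace):
            # to_replace cannot occur in this column, so nothing changes
            # and there is no reason to fall back to an object column.
            return TypedColumn(self.values, self.dtype, self.scalar_type)
        new_values = [value if v == to_replace else v for v in self.values]
        if all(self.can_hold_element(v) for v in new_values):
            return TypedColumn(new_values, self.dtype, self.scalar_type)
        # The replacement value does not fit, so the toy column degrades to object.
        return TypedColumn(new_values, "object", object)


col = TypedColumn([Month(2020, 1)] * 3, dtype="month", scalar_type=Month)
unchanged = col.replace(1.0, 0.0)  # a float can never match a Month
swapped = col.replace(Month(2020, 1), Month(2020, 2))
```

In this toy model, replacing a float leaves both the values and the dtype label untouched, which is the behavior the PR aims for on Period columns.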

@samc1213
Contributor Author

samc1213 commented Oct 4, 2020

This is my first PR, so I would really appreciate guidance on how to improve the commit, and/or feedback on any process I am not following properly. Thanks in advance

@jbrockmendel
Member

> This is my first PR, so I would really appreciate guidance on how to improve the commit, and/or feedback on any process I am not following properly. Thanks in advance

You're doing well so far. A couple of comments, then the next thing to do will be to add a note in doc/source/whatsnew/1.2.0.rst giving a short description of the fixed bug

@samc1213
Contributor Author

samc1213 commented Oct 5, 2020

> > This is my first PR, so I would really appreciate guidance on how to improve the commit, and/or feedback on any process I am not following properly. Thanks in advance
>
> You're doing well so far. A couple of comments, then the next thing to do will be to add a note in doc/source/whatsnew/1.2.0.rst giving a short description of the fixed bug

Thanks for your feedback, addressing your comments now. Also, what is the cause of the build failures? They seem to be happening to everyone, is someone working on fixing them? I was trying to find documentation of the build issue, if it is in fact a widespread thing.

@pep8speaks

pep8speaks commented Oct 5, 2020

Hello @samc1213! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-16 03:22:57 UTC

@jbrockmendel
Member

> Also, what is the cause of the build failures? They seem to be happening to everyone, is someone working on fixing them?

Yes, don't worry about it (already fixed in fact)

@samc1213
Contributor Author

samc1213 commented Oct 5, 2020

Pardon my ignorance, is there anything else I need to address/do to get this PR pulled? Thanks for all your help @jbrockmendel

Contributor

@jreback left a comment

can u run some asvs (indexing / reshaping ones)

this could potentially cause some perf issues

@jreback added the Period, Bug, and replace labels Oct 6, 2020
Contributor

@jreback left a comment

also there was a reference in the OP to #35268, if this fixes as well would be great to add the test (if not ok too)

Period
^^^^^^

- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` where :class:`Period` dtypes would be converted to object dytpes (:issue:34871)
Contributor

typo on dtypes

Member

> dytpes

typo is still there. BTW in general let the person who made the original comment hit the "conversation resolved" button

Contributor Author

Gotcha thanks for the heads up, haven't used github at all really. Fixed

@samc1213
Contributor Author

samc1213 commented Oct 8, 2020

> can u run some asvs (indexing / reshaping ones)
>
> this could potentially cause some perf issues

Is there a server to run these on? Having issues running these locally in my docker env

@jbrockmendel
Member

> Is there a server to run these on?

No.

> Having issues running these locally in my docker env

are you familiar with using ipython's %timeit?
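
For a rough local comparison without asv, the standard-library `timeit` module gives numbers similar to `%timeit`. The helper below is only a sketch: `best_time` is a hypothetical name, and the list comprehension is a placeholder workload, not a pandas benchmark.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], number: int = 100, repeat: int = 5) -> float:
    """Return the best per-call time in seconds over `repeat` runs of `number` calls."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    # The minimum is the least noisy estimate; divide by `number` for a per-call figure.
    return min(totals) / number


# Placeholder workload; in practice this would be the pandas call under test.
data = list(range(1_000))
elapsed = best_time(lambda: [0.0 if x == 1.0 else x for x in data])
print(f"{elapsed * 1e6:.1f} us per call")
```

Running the same callable on master and on the PR branch, for example `best_time(lambda: ser.replace(1.0, 0.0))` with `ser` a Series of Period values, would give the before/after comparison requested above.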

@jreback
Contributor

jreback commented Oct 14, 2020

can you merge master

@samc1213
Contributor Author

> can you merge master

Will do, sorry I've taken so long on this, haven't had much time to try and get the ASVs working

@samc1213
Contributor Author

> also there was a reference in the OP to #35268, if this fixes as well would be great to add the test (if not ok too)

It does not fix #35268 - however, I think it's a very similar code change to fix - I can take care of that once this is approved

@jreback added this to the 1.2 milestone Oct 31, 2020
    def _can_hold_element(self, element: Any) -> bool:
        if is_valid_nat_for_dtype(element, self.dtype):
            return True
        if element is NaT:
Contributor

@jbrockmendel ok here?

@samc1213 are all of these branches actually hit? in particular why L2008 and L2010

Contributor Author

@jreback Added these due to unit tests. Specifically, L2010 is required for pandas/tests/extension/test_interval.py::TestSetitem::test_setitem_empty_indxer to pass. L2008 may not be required, I just removed it and will see if the tests pass...

@jreback
Contributor

jreback commented Nov 2, 2020

lgtm. @jbrockmendel if you'd look and merge if ok.

@jreback
Contributor

jreback commented Nov 4, 2020

cc @jbrockmendel ok here

@jbrockmendel
Member

will take a look this afternoon

@@ -434,6 +434,8 @@ Strings

Interval
^^^^^^^^

- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` where :class:`Interval` dtypes would be converted to object dtypes (:issue:34871)
Member

backticks around the issue number

Member

same below

Contributor Author

Thanks, fixed

Regression test for corollary to GH#34871: if series.replace(1.0, 0.0)
is called on a Period/Interval Series, the old, faulty behavior
is to raise TypeError.
"""
Member

this can be a comment, not a docstring, just needs to see # GH#34871, ditto above

Member

actually you can share this test using the frame_or_series fixture

Contributor Author

Changed to a comment, but I grepped for frame_or_series and am not finding much in the codebase. I did find index_or_series. Can you please elaborate more on how to use frame_or_series? There does seem to be a fair amount of duplication between these two files (series/test_replace and frame/test_replace). test_replace_with_compiled_regex is basically copy-pasted between the two, for example

Member

> Can you please elaborate more on how to use frame_or_series?

It's a pytest fixture:

def frame_or_series(request):
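
A sketch of how such a fixture is typically defined and consumed. Only the signature `def frame_or_series(request):` comes from the link above; the `params` list and the function body are assumptions about what the pandas conftest does, and the test is written as a module-level function for brevity.

```python
import pandas as pd
import pytest


@pytest.fixture(params=[pd.DataFrame, pd.Series])
def frame_or_series(request):
    # Assumed body: hand the test one constructor per parametrized run.
    return request.param


@pytest.mark.parametrize("value", [pd.Period("2020-01"), pd.Interval(0, 5)])
def test_replace_ea_ignore_float(frame_or_series, value):
    # GH#34871
    obj = frame_or_series([value] * 3)
    expected = frame_or_series([value] * 3)

    # Replacing a float that cannot occur in the data should be a no-op.
    result = obj.replace(1.0, 0.0)

    # .equals works for both Series and DataFrame and also compares dtypes.
    assert result.equals(expected)
```

With this shape, pytest runs the test once per combination of constructor and value, so a single test body covers both the Series and the DataFrame case.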

Contributor Author

Thanks @arw2019 - I needed to merge master in locally to find this

regex = re.compile("^a$")
result = s.replace({regex: "z"}, regex=True)
expected = pd.Series(["z", "b", "c"])
tm.assert_series_equal(result, expected)
Member

tests only belong in tests.generic.methods if all tests for that method are fully parametrized over DataFrame/Series, which these are not.

(i think this is confusing, so will likely just get rid of this directory at some point)

Contributor Author

I see, makes sense then. I'll just test frame and series in the same file then

@@ -1610,14 +1597,6 @@ def test_replace_dict_category_type(self, input_category_df, expected_category_d

tm.assert_frame_equal(result, expected)

def test_replace_with_compiled_regex(self):
Member

this is unrelated to the current PR, right? if you want to make a separate PR to parametrize this test, go for it. better to just use frame_or_series in this file and remove the version in the series tests

Contributor Author

Sorry, yeah it's unrelated. Undone, and just used frame_or_series here, just trying to organize it properly (evidently was misguided :D )

Member

> evidently was misguided

not at all, organizing/parametrizing/breaking-large-ancient-tests-into-specific-tests is something we very much want to encourage. just in dedicated PRs

Contributor Author

I see, definitely makes sense - thanks

@pytest.mark.parametrize("value", [pd.Period("2020-01"), pd.Interval(0, 5)])
def test_replace_ea_ignore_float(self, frame_or_series, value):
    # GH#34871
    df = frame_or_series([value] * 3)
    result = df.replace(1.0, 0.0)
Member

since this is now frame_or_series, can you call it obj instead of df

Contributor Author

Got it, done

@jbrockmendel
Member

One last nitpick, otherwise LGTM, cc @jreback

@jreback merged commit d45903e into pandas-dev:master Nov 17, 2020
@jreback
Contributor

jreback commented Nov 17, 2020

thanks @samc1213

@samc1213
Contributor Author

@jreback @jbrockmendel Thanks so much for your patience and feedback, this was a cool experience for me. Hopefully I'll be back! Appreciate all the time you put into this great project

@samc1213 deleted the 34871 branch November 17, 2020 01:31
Labels
Bug, Period, replace
Development

Successfully merging this pull request may close these issues.

BUG: Inconsistent behavior for df.replace over pd.Period columns
5 participants