-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG fix deprecation of limit
and fill_method
in pct_change
#55527
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, I'm guessing tests need to be updated.
pandas/core/generic.py
Outdated
if fill_method is not lib.no_default or limit is not lib.no_default: | ||
# GH#53491: deprecate the `fill_method` and `limit` keyword, except | ||
# `fill_method=None` that does not fill missing values | ||
if fill_method not in (lib.no_default, None) and limit is not lib.no_default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need warning messages for every single case. Does something like
The 'fill_method' being not None and the 'limit' argument are deprecated. Either fill in NA values prior to calling pct_change or specify fill_method=None to not fill NA values.
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may want to prompt users to use fill_method=None
even if they have filled NA values. See the original comment of this PR. The case is, even after ffill
or bfill
, calling pct_change
without keyword will raise deprecation warning unless we explicitly check if there are NA values to fill. However, I don't think this is a good approach: (1) this may add too much overhead, and (2) if a user is not filling NA values and uses pct_change
without keyword, and if the data occasionally does not contain NA values, he/she will not get a warning message and the logic would be incorrect.
Due to these reasons, I think this deprecation would be especially confusing, especially since we are having "incorrect" deprecation warnings in the current version. That's why I'm trying to give extra specific guide for each case. If maintainers do not think this is necessary, I can implement using only a single message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, I need a confirmation about whether we should prompt users to do obj.ffill/bfill().pct_change(fill_method=None)
or obj.ffill/bfill().pct_change()
. Personally I prefer the former as I explained in the previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I don't think this is a good approach: (1) this may add too much overhead, and (2) if a user is not filling NA values and uses
pct_change
without keyword, and if the data occasionally does not contain NA values, he/she will not get a warning message and the logic would be incorrect.
I'm seeing the current logic takes 7.5% of the runtime for the current warning on a Series with 100k rows - I don't think overhead is a concern. This will cause users to modify their code unnecessarily in what I think is the uncommon case. They will then need to change their code again when we deprecate the fill_method
argument. I do not think we should do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will implement your suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "Either fill in any non-leading NA values" is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed with @rhshadrach
@rhshadrach I've implemented your suggestions, please check if it is correct. I've also updated the test cases and now they seem to pass correctly. Not sure if I need to add some additional tests? By the way, for instance >>> ser = pd.Series([np.nan, 1, 2, 3, np.nan])
>>> ser.bfill().pct_change() still raises a warning. Should we fix that? |
/preview |
Website preview of this PR available at: https://pandas.pydata.org/preview/55527/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Small request on the tests.
Also - this needs a whatsnew note. |
Done, not sure if the changelog should be added in Also, may I have your response to #55527 (comment) please @rhshadrach? |
From #55527 (comment):
I believe the behavior is going to change between now and pandas 3.0. So that means we need to warn. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed one request in the whatsnew, otherwise I think we're all set.
Done @rhshadrach, thanks for your review! |
Bumping off the milestone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @Charlie-XIAO |
This comment was marked as outdated.
This comment was marked as outdated.
@meeseeksdev backport 2.1.x |
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement. |
…l_method` in `pct_change`
doc/source/whatsnew/vX.X.X.rst
fileI'm thinking that we should not over-complicate things by allowing
obj.pct_change()
. The thing is, even if users doobj.ffill/bfill().pct_change()
, a warning will still be raised unless we check for NA values, for instance, #54981. I propose that we should let users useobj.ffill/bfill().pct_change(fill_method=None)
to suppress the warnings, and this PR intends to give extra specific deprecation warning messages.I haven't added test cases for it, but want to make sure if this is the right way to go. Ping @rhshadrach and @jbrockmendel who were involved in the discussion in the original issue. I'm seeing many users and libraries complaining about this might be better to get this done before the next release.