API: numeric_only in Series ops behavior #47500
Comments
When interpreting the
e.g.
IMO given the docstring I think users could be a little surprised
Side note regarding the
Thanks @mroeschke; I think we should potentially consider modifying the docstring (for 2.0). I am wondering if given that option changes your preferences here.
A quick check and I'm seeing
Most definitely. I haven't dove deeply into the
Yes, this is more specifically what I was asking. I think for Series the
That makes sense. Allowing
From #47561 (review) and #47561 (comment): I ran a git bisect, and 730b307 is the first bad commit.
On both main and 1.4.x, ops like sum raise on any args and ignore any kwargs except for "axis", "dtype", and "out" (numeric_only does not go into kwargs). If any of those three kwargs are provided, the op raises. I think we should deprecate unused args/kwargs, but it's less clear how to handle numeric_only. If we are to still include numeric_only in rolling ops for 1.5, I do think they should raise when used with Series and non-numeric dtypes. Technically this is a breaking change, but one where I think an exception is warranted (pun intended).
This was discussed in the July dev meeting and the consensus was that raising in this situation for 1.5 is preferred.
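To make the agreed rule concrete, here is a minimal pure-Python sketch of "raise when numeric_only=True is used with non-numeric data". It is a toy model only: `reduce_sum` is a hypothetical helper written for illustration, not pandas code, and the exception type and message are assumptions rather than what pandas actually emits.

```python
from functools import reduce
from numbers import Number
import operator


def reduce_sum(values, numeric_only=False):
    """Hypothetical stand-in for a Series reduction (not the pandas API).

    Expects a non-empty list. With numeric_only=True, non-numeric data is
    rejected instead of the flag being silently ignored.
    """
    all_numeric = all(isinstance(v, Number) for v in values)
    if numeric_only and not all_numeric:
        # Assumed behavior for this sketch: fail loudly.
        raise TypeError(
            "sum does not allow numeric_only=True with non-numeric data"
        )
    # numeric_only=False "requests nothing": reduce whatever is there.
    return reduce(operator.add, values)


print(reduce_sum([1, 2, 3], numeric_only=True))  # numeric data: flag is a no-op
print(reduce_sum(["a", "b"]))                    # default: no numeric check

try:
    reduce_sum(["a", "b"], numeric_only=True)
except TypeError as err:
    print(f"raised: {err}")
```

The point of the sketch is the asymmetry: numeric_only=False never changes the result, while numeric_only=True either has no effect (numeric data) or raises (non-numeric data).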
…0 and enabling tests

### What changes were proposed in this pull request?
This PR proposes to match the behavior with pandas 2.0.0 and above for stat functions, such as `sum`, `quantile`, `prod`, etc. See pandas-dev/pandas#41480 and pandas-dev/pandas#47500 for more detail.

### Why are the changes needed?
To match the behavior to latest pandas.

### Does this PR introduce _any_ user-facing change?
Yes, the behaviors for stat funcs are now matched with pandas 2.0.0 and above.

### How was this patch tested?
Enabling & updating the existing UTs.

Closes #42526 from itholic/pandas_stat.
Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Part of #46560
For Series and SeriesGroupBy ops (and perhaps others like resample, rolling, window, expanding, ewm), there is an inconsistency when numeric_only is passed to ops that have it as an argument:
I see three possible options:
I am in favor of 1 - I view passing in numeric_only=True as erroneous; it doesn't make sense to request this from a Series even if it is a no-op. On the other hand, I view passing in numeric_only=False as "requesting nothing", though perhaps it is slightly odd. Option 1 would also allow the default to be False across all ops (in pandas 2.0).