-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: idxmin & idxmax axis = 1 str reducer for transform #50329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix!
As it's somewhat relevant here, I'll mention that I think we should deprecate the axis argument across most (all?) groupby methods where it is not useful, but this is to be done separately.
pandas/core/groupby/generic.py
Outdated
axis: Axis = 0, | ||
skipna: bool = True, | ||
numeric_only: bool = False, | ||
self, axis: Axis = 0, skipna: bool = True, numeric_only: bool = False, **kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be adding kwargs to the function signature. Whenever self.axis=1
and axis = 0
in idxmin we will raise (whether using with transform or not), so I'd suggest instead just ignoring the axis
argument altogether in this case.
Assuming this sounds like a good route, can you change the axis argument in the docstring to state that it is ignored. Right now we're using the shared_docs; I'd recommend copying this and adding it as an explicit docstring to this method - that would then allow us in the future to add groupby-specific docs for this method and correct some difference in behavior between DataFrame and DataFrameGroupBy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback, it does sound like a good route. As for the axis
parameter, it does not seem intuitive to raise when we are stating in the docs that the argument will be ignored. We could however, instead of ignoring axis
, just mention in the docs that axis
and self.axis
should match. If they don't match, we will raise with the message - they should match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the
axis
parameter, it does not seem intuitive to raise when we are stating in the docs that the argument will be ignored.
I think this is in reply to my sentence Whenever self.axis=1 and axis = 0 in idxmin we will raise (whether using with transform or not), so I'd suggest...
. When I said "we will raise", I mean "currently pandas raises". In other words, I think it's okay to ignore the axis
argument because currently no one can be using .groupby(..., axis=1).idxmin(axis=0)
.
We could however, instead of ignoring axis, just mention in the docs that axis and self.axis should match. If they don't match, we will raise with the message - they should match.
I agree this is a viable route, but I think a bit odd. This is perhaps an over simplification, but it's like having a function
def foo(x=1):
if x != 2:
raise ValueError("You must set x to be 2")
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I said "we will raise", I mean "currently pandas raises". In other words, I think it's okay to ignore the axis argument because currently no one can be using .groupby(..., axis=1).idxmin(axis=0).
Okay now I understand what you mean. For now we will ignore axis
, in the future remove it and directly take axis
from the grouper.
I have implemented the required changes in the latest commit. We ignore the |
pandas/core/groupby/generic.py
Outdated
>>> df = pd.DataFrame({{'consumption': [10.51, 103.11, 55.48], | ||
... 'co2_emissions': [37.2, 19.66, 1712]}}, | ||
... index=['Pork', 'Wheat Products', 'Beef']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doctest appears to be failing because of the {{
. I think keeping a single brace should fix it, but not sure if we need it as an escape character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the previous docstring had two {{
due to its use as a template; agreed there should only be one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, small requests below. Also, would you be willing to do idxmax here as well?
Also needs a whatsnew entry in 2.0.0 under the bugfix section within groupby.
pandas/core/groupby/generic.py
Outdated
The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise. | ||
The axis argument is ignored, instead we use the grouper's axis. | ||
|
||
.. versionchanged:: 1.5.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.0.0 here; in generally only regression fixes make it to patch versions.
pandas/core/groupby/generic.py
Outdated
dtype: object | ||
""" | ||
|
||
axis = self.axis | ||
axis = DataFrame._get_axis_number(axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L1958 here is now unnecessary, as self.axis is always 0 or 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense
Sure, I will do that. |
@iofall - I think this looks good, just need to resolve a whatsnew conflict |
doc/source/whatsnew/v2.0.0.rst
Outdated
@@ -944,7 +944,7 @@ Groupby/resample/rolling | |||
- Bug in :meth:`.SeriesGroupBy.nunique` would incorrectly raise when the grouper was an empty categorical and ``observed=True`` (:issue:`21334`) | |||
- Bug in :meth:`.SeriesGroupBy.nth` would raise when grouper contained NA values after subsetting from a :class:`DataFrameGroupBy` (:issue:`26454`) | |||
- Bug in :meth:`DataFrame.groupby` would not include a :class:`.Grouper` specified by ``key`` in the result when ``as_index=False`` (:issue:`50413`) | |||
- | |||
- Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` would raise incorrectly when grouper had ``axis=1`` for ``"idxmin"`` and ``"idxmax"`` arguments (:issue:`45986`). The method now ignores the ``axis`` argument, instead we use the grouper's ``axis``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this one sentence, no period, ending with (:issue:`45986`)
as the other lines do. I think it's okay to leave off the last sentence.
I looked at the failing test, we might need to remove / rewrite it, since it depends on supplying pandas/pandas/tests/groupby/test_function.py Lines 509 to 511 in a28cadb
Here, we groupby on line 509, this happens with >>> gb = df.groupby("A")
>>> gb.groups
{1: [0, 3, 6], 2: [1, 4, 7], 3: [2, 5, 8], 4: [9]}
>>> gb = df.groupby("A", axis = 1)
>>> gb.groups
{} Thus the I am not sure how to proceed but we can try:
|
I was under the impression
+1 on this |
Sorry I did not convey this properly, it does raise but only when the number of rows and columns mismatch in the issue's minimal example. # raises
df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 1, 1], axis=0).transform("idxmin", axis = 1)
# passes
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [5, 6, 7]}, index=["x", "y", "z"])
result = df.groupby([0, 1, 1], axis=0).transform("idxmin", axis = 1)
What do you recommend?
|
@iofall - The code
raises because you have two rows to group and you're specifying groups for three rows.
This sounds like a great way forward. |
Sorry I meant even this raises - df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 1, 1], axis=1).transform("idxmin", axis=0) |
I think we have the right changes now, we have completed what the issue had asked for: This example now works and all the tests are passing as well. df = DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 0, 1], axis=1).transform("first") However, examples like these still raise df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 1, 1], axis=1).transform("idxmin", axis=0) Although this should be included in another issue or PR, but here is some investigation pandas/pandas/core/groupby/groupby.py Lines 1070 to 1076 in 209de28
idxmin and idxmax , when the indexes mismatch, we end up at this function, which tries to reset the axis for each of the v s in the values that are obtained after grouping. However Series objects don't have axis which results in the error we see in the above idxmin call.
If we put a conditional such that a b c
a x NaN NaN
b NaN x x
c NaN x x This might help solve this issue. |
Thanks for the update, will take a look in the next day or so. Just wanted to give my thoughts here:
I think the axis argument should be deprecated entirely; see #50405. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
pandas/core/groupby/generic.py
Outdated
def idxmax( | ||
self, | ||
axis: Axis = 0, | ||
axis: Axis = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is causing mypy to fail on the CI. Need to type-hint as Axis | None
. Same for idxmin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have changed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm; failure is unrelated (SQL test)
Thanks @iofall - very nice! |
Thanks! It was fun 🎉 |
idxmin
for BUG: Some str reducers fail in Groupby.transform with axis=1 #45986doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.To Do:
test_api_consistency
due to changed signatures._shared_docs
foridxmin
to reflect the changes in the signature.Description
.transform("idxmin")
&.transform("idxmax")
now works withaxis=1
groupby objects.axis=0
argument is ignored. Instead we directly use the grouper's axis.idxmin
call which still respects theaxis
argument.