BUG: Fix issue with apply on empty DataFrame #28213
needs a test
@jbrockmendel Added a few tests that appear to be failing on master. There are some existing tests failing on this branch that I still need to work through.
Does this close any other issues? Can you add a release note in 1.0.0.rst?
This reverts commit 1993c7c.
doc/source/whatsnew/v1.0.0.rst
@@ -84,6 +84,7 @@ Performance improvements
Bug fixes
~~~~~~~~~

- Bug in :meth:`DataFrame.apply` that caused incorrect output with empty :class:`DataFrame` (:issue:`28202`, :issue:`21959`)
this is specific to np.* and .nunique()? if so can you be more specific
I think it'd apply to any reduction whose output isn't empty when the input is, should I say something to that effect?
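To make "any reduction whose output isn't empty when the input is" concrete, here is a minimal sketch using plain Python stand-ins rather than pandas itself. The helper `count_unique` is hypothetical and only mimics a nunique-style reduction; the point is that each of these has a well-defined value on empty input, so a hard-coded NaN would be wrong for them.

```python
import math


def count_unique(values):
    """Hypothetical stand-in for a nunique-style reduction."""
    return len(set(values))


empty = []

# Each reduction returns a well-defined value even though the input is empty.
results = {
    "sum": sum(empty),               # additive identity
    "prod": math.prod(empty),        # multiplicative identity
    "nunique": count_unique(empty),  # no distinct values
}
print(results)  # {'sum': 0, 'prod': 1, 'nunique': 0}
```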
        reduce = not isinstance(r, Series)
    except Exception:
        pass

if reduce:
    return self.obj._constructor_sliced(np.nan, index=self.agg_axis)
    if len(self.agg_axis):
        r = self.f(Series([]))
pass args & kwargs
we already are trying to reduce above (line 208), why are you calling the function again? does this hit the Except?
I did that in case reduce was already True at line 206, so that the try block wouldn't have been executed
i am still puzzled why you can not pass args/kwargs
I think what's happening is that the function self.f is getting curried around here (line 109 in cb68153):
if (kwds or args) and not isinstance(func, (np.ufunc, str)):
Passing args and kwargs again then fails (from the df.nunique() test):
> r = self.f(Series([]), *self.args, **self.kwds)
E TypeError: f() got an unexpected keyword argument 'dropna'
because at that point f only takes a single argument. I could imagine there being a problem if this currying doesn't happen, so there's probably a hidden corner case that just isn't covered by the existing tests.
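The currying described above can be reproduced without pandas. This is only a sketch under assumptions: `nunique` is a hypothetical reduction that accepts a `dropna` keyword, and the wrapper `f` mirrors the idea of binding the extra arguments once up front. Once that is done, passing the same keyword arguments a second time raises the same kind of TypeError as in the traceback.

```python
def nunique(values, dropna=True):
    """Hypothetical reduction that accepts a keyword argument."""
    if dropna:
        values = [v for v in values if v is not None]
    return len(set(values))


kwds = {"dropna": False}


# Bind the keyword arguments once; f now takes a single positional argument.
def f(x):
    return nunique(x, **kwds)


print(f([]))  # works, because kwds are already bound inside f

# Passing kwds again fails: f itself has no 'dropna' parameter.
try:
    f([], **kwds)
except TypeError as exc:
    error = str(exc)
    print(error)
```

So the empty-input probe has to call the wrapped function with the empty input only, since the extra arguments are already inside the wrapper.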
Was confused by this too, but I think this makes sense. I suppose this was hitting the except before due to a TypeError for the wrong number of arguments?
Was the discussion on passing through args & kwargs resolved?
If I turn off the
Sounds good. I think we're good to merge then?
I'd say so unless @jreback can foresee any other potential problems
OK. Let's give it a few more days in case.
lgtm but need to edit the top of the PR to add closes on the other issue number
Done, thanks for the review! |
Thanks @dsaxton |
Fixes a bug where the return value for certain functions was hard-coded as np.nan when operating on an empty DataFrame.
Before
After
Edit: Closes #28202 and closes #21959 after cb68153. The issue was that the arguments of self.f were already unpacked here (pandas/pandas/core/apply.py, line 112 in cb68153), and then we tried to do this again inside apply_empty_result, which was raising an error and causing the unusual output.
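As a rough, library-independent model of the change described above (not the actual pandas code): the two functions below are hypothetical, `labels` stands in for the aggregation axis, and `f` is assumed to be the already-wrapped single-argument function. The old behaviour fills every label with NaN; the new one calls `f` on an empty input and reuses that value.

```python
import math


def reduce_empty_before(f, labels):
    """Hypothetical model of the old behaviour: NaN regardless of f."""
    return {label: math.nan for label in labels}


def reduce_empty_after(f, labels):
    """Hypothetical model of the fix: evaluate f once on empty input."""
    value = f([])  # f is assumed to take exactly one argument
    return {label: value for label in labels}


before = reduce_empty_before(sum, ["a", "b"])
after = reduce_empty_after(sum, ["a", "b"])
print(before)  # {'a': nan, 'b': nan}
print(after)   # {'a': 0, 'b': 0}
```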