Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix issue with apply on empty DataFrame #28213

Merged
merged 22 commits into from
Sep 20, 2019
Merged

BUG: Fix issue with apply on empty DataFrame #28213

merged 22 commits into from
Sep 20, 2019

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Aug 29, 2019

Fixes a bug where the return value for certain functions was hard-coded as np.nan when operating on an empty DataFrame.

Before

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(columns=["a", "b", "c"])
>>> df.apply(np.sum)
a   NaN
b   NaN
c   NaN
dtype: float64
>>> df.apply(np.prod)
a   NaN
b   NaN
c   NaN
dtype: float64

After

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(columns=["a", "b", "c"])
>>> df.apply(np.sum)
a    0.0
b    0.0
c    0.0
dtype: float64
>>> df.apply(np.prod)
a    1.0
b    1.0
c    1.0
dtype: float64

Edit: Closes #28202 and closes #21959 after cb68153. The issue was that the arguments of self.f were already unpacked here:

return func(x, *args, **kwds)

and then we tried to do this again inside apply_empty_result which was raising an error and causing the unusual output.

@pep8speaks
Copy link

pep8speaks commented Aug 29, 2019

Hello @dsaxton! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-12 13:19:32 UTC

pandas/core/apply.py Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member

needs a test

@dsaxton
Copy link
Member Author

dsaxton commented Aug 29, 2019

needs a test

@jbrockmendel Added a few tests that appear to be failing on master. There are some existing tests failing on this branch that I still need to work through.

@TomAugspurger
Copy link
Contributor

Also closes #28202

Does this close any other issues?

Can you add a release note in 1.0.0.rst?

@TomAugspurger TomAugspurger added the Apply Apply, Aggregate, Transform, Map label Aug 30, 2019
@TomAugspurger TomAugspurger added this to the 1.0 milestone Aug 30, 2019
pandas/tests/frame/test_apply.py Outdated Show resolved Hide resolved
@dsaxton
Copy link
Member Author

dsaxton commented Aug 30, 2019

Also closes #28202

Does this close any other issues?

Can you add a release note in 1.0.0.rst?

I believe it also closes #21959, which was still an issue at least as of 0.25.1 (not sure about master)

pandas/core/apply.py Show resolved Hide resolved
@@ -84,6 +84,7 @@ Performance improvements
Bug fixes
~~~~~~~~~

- Bug in :meth:`DataFrame.apply` that caused incorrect output with empty :class:`DataFrame` (:issue:`28202`, :issue:`21959`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is specific to np.* and .nunique()? if so can you be more specific

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd apply to any reduction whose output isn't empty when the input is, should I say something to that effect?

doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
reduce = not isinstance(r, Series)
except Exception:
pass

if reduce:
return self.obj._constructor_sliced(np.nan, index=self.agg_axis)
if len(self.agg_axis):
r = self.f(Series([]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass args & kwargs

reduce = not isinstance(r, Series)
except Exception:
pass

if reduce:
return self.obj._constructor_sliced(np.nan, index=self.agg_axis)
if len(self.agg_axis):
r = self.f(Series([]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already are trying to reduce above (line 208), why are you calling the function again? does this hit the Except?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that in case reduce was already True at line 206, so that the try block wouldn't have been executed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am still puzzled why you can not pass args/kwargs

Copy link
Member Author

@dsaxton dsaxton Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's happening is the function self.f is getting curried around here:

if (kwds or args) and not isinstance(func, (np.ufunc, str)):
so when we pass the arguments again we get an error (this is from the df.nunique() test):

>           r = self.f(Series([]), *self.args, **self.kwds)
E           TypeError: f() got an unexpected keyword argument 'dropna'

because at that point f only takes a single argument. I could imagine there could end up being a problem if this currying doesn't happen, so there's probably a hidden corner case that just isn't covered by the existing tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was confused by this to but I think this makes sense. I suppose this was hitting the except before due to a TypeError for wrong number of arguments?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the discussion on passing through args & kwargs resolved?

@dsaxton
Copy link
Member Author

dsaxton commented Sep 12, 2019

Was the discussion on passing through args & kwargs resolved?

If I turn off the try / catching completely to try to surface any other issues, then adding back the argument unpacking only causes test_nunique_empty to fail among the tests in test_apply.py (three failed tests instead of the two mentioned above), so I would say we shouldn't be passing them.

@TomAugspurger
Copy link
Contributor

Sounds good. I think we're good to merge then?

@dsaxton
Copy link
Member Author

dsaxton commented Sep 13, 2019

Sounds good. I think we're good to merge then?

I'd say so unless @jreback can foresee any other potential problems

@TomAugspurger
Copy link
Contributor

OK. Let's give it a few more days in case.

@jreback
Copy link
Contributor

jreback commented Sep 19, 2019

lgtm but need to edit the top of the PR to add closes on the other issue number

@dsaxton
Copy link
Member Author

dsaxton commented Sep 19, 2019

lgtm but need to edit the top of the PR to add closes on the other issue number

Done, thanks for the review!

@WillAyd WillAyd merged commit 95edcf1 into pandas-dev:master Sep 20, 2019
@WillAyd
Copy link
Member

WillAyd commented Sep 20, 2019

Thanks @dsaxton

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
@dsaxton dsaxton deleted the apply-fix branch April 9, 2020 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map
Projects
None yet
7 participants