ValueError on df.agg if a list of functions is given #31851

Open
endremborza opened this issue Feb 10, 2020 · 5 comments
Labels: Apply (Apply, Aggregate, Transform, Map), Bug

Comments

@endremborza
Contributor

Code Sample

import numpy as np
import pandas as pd

_arr = np.array([1,2,3,4,5,np.nan])
_arr2 = np.array([1,2,3,4,5,7])

df = pd.DataFrame({"A": _arr, "B": _arr2})

def perc_fun(q):
    def f(arr):
        return np.nanpercentile(arr, q)
    f.__name__ = f"p-{q}"
    return f

All is as expected here:

>>> perc_fun(10)(_arr)
1.4

>>> df.agg(perc_fun(10))
A    1.4
B    1.5
dtype: float64

But this fails:

df.agg(["mean", perc_fun(10)])

TypeError Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _axis)
553 try:
--> 554 return concat(results, keys=keys, axis=1, sort=False)
555 except TypeError:

~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
280 copy=copy,
--> 281 sort=sort,
282 )

~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
356 )
--> 357 raise TypeError(msg)
358

TypeError: cannot concatenate object of type '<class 'numpy.float64'>'; only Series and DataFrame objs are valid

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in
----> 1 df.agg(["mean", perc_fun(10)])

~/.local/lib/python3.7/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
6704 result = None
6705 try:
-> 6706 result, how = self._aggregate(func, axis=axis, *args, **kwargs)
6707 except TypeError:
6708 pass

~/.local/lib/python3.7/site-packages/pandas/core/frame.py in _aggregate(self, arg, axis, *args, **kwargs)
6718 result = result.T if result is not None else result
6719 return result, how
-> 6720 return super()._aggregate(arg, *args, **kwargs)
6721
6722 agg = aggregate

~/.local/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
484 elif is_list_like(arg):
485 # we require a list, but not an 'str'
--> 486 return self._aggregate_multiple_funcs(arg, _axis=_axis), None
487 else:
488 result = None

~/.local/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _axis)
530 colg = self._gotitem(col, ndim=1, subset=obj.iloc[:, index])
531 try:
--> 532 new_res = colg.aggregate(arg)
533 except (TypeError, DataError):
534 pass

~/.local/lib/python3.7/site-packages/pandas/core/series.py in aggregate(self, func, axis, *args, **kwargs)
3686 # Validate the axis parameter
3687 self._get_axis_number(axis)
-> 3688 result, how = self._aggregate(func, *args, **kwargs)
3689 if result is None:
3690

~/.local/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
484 elif is_list_like(arg):
485 # we require a list, but not an 'str'
--> 486 return self._aggregate_multiple_funcs(arg, _axis=_axis), None
487 else:
488 result = None

~/.local/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _axis)
562 result = Series(results, index=keys, name=self.name)
563 if is_nested_object(result):
--> 564 raise ValueError("cannot combine transform and aggregation operations")
565 return result
566

ValueError: cannot combine transform and aggregation operations

Problem description

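The same code produced a different error in 0.24, so I assume some work has been done here. Still, the current error message is not very informative: both entries in the list are aggregations, yet the message complains about combining transform and aggregation operations. I really think this should just work and return a result whose "p-10" row holds the appropriate values, 1.4 and 1.5.

I couldn't find any related issues. If no one has experience with this, I will look into the .agg code.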
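For reference, a minimal sketch of the result I would expect, built without going through the list form of `df.agg`. The helper name `agg_rows` is hypothetical and not part of pandas; it calls each function once per column on the raw values and resolves string names as DataFrame methods, which is an assumption about what the list form should do rather than a description of how pandas implements it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": np.array([1, 2, 3, 4, 5, np.nan]),
    "B": np.array([1, 2, 3, 4, 5, 7]),
})

def perc_fun(q):
    def f(arr):
        return np.nanpercentile(arr, q)
    f.__name__ = f"p-{q}"
    return f

def agg_rows(frame, funcs):
    """Hypothetical helper: one output row per reducer, one column per input column."""
    rows = {}
    for func in funcs:
        if isinstance(func, str):
            # Named reducers such as "mean" are looked up as DataFrame methods.
            rows[func] = getattr(frame, func)()
        else:
            # Call the function once per column, on the whole column's values.
            rows[func.__name__] = pd.Series(
                {col: func(frame[col].to_numpy()) for col in frame.columns}
            )
    # Each dict entry becomes a column, so transpose to get one row per reducer.
    return pd.DataFrame(rows).T

expected = agg_rows(df, ["mean", perc_fun(10)])
print(expected)
```

The "p-10" row of `expected` carries the same 1.4 and 1.5 that `df.agg(perc_fun(10))` returns above.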

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.2.1-1.el7.elrepo.x86_64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2018.4
dateutil : 2.7.3
pip : 19.3.1
setuptools : 41.4.0
Cython : None
pytest : 4.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.1 (dt dec pq3 ext lo64)
jinja2 : 2.10
IPython : 7.9.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.1.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.2.8
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@TomAugspurger
Contributor

Looks like the bug is in the result of Series.agg

In [7]: df.B.agg(perc_fun(10))
Out[7]:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    7.0
Name: B, dtype: float64

I'm not sure what's going on here, but that should be a scalar, right?
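A small sketch that may explain the output above, assuming the Series path ends up calling the function once per element: `np.nanpercentile` accepts a single scalar without raising, and any percentile of one value is that value, so an element-wise call hands the column back unchanged. The Series below is a stand-in for `df.B`.

```python
import numpy as np
import pandas as pd

def perc_fun(q):
    def f(arr):
        return np.nanpercentile(arr, q)
    f.__name__ = f"p-{q}"
    return f

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 7.0], name="B")  # stand-in for df.B
p10 = perc_fun(10)

on_scalar = p10(7.0)            # a single value: no exception, the value comes back
elementwise = s.apply(p10)      # Series.apply calls p10 once per element
on_column = p10(s.to_numpy())   # one call on the whole column

print(on_scalar)
print(elementwise.equals(s))
print(on_column)
```

Only the last call is an actual reduction; the element-wise call never fails, so nothing signals that the function was meant to see the whole column.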

@TomAugspurger TomAugspurger added Apply Apply, Aggregate, Transform, Map Bug labels Feb 10, 2020
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Feb 10, 2020
@endremborza
Contributor Author

Yes, that should be a scalar.
I'll look into it tomorrow then.

@endremborza
Contributor Author

OK, so this test blocks any possible change, and I don't really understand why this is how it currently works:

def test_agg_apply_evaluate_lambdas_the_same(self, string_series):
    # test that we are evaluating row-by-row first
    # before vectorized evaluation
    result = string_series.apply(lambda x: str(x))
    expected = string_series.agg(lambda x: str(x))
    tm.assert_series_equal(result, expected)

    result = string_series.apply(str)
    expected = string_series.agg(str)
    tm.assert_series_equal(result, expected)

In the actual aggregate code, the try/except simply does not fail: the function accepts a single scalar without raising, so .agg just falls back to apply and returns the element-wise result.

pandas/pandas/core/series.py

Lines 3665 to 3689 in 48cb5a9

def aggregate(self, func, axis=0, *args, **kwargs):
    # Validate the axis parameter
    self._get_axis_number(axis)
    result, how = self._aggregate(func, *args, **kwargs)
    if result is None:

        # we can be called from an inner function which
        # passes this meta-data
        kwargs.pop("_axis", None)
        kwargs.pop("_level", None)

        # try a regular apply, this evaluates lambdas
        # row-by-row; however if the lambda is expected a Series
        # expression, e.g.: lambda x: x-x.quantile(0.25)
        # this will fail, so we can try a vectorized evaluation

        # we cannot FIRST try the vectorized evaluation, because
        # then .agg and .apply would have different semantics if the
        # operation is actually defined on the Series, e.g. str
        try:
            result = self.apply(func, *args, **kwargs)
        except (ValueError, AttributeError, TypeError):
            result = func(self, *args, **kwargs)

    return result

If I add a check like

if isinstance(result, Series):
    raise ValueError

(which should absolutely not be the solution to this), the only test that breaks is the one quoted above, which seems to have been added about 3 years ago in #14668.

The issue we're facing here seems to have come up fairly quickly in that thread, but there are a lot of comments there and I'm not sure whether any agreement was reached.
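To make the trade-off concrete, here is a standalone sketch of the fallback order in the quoted `aggregate` code. The function `agg_with_fallback` and its `reject_series` flag are hypothetical, and placing the `isinstance` check inside the `try` block is my assumption about where such a guard would go; this is not the pandas implementation.

```python
import numpy as np
import pandas as pd

def agg_with_fallback(s, func, reject_series=False):
    """Hypothetical stand-in for the try/except fallback in Series.aggregate."""
    try:
        # Element-wise evaluation is tried first.
        result = s.apply(func)
        if reject_series and isinstance(result, pd.Series):
            # Assumed guard: treat a Series result as "not a reduction".
            raise ValueError("element-wise result is not a reduction")
    except (ValueError, AttributeError, TypeError):
        # Vectorized evaluation on the whole Series is only the fallback.
        result = func(s)
    return result

def p10(arr):
    return np.nanpercentile(np.asarray(arr, dtype=float), 10)

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 7.0], name="B")

# A scalar has no .quantile, so apply raises and the fallback runs.
centered = agg_with_fallback(s, lambda x: x - x.quantile(0.25))

current = agg_with_fallback(s, p10)                          # Series, not a scalar
guarded = agg_with_fallback(s, p10, reject_series=True)      # scalar reduction
guarded_str = agg_with_fallback(s, str, reject_series=True)  # one string for the whole Series

print(type(current).__name__, float(guarded), type(guarded_str).__name__)
```

With the guard, the percentile function reduces as intended, but `str` is now applied to the whole Series instead of to each element. That is exactly the behavior the test above pins down, which is why the guard alone breaks it.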

@parthi-siva
Contributor

take

@parthi-siva parthi-siva removed their assignment Aug 2, 2022
@parthi-siva
Contributor

take

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@parthi-siva parthi-siva removed their assignment Dec 17, 2022