
BUG: DataFrame.groupby.agg has inconsistent behaviour depending on the length of the by iterable and on whether *args/**kwargs are passed to agg #47092

@sondalex

Description


Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

One element in by:

import pandas as pd
import numpy as np

data = {
    "group_1": [1, 1, 2, 2, 3, 4],
    "group_2": ["A", "B", "C", "C", "D", "D"],
    "x": np.random.uniform(-10, 10, 6),
    "y": np.random.uniform(0, 10, 6),
}
df = pd.DataFrame(data)

df.groupby(["group_1"]).agg(lambda x: print(x))

The snippet above shows that the intermediate objects passed to func are Series, one per column and group:

0    A
1    B
Name: group_2, dtype: object
2    C
3    C
Name: group_2, dtype: object
4    D
Name: group_2, dtype: object
5    D
Name: group_2, dtype: object
0   -7.935897
1   -3.747197
Name: x, dtype: float64
2   -6.889313
3   -5.122488
Name: x, dtype: float64
4   -8.112173
Name: x, dtype: float64
5   -7.337516
Name: x, dtype: float64
0    8.209067
1    7.556609
Name: y, dtype: float64
2    3.595947
3    3.697471
Name: y, dtype: float64
4    2.58461
Name: y, dtype: float64
5    4.620535
Name: y, dtype: float64

When **kwargs are passed to agg, the intermediate objects are DataFrames instead, one per group and including the grouping column:

df.groupby(["group_1"]).agg(lambda x,y: print(x), y=1)
   group_1 group_2         x         y
0        1       A -7.935897  8.209067
1        1       B -3.747197  7.556609
   group_1 group_2         x         y
2        2       C -6.889313  3.595947
3        2       C -5.122488  3.697471
   group_1 group_2         x        y
4        3       D -8.112173  2.58461
   group_1 group_2         x         y
5        4       D -7.337516  4.620535
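The DataFrames printed above have the same layout as the groups obtained by iterating over the GroupBy object directly, which suggests (an inference from the output, not something confirmed in this report) that the **kwargs path hands each whole group to func. A minimal sketch for comparing the two, reusing the same data:

```python
import numpy as np
import pandas as pd

data = {
    "group_1": [1, 1, 2, 2, 3, 4],
    "group_2": ["A", "B", "C", "C", "D", "D"],
    "x": np.random.uniform(-10, 10, 6),
    "y": np.random.uniform(0, 10, 6),
}
df = pd.DataFrame(data)

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs. Each sub-DataFrame
# keeps every column of df, including the grouping column group_1.
groups = [group for _, group in df.groupby(["group_1"])]
for group in groups:
    print(group)
```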

More than one element in by:

Here, whether or not **kwargs are passed, the intermediate objects are Series.

df.groupby(["group_1", "group_2"]).agg(lambda x : print(x))
0    7.130014
Name: x, dtype: float64
1    8.12832
Name: x, dtype: float64
2   -7.127394
3   -8.320946
Name: x, dtype: float64
4   -4.383301
Name: x, dtype: float64
5    2.152988
Name: x, dtype: float64
0    3.789269
Name: y, dtype: float64
1    0.040574
Name: y, dtype: float64
2    2.372136
3    9.548322
Name: y, dtype: float64
4    3.798372
Name: y, dtype: float64
5    6.701758
Name: y, dtype: float64
df.groupby(["group_1", "group_2"]).agg(lambda x,y: print(x), y=1)
0    7.130014
Name: x, dtype: float64
1    8.12832
Name: x, dtype: float64
2   -7.127394
3   -8.320946
Name: x, dtype: float64
4   -4.383301
Name: x, dtype: float64
5    2.152988
Name: x, dtype: float64
0    3.789269
Name: y, dtype: float64
1    0.040574
Name: y, dtype: float64
2    2.372136
3    9.548322
Name: y, dtype: float64
4    3.798372
Name: y, dtype: float64
5    6.701758
Name: y, dtype: float64

Issue Description

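The type of the intermediate objects passed to func is inconsistent. In df.groupby(by=<ITERABLE>).agg(func=lambda x, y: print(x), y=1), func receives DataFrames when ITERABLE has length one, but Series when ITERABLE has length greater than one. Without *args/**kwargs, func receives Series in both cases.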
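The four combinations can be checked side by side by recording the type of each intermediate object instead of printing it. This is a minimal sketch; intermediate_types and record are hypothetical helpers written for illustration, and record returns a scalar only so that agg has something to aggregate. Based on the output above, on pandas 1.4.2 only the (1, True) entry (one grouping key, kwargs passed) would be expected to contain DataFrame; other pandas versions may differ.

```python
import numpy as np
import pandas as pd

data = {
    "group_1": [1, 1, 2, 2, 3, 4],
    "group_2": ["A", "B", "C", "C", "D", "D"],
    "x": np.random.uniform(-10, 10, 6),
    "y": np.random.uniform(0, 10, 6),
}
df = pd.DataFrame(data)


def intermediate_types(frame, by, **kwargs):
    """Hypothetical helper: set of type names that agg passes to the UDF."""
    seen = []

    def record(obj, *args, **kw):
        # Record the type of each intermediate object instead of printing it,
        # and return a scalar so that agg can build its result.
        seen.append(type(obj).__name__)
        return 0

    frame.groupby(by).agg(record, **kwargs)
    return set(seen)


# Keys are (number of grouping keys, whether kwargs were passed to agg).
results = {
    (len(by), bool(kwargs)): intermediate_types(df, by, **kwargs)
    for by in (["group_1"], ["group_1", "group_2"])
    for kwargs in ({}, {"y": 1})
}
for key, types in sorted(results.items()):
    print(key, sorted(types))
```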

Expected Behavior

When ITERABLE has length one and *args/**kwargs are passed, the intermediate objects should also be Series, as they are when no extra arguments are passed or when ITERABLE has more than one element.
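Until the behaviour is consistent, one possible workaround (a sketch, not an official recommendation) is to bind the extra arguments inside a closure, so that agg itself is called without *args/**kwargs and takes the same path as the first example. udf is a hypothetical stand-in for the report's lambda x, y: print(x).

```python
import numpy as np
import pandas as pd

data = {
    "group_1": [1, 1, 2, 2, 3, 4],
    "group_2": ["A", "B", "C", "C", "D", "D"],
    "x": np.random.uniform(-10, 10, 6),
    "y": np.random.uniform(0, 10, 6),
}
df = pd.DataFrame(data)

seen = []


def udf(obj, y):
    # Hypothetical UDF: records the type it receives and returns a scalar.
    seen.append(type(obj).__name__)
    return y


# Bind y inside a closure so that agg receives no *args/**kwargs of its own.
result = df.groupby(["group_1"]).agg(lambda obj: udf(obj, y=1))
print(sorted(set(seen)))
print(result)
```

functools.partial(udf, y=1) would bind the argument in the same way; the closure is used here only because it keeps the call identical in shape to the report's lambda.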

Installed Versions

commit : 4bfe3d0
python : 3.10.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.14.10-300.fc35.x86_64
Version : #1 SMP Thu Oct 7 20:48:44 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.4.2
numpy : 1.22.4
pytz : 2022.1
dateutil : 2.8.2
pip : 22.1.1
setuptools : 57.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Metadata


Assignees

No one assigned

    Labels

    Apply (Apply, Aggregate, Transform, Map), Bug, Duplicate Report (Duplicate issue or pull request), Groupby

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
