Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
One element in by:
import pandas as pd
import numpy as np
data = {
    "group_1": [1, 1, 2, 2, 3, 4],
    "group_2": ["A", "B", "C", "C", "D", "D"],
    "x": np.random.uniform(-10, 10, 6),
    "y": np.random.uniform(0, 10, 6),
}
df = pd.DataFrame(data)
df.groupby(["group_1"]).agg(lambda x : print(x))
With no extra arguments, the function receives each column of each group as a Series, so the snippet above prints:
0 A
1 B
Name: group_2, dtype: object
2 C
3 C
Name: group_2, dtype: object
4 D
Name: group_2, dtype: object
5 D
Name: group_2, dtype: object
0 -7.935897
1 -3.747197
Name: x, dtype: float64
2 -6.889313
3 -5.122488
Name: x, dtype: float64
4 -8.112173
Name: x, dtype: float64
5 -7.337516
Name: x, dtype: float64
0 8.209067
1 7.556609
Name: y, dtype: float64
2 3.595947
3 3.697471
Name: y, dtype: float64
4 2.58461
Name: y, dtype: float64
5 4.620535
Name: y, dtype: float64
When **kwargs are passed to agg, the intermediate objects are DataFrames instead, one per group:
df.groupby(["group_1"]).agg(lambda x,y: print(x), y=1)
group_1 group_2 x y
0 1 A -7.935897 8.209067
1 1 B -3.747197 7.556609
group_1 group_2 x y
2 2 C -6.889313 3.595947
3 2 C -5.122488 3.697471
group_1 group_2 x y
4 3 D -8.112173 2.58461
group_1 group_2 x y
5 4 D -7.337516 4.620535
More than one element in by:
Here the intermediate objects are Series whether or not extra arguments are passed.
df.groupby(["group_1", "group_2"]).agg(lambda x : print(x))
0 7.130014
Name: x, dtype: float64
1 8.12832
Name: x, dtype: float64
2 -7.127394
3 -8.320946
Name: x, dtype: float64
4 -4.383301
Name: x, dtype: float64
5 2.152988
Name: x, dtype: float64
0 3.789269
Name: y, dtype: float64
1 0.040574
Name: y, dtype: float64
2 2.372136
3 9.548322
Name: y, dtype: float64
4 3.798372
Name: y, dtype: float64
5 6.701758
Name: y, dtype: float64
df.groupby(["group_1", "group_2"]).agg(lambda x,y: print(x), y=1)
0 7.130014
Name: x, dtype: float64
1 8.12832
Name: x, dtype: float64
2 -7.127394
3 -8.320946
Name: x, dtype: float64
4 -4.383301
Name: x, dtype: float64
5 2.152988
Name: x, dtype: float64
0 3.789269
Name: y, dtype: float64
1 0.040574
Name: y, dtype: float64
2 2.372136
3 9.548322
Name: y, dtype: float64
4 3.798372
Name: y, dtype: float64
5 6.701758
Name: y, dtype: float64
Issue Description
The type of the intermediate objects passed to func in df.groupby(by=<ITERABLE>).agg(func=lambda x, y: print(x), y=1) is inconsistent: when ITERABLE has length one they are DataFrames, but when it has more than one element they are Series.
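The inconsistency can also be checked without reading printed output, by recording the type of every object that agg hands to the function. The sketch below is not part of the original report: record_types and probe are hypothetical helper names, and the random columns are replaced with fixed values so the frame is deterministic. The result for the single-key case with kwargs may depend on the installed pandas version; on 1.4.2, per the output above, it is DataFrame.

```python
import pandas as pd

# Same layout as the frame in the report, with fixed values instead of random ones.
df = pd.DataFrame(
    {
        "group_1": [1, 1, 2, 2, 3, 4],
        "group_2": ["A", "B", "C", "C", "D", "D"],
        "x": [0.5, -1.5, 2.0, 3.0, -4.0, 5.0],
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)


def record_types(by, **kwargs):
    """Return the set of type names that agg passes to the user function."""
    seen = set()

    def probe(obj, **_ignored):
        # Record what we were given and return a scalar so agg can finish.
        seen.add(type(obj).__name__)
        return 0

    df.groupby(by).agg(probe, **kwargs)
    return seen


# Compare one key vs. two keys, with and without an extra keyword argument.
for by in (["group_1"], ["group_1", "group_2"]):
    for kwargs in ({}, {"y": 1}):
        print(by, kwargs, record_types(by, **kwargs))
```

If the behaviour described in this report is present, only the combination of a single key and a keyword argument reports DataFrame.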
Expected Behavior
When ITERABLE has length one and *args or **kwargs are passed, the intermediate objects should also be instances of Series.
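Until the behaviour is made consistent, one possible workaround (an assumption on my part, not something proposed in this report) is to bind the extra argument inside the callable so that agg itself receives no *args or **kwargs. In the outputs above, calls without extra arguments always pass Series. The function scaled_max and its parameter y are hypothetical and only illustrate an aggregation that needs an extra argument.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group_1": [1, 1, 2, 2, 3, 4],
        "x": [0.5, -1.5, 2.0, 3.0, -4.0, 5.0],
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

seen = set()  # type names of the objects handed to scaled_max


def scaled_max(col, y):
    """Hypothetical aggregation that needs an extra argument y."""
    seen.add(type(col).__name__)
    return col.max() * y


# The extra argument is bound in the lambda, so agg is called with no kwargs.
result = df.groupby(["group_1"])[["x", "y"]].agg(lambda col: scaled_max(col, y=2))
print(seen)
print(result)
```

The trade-off is that the argument is fixed at the call site rather than forwarded by agg, which is usually acceptable for a single aggregation.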
Installed Versions
commit : 4bfe3d0
python : 3.10.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.14.10-300.fc35.x86_64
Version : #1 SMP Thu Oct 7 20:48:44 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.4.2
numpy : 1.22.4
pytz : 2022.1
dateutil : 2.8.2
pip : 22.1.1
setuptools : 57.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None