BUG: group by with categorical columns causes an exception #45128
Comments
Categoricals are not memory efficient compared to int or object: they have to store categories, codes, etc. The operation you are performing really blows up the result, so this is not that surprising, I think. But investigations are welcome.
I believe this is just the confusing @ItayGabbay you need to use |
@Liam3851 thanks for the suggestion.
Ok yes I think you've hit a bug in pandas/pandas/core/groupby/generic.py Lines 1517 to 1520 in 9512393
but it's not passing through the pandas/pandas/core/groupby/generic.py Lines 1446 to 1452 in 9512393
And so it appears it's just getting the same result as if it were unspecified. Edit: I've confirmed that a one-line fix passing |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
When trying to group a DataFrame by categorical columns with many categories, the following exception is raised:
However, when I cast the category columns to object dtype everything works smoothly.
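For readers trying to see why the dtype might matter, here is a minimal sketch. It is not the reproducer from this report (which is missing above), and it is only an assumption that this mechanism is what triggers the exception here. It uses hypothetical toy data to show that grouping by several categorical columns with `observed=False` yields one row per combination of categories, while `observed=True` or object-dtype keys yield only the combinations present in the data.

```python
import pandas as pd

# Hypothetical toy data (not from the report): two categorical key columns
# with 3 categories each, where only 3 of the 9 possible key combinations occur.
df = pd.DataFrame(
    {
        "a": pd.Categorical(["x", "y", "z"]),
        "b": pd.Categorical(["p", "q", "r"]),
        "value": [1, 2, 3],
    }
)

# observed=False: one output row per combination of categories (3 * 3 = 9).
full = df.groupby(["a", "b"], observed=False)["value"].sum()

# observed=True: only the combinations that actually appear in the data.
seen = df.groupby(["a", "b"], observed=True)["value"].sum()

# Object-dtype keys carry no category information, so only seen combinations appear.
as_object = (
    df.astype({"a": object, "b": object}).groupby(["a", "b"])["value"].sum()
)

print(len(full), len(seen), len(as_object))
```

With many categories per column, the product of category counts grows quickly, which would be consistent with the categorical version failing while the object version succeeds.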
Expected Behavior
I would expect to get the output of the groupby operation.
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed
python : 3.9.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.1.0
Version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:23 PDT 2021; root:xnu-8019.41.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.3.4
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.3.0
Cython : None
pytest : 6.2.5
hypothesis : 6.31.6
sphinx : 4.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.5.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None