-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: DataFrame.groupby with axis=1 is deprecated.
<-- please do not deprecate it
#56226
Comments
this looks fine to be honest - transposing is expensive and shouldn't be done under-the-hood in my opinion |
Then the best solution for this issue is to implement a performant |
Thanks for the feedback here! Though we may disagree, it's important to hear from users on issues such as this.
I don't see how that's possible short of adding a whole new implementation for groupby just for axis=1. The current implementation is already a large maintenance burden, and this would exacerbate the issue.
You should be able to use
I don't think this helps the conversation. My view is that it sees much less usage compared to |
Thank you @rhshadrach! I'll go with the two-fold transposing solution for now, even though it cramps my Zen. Pandas is at the core of a data analysis library I'm developing and it uses MultiIndex for every index and for columns a fair bit, too. In terms of usability it makes a significant difference to be able to group by column levels, so I'm glad as long as y'all be putting up with the maintenance burden of it. 🙏 |
I will just add two cents that Is there a recommended compact equivalent to |
Why can't you just transpose your DataFrame? |
We talked a little about this at the dev meeting today. The consensus here was that the added maintenance burden isn't worth it, just to save doing some transpose calls. |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Related to #51203, #51778 where people fail to see the use of
.groupby(axis=1)
. These issues fail to name good reasons for deprecating this that could counter this: It should be retained for API consistency and for backward compatibility. It does not matter if transforms are happening under the hood, as a user I don't care. I care about consistency and being able to use my code in the future.So here comes the problem: Here's a dataframe with two-by-two multicolumns, four columns in total. The aim is to eliminate the second level and to end up with only two columns in total (according to the first level) that should contain tuples being pairs which correspond to a combination of the two squashed columns from the second level (
desired_result
). The problem is, when, instead of grouping columns and applyingtuple
to rows, I transform, group, and applytuple
to all colums, the output oftuple
, by default, will be interpreted as a transformation of the column (corresponding toresult_type="expand"
behaviour) and not as a "squashed" tuple that would get rid of the second, now-index level (corresponding toresult_type="reduce"
behaviour). That is, I still end up with four columns. Even worse, the fact thatgroup_keys=True
by default results in an additional column level duplicating the original first one.As shown above, the fix is to include
.apply(tuple, result_type="reduce")
. In this case, no need to includegroup_keys=False
to avoid the duplicated column level.So my plea, again: Please do not deprecate
.groupby(axis=1)
. Rather, seek to make the behaviours consistent.Installed Versions
INSTALLED VERSIONS
commit : 2a953cf
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.62-1-MANJARO
Version : #1 SMP PREEMPT_DYNAMIC Thu Nov 9 03:01:44 UTC 2023
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.3
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 69.0.2
pip : 23.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.23
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
...lib/python3.10/site-packages/_distutils_hack/init.py:33: UserWarning:
Setuptools is replacing distutils.
The text was updated successfully, but these errors were encountered: