TST: Add test for col names during groupby().agg() #43244
Column names should consistently be retained when using df.groupby().agg()
        ["id1", "id2"]
    )

    df_sum_idx = df.sum().index.names
Thanks for the tests. Could you please check the whole DataFrame? Additionally, it would be good if you could parametrize here.
@phofl sorry for the delay.
After some experimenting, here is what I came up with to parametrize the test. I'm unsure how to parametrize the sum function.
Also, I am sorry, but could you elaborate on what you meant by the "whole DataFrame"? I am not sure I understood.
Thank you.
import pandas as pd
import pytest
from pandas import DataFrame


@pytest.mark.parametrize(
    "agg_params",
    [
        {"start": pd.NamedAgg(column="time", aggfunc="min")},
        {
            "start": pd.NamedAgg(column="time", aggfunc="min"),
            "peak_time": pd.NamedAgg(column="values", aggfunc="idxmax"),
        },
        {"peak_time": pd.NamedAgg(column="values", aggfunc="idxmax")},
    ],
)
def test_groupby_agg_column_names(agg_params):
    # GH42332
    # Group an empty frame by the two id columns.
    grouped = (
        DataFrame(columns=["id1", "id2", "time", "values"], dtype="int")
        .groupby(["id1", "id2"])
    )
    aggregated = grouped.agg(**agg_params)
    # Both sum() and agg() should keep the grouping keys as index names.
    assert (
        grouped.sum().index.names == aggregated.index.names == ["id1", "id2"]
    )
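On the open question of how to parametrize the sum function: one possible approach, sketched here and not something the reviewers proposed in this thread, is to parametrize over the reduction's method name and look it up on the groupby object with `getattr`. The test name, the sample data, and the choice of reductions below are all assumptions made for illustration.

```python
import pandas as pd
import pytest


# Hypothetical test: the name, data, and list of reductions are illustrative.
@pytest.mark.parametrize("reduction", ["sum", "min", "max"])
def test_groupby_reduction_keeps_index_names(reduction):
    df = pd.DataFrame(
        {
            "id1": [1, 1, 2],
            "id2": [10, 10, 20],
            "time": [5, 3, 7],
            "values": [1, 9, 4],
        }
    )
    grouped = df.groupby(["id1", "id2"])
    # Look up the reduction by name, e.g. grouped.sum(), then call it.
    result = getattr(grouped, reduction)()
    # The grouping keys should become the names of the resulting index.
    assert list(result.index.names) == ["id1", "id2"]
```

This sketch uses a small non-empty frame for simplicity; the empty-frame case from the comment above would need the same check applied to `DataFrame(columns=[...])`.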

    expected = ["id1", "id2"]

    assert df_sum_idx == expected
Use tm.assert_frame_equal and construct the actual expected value.
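A minimal sketch of what this review comment might look like in practice: build the full expected DataFrame, including its named MultiIndex, and compare it to the aggregation result in a single call. It uses the public `pandas.testing.assert_frame_equal`; inside the pandas test suite the same check is normally reached through the `tm` alias. The sample data and the `start` column are assumptions chosen for illustration, not the values from the PR.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative data (not from the PR): two groups keyed by (id1, id2).
df = pd.DataFrame(
    {
        "id1": [1, 1, 2],
        "id2": [10, 10, 20],
        "time": [5, 3, 7],
        "values": [1, 9, 4],
    }
)

result = df.groupby(["id1", "id2"]).agg(
    start=pd.NamedAgg(column="time", aggfunc="min")
)

# Construct the complete expected frame, index names included.
expected = pd.DataFrame(
    {"start": [3, 7]},
    index=pd.MultiIndex.from_tuples(
        [(1, 10), (2, 20)], names=["id1", "id2"]
    ),
)

# Compares values, dtypes, column labels, and the index in one assertion.
assert_frame_equal(result, expected)
```

Comparing whole frames this way covers the index names, the column names, and the aggregated values together, which is broader than asserting on `index.names` alone.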
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
Sorry, yes, I am still interested in working on this.
It appears this PR has been dormant for a while and still needs updates, so closing. If interested in continuing, please merge master, address related comments, and we can reopen.
Column names should consistently be retained when using df.groupby().agg()
What's new:
Add a test to pandas/tests/groupby/aggregate/test_aggregate.py
The issue appears to be fixed now:
Tested on 1.4.0.dev0+508.g5fe02971f8
Output of pd.show_versions():
INSTALLED VERSIONS
commit : 5fe0297
python : 3.9.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-31-generic
Version : #33-Ubuntu SMP Wed Aug 11 13:19:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8
pandas : 1.4.0.dev0+508.g5fe02971f8
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 20.3.4
setuptools : 57.4.0
Cython : 0.29.24
pytest : 6.2.4
hypothesis : 6.14.9
sphinx : 4.1.2
blosc : 1.10.4
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.26.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.05.0
fastparquet : 0.7.1
gcsfs : 2021.05.0
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : 2021.05.0
scipy : 1.7.1
sqlalchemy : 1.4.23
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.54.0