TST: Add test for col names during groupby().agg() #43244

Closed
wants to merge 1 commit into from

Conversation

calvh
Contributor

@calvh calvh commented Aug 27, 2021

Column names should consistently be retained when using df.groupby().agg()

What's new:
Add a test to pandas/tests/groupby/aggregate/test_aggregate.py

The issue appears to be fixed now:

Tested on 1.4.0.dev0+508.g5fe02971f8

Screenshot from 2021-08-27 00-06-24

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 5fe0297
python : 3.9.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-31-generic
Version : #33-Ubuntu SMP Wed Aug 11 13:19:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8

pandas : 1.4.0.dev0+508.g5fe02971f8
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 20.3.4
setuptools : 57.4.0
Cython : 0.29.24
pytest : 6.2.4
hypothesis : 6.14.9
sphinx : 4.1.2
blosc : 1.10.4
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.26.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.05.0
fastparquet : 0.7.1
gcsfs : 2021.05.0
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : 2021.05.0
scipy : 1.7.1
sqlalchemy : 1.4.23
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.54.0

Column names should consistently be retained when using df.groupby().agg()
["id1", "id2"]
)

df_sum_idx = df.sum().index.names
Member

Thanks for the tests. Could you please check the whole DataFrame? Additionally, it would be good if you could parametrize here.

Contributor Author

@phofl sorry for the delay

After some experimenting, here is what I came up with to parametrize the test. I'm unsure of how to parametrize the sum function.

Also, I'm sorry, but could you elaborate on what you meant by the "whole DataFrame"? I am not sure I understood.

Thank you.

import pandas as pd
import pytest
from pandas import DataFrame


@pytest.mark.parametrize(
    "agg_params",
    [
        {"start": pd.NamedAgg(column="time", aggfunc="min")},
        {
            "start": pd.NamedAgg(column="time", aggfunc="min"),
            "peak_time": pd.NamedAgg(column="values", aggfunc="idxmax"),
        },
        {"peak_time": pd.NamedAgg(column="values", aggfunc="idxmax")},
    ],
)
def test_groupby_agg_column_names(agg_params):
    # GH42332
    # Group an empty frame by two keys; both key names should survive aggregation.
    grouped = (
        DataFrame(columns=["id1", "id2", "time", "values"], dtype="int")
        .groupby(["id1", "id2"])
    )

    aggregated = grouped.agg(**agg_params)

    # The index names after agg() should match those after sum().
    assert (
        grouped.sum().index.names == aggregated.index.names == ["id1", "id2"]
    )
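
On the open question of parametrizing the sum function: one possible approach is to pass the reduction's name as a string and look the method up on the groupby object with `getattr`. The sketch below is only an illustration; the test name and the list of reductions are assumptions and not part of this PR.

```python
import pandas as pd
import pytest


# Hypothetical test: the name and the chosen reductions are illustrative only.
@pytest.mark.parametrize("reduction", ["sum", "min", "max"])
def test_groupby_reduction_keeps_index_names(reduction):
    grouped = pd.DataFrame(
        columns=["id1", "id2", "time", "values"], dtype="int"
    ).groupby(["id1", "id2"])

    # Look up the reduction by name, e.g. grouped.sum(), and call it.
    result = getattr(grouped, reduction)()

    assert list(result.index.names) == ["id1", "id2"]
```

The same parameter could be combined with `agg_params` by stacking a second `@pytest.mark.parametrize` decorator, which would run every combination of the two.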

@jreback jreback added this to the 1.4 milestone Aug 31, 2021
@jreback jreback added the Groupby and Testing labels Aug 31, 2021

expected = ["id1", "id2"]

assert df_sum_idx == expected
Contributor

Use tm.assert_frame_equal and construct the actual expected value.
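
A minimal sketch of what this suggestion might look like, assuming `tm` refers to `pandas._testing` as in the pandas test suite. It uses a small non-empty frame with made-up values so that the expected result is unambiguous; the empty-frame case from the linked issue would need its own expected frame, which is not attempted here.

```python
import pandas as pd
import pandas._testing as tm

# Illustrative data (an assumption, not taken from the PR).
df = pd.DataFrame(
    {
        "id1": [1, 1, 2],
        "id2": [1, 1, 2],
        "time": [5, 3, 7],
        "values": [10, 20, 30],
    }
)

result = df.groupby(["id1", "id2"]).agg(
    start=pd.NamedAgg(column="time", aggfunc="min")
)

# Build the complete expected frame, including the named MultiIndex,
# so the comparison covers values, dtypes, and index names together.
expected = pd.DataFrame(
    {"start": [3, 7]},
    index=pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=["id1", "id2"]),
)

tm.assert_frame_equal(result, expected)
```

Comparing whole frames this way would catch a lost index name as well as any unexpected change in columns or dtypes, which a check on `index.names` alone would miss.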

@github-actions
Contributor

github-actions bot commented Oct 1, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 1, 2021
@calvh
Contributor Author

calvh commented Oct 1, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

Sorry, yes, I am still interested in working on this.

@mroeschke
Copy link
Member

Appears this PR has been dormant for a while and still needs updates so closing. If interested in continuing, please merge master, address related comments and we can reopen.

@mroeschke mroeschke closed this Oct 31, 2021
Labels
Groupby, Stale, Testing
Development

Successfully merging this pull request may close these issues.

BUG: groupby().agg() loses column names for an empty dataframe with 'idxmax' as an aggregation function
4 participants