TST: Add test for col names during groupby().agg() #43244

Closed
wants to merge 1 commit into from
32 changes: 32 additions & 0 deletions pandas/tests/groupby/aggregate/test_aggregate.py
@@ -1274,3 +1274,35 @@ def func(ser):

    expected = DataFrame([[1.0]], index=[1])
    tm.assert_frame_equal(res, expected)


def test_groupby_agg_column_names():
    # GH42332

    df = DataFrame(columns=["id1", "id2", "time", "values"], dtype="int").groupby(
        ["id1", "id2"]
    )

    df_sum_idx = df.sum().index.names
Member

Thanks for the tests. Could you please check the whole DataFrame? Additionally, it would be good if you could parametrize here.

Contributor Author

@phofl sorry for the delay

After some experimenting, here is what I came up with to parametrize the test. I'm unsure how to parametrize the `sum` call.

Also, could you elaborate on what you meant by checking the "whole DataFrame"? I'm not sure I understood.

Thank you.

@pytest.mark.parametrize(
    "agg_params",
    [
        {"start": pd.NamedAgg(column="time", aggfunc="min")},
        {
            "start": pd.NamedAgg(column="time", aggfunc="min"),
            "peak_time": pd.NamedAgg(column="values", aggfunc="idxmax"),
        },
        {"peak_time": pd.NamedAgg(column="values", aggfunc="idxmax")},
    ],
)
def test_groupby_agg_column_names(agg_params):
    # GH42332
    grouped = (
        DataFrame(columns=["id1", "id2", "time", "values"], dtype="int")
        .groupby(["id1", "id2"])
    )

    aggregated = grouped.agg(**agg_params)

    assert (
        grouped.sum().index.names == aggregated.index.names == ["id1", "id2"]
    )
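
One possible way to bring `sum()` into the same parametrization is to pass a callable that receives the GroupBy object, so that each case decides how to aggregate. This is only a sketch: the `AGG_CASES` list, the test name, and the small non-empty frame are illustrative assumptions, not part of this PR. Non-empty data is used so the example does not depend on how a given pandas version handles empty input.

```python
import pandas as pd
import pytest

# Hypothetical list of cases: each callable takes a GroupBy object and
# returns the aggregated DataFrame.
AGG_CASES = [
    lambda gb: gb.sum(),
    lambda gb: gb.agg(start=pd.NamedAgg(column="time", aggfunc="min")),
    lambda gb: gb.agg(
        start=pd.NamedAgg(column="time", aggfunc="min"),
        peak_time=pd.NamedAgg(column="values", aggfunc="idxmax"),
    ),
    lambda gb: gb.agg(peak_time=pd.NamedAgg(column="values", aggfunc="idxmax")),
]


@pytest.mark.parametrize(
    "aggregate", AGG_CASES, ids=["sum", "min", "min_and_idxmax", "idxmax"]
)
def test_group_keys_become_index_names(aggregate):
    # Illustrative non-empty data (an assumption, not the frame from the PR).
    df = pd.DataFrame(
        {
            "id1": [1, 1, 2],
            "id2": [10, 10, 20],
            "time": [5, 3, 7],
            "values": [1, 4, 2],
        }
    )
    result = aggregate(df.groupby(["id1", "id2"]))

    # The grouping keys should name the levels of the result index.
    assert list(result.index.names) == ["id1", "id2"]
```

With this shape, `sum()` is just one more entry in the list, and the `ids` argument keeps the pytest output readable.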


    df_agg1_idx = df.agg(
        **{"start": pd.NamedAgg(column="time", aggfunc="min")}
    ).index.names

    df_agg2_idx = df.agg(
        **{
            "start": pd.NamedAgg(column="time", aggfunc="min"),
            "peak_time": pd.NamedAgg(column="values", aggfunc="idxmax"),
        }
    ).index.names

    df_agg3_idx = df.agg(
        **{"peak_time": pd.NamedAgg(column="values", aggfunc="idxmax")}
    ).index.names

    expected = ["id1", "id2"]

    assert df_sum_idx == expected
Contributor

Use `tm.assert_frame_equal` and construct the actual expected value.
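
A minimal sketch of what comparing the whole result could look like, under stated assumptions: the input data and expected values below are invented for illustration (the PR itself uses an empty frame), and the public `pandas.testing.assert_frame_equal` stands in for the `tm.assert_frame_equal` helper used inside the pandas test suite.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative non-empty input (an assumption, not the frame from the PR).
df = pd.DataFrame(
    {
        "id1": [1, 1, 2],
        "id2": [10, 10, 20],
        "time": [5, 3, 7],
        "values": [1, 4, 2],
    }
)

result = df.groupby(["id1", "id2"]).agg(
    start=pd.NamedAgg(column="time", aggfunc="min"),
    peak_time=pd.NamedAgg(column="values", aggfunc="idxmax"),
)

# Build the full expected frame, including the named MultiIndex, so that
# values, column names, and index names are all checked in one comparison.
expected = pd.DataFrame(
    {"start": [3, 7], "peak_time": [1, 2]},
    index=pd.MultiIndex.from_tuples([(1, 10), (2, 20)], names=["id1", "id2"]),
)

assert_frame_equal(result, expected)
```

Comparing complete frames catches mismatches in values and column labels that a check on `index.names` alone would miss.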

    assert df_agg1_idx == expected
    assert df_agg2_idx == expected
    assert df_agg3_idx == expected