
BUG: Incorrect index shape when using a user-defined function for aggregating a grouped series with object-typed index. #40835


Merged: 28 commits, merged on Apr 15, 2021.

Commits (the diff below shows changes from 6 of them):
5ded786 (DriesSchaumont, Apr 2, 2021): Add failing test.
30cd16e (DriesSchaumont, Apr 7, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
e97b413 (DriesSchaumont, Apr 8, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
c5e1cf7 (DriesSchaumont, Apr 8, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
9ccb324 (DriesSchaumont, Apr 8, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
04d61b5 (DriesSchaumont, Apr 8, 2021): Dont use fastpath for series whith object index.
d4d9fb5 (DriesSchaumont, Apr 9, 2021): Add comment.
307bbe5 (DriesSchaumont, Apr 9, 2021): Add whatsnew
ace5f81 (DriesSchaumont, Apr 9, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
bb7f09e (DriesSchaumont, Apr 9, 2021): Move test.
4f31837 (DriesSchaumont, Apr 9, 2021): Check test expected result.
db9b29b (DriesSchaumont, Apr 9, 2021): Move whatsnew
fa4291a (DriesSchaumont, Apr 9, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
b0dfbcd (DriesSchaumont, Apr 9, 2021): Adjustment for review
4ddcc12 (DriesSchaumont, Apr 9, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
a21be6f (DriesSchaumont, Apr 12, 2021): New solution.
7a4a793 (DriesSchaumont, Apr 12, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
4bf3f20 (DriesSchaumont, Apr 12, 2021): Fix len error.
0c16e6c (DriesSchaumont, Apr 12, 2021): Fix index caching.
f457dff (DriesSchaumont, Apr 12, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
3eb7b79 (DriesSchaumont, Apr 13, 2021): Fix index caching (2)
d608b5b (DriesSchaumont, Apr 13, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
86db870 (DriesSchaumont, Apr 13, 2021): Styling and remove stray print statement
ded8433 (DriesSchaumont, Apr 13, 2021): Adjustments for review
24a1344 (DriesSchaumont, Apr 13, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
5cca8d2 (DriesSchaumont, Apr 13, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
337edbd (DriesSchaumont, Apr 14, 2021): Merge remote-tracking branch 'upstream/master' into fix-40014
b89eee0 (DriesSchaumont, Apr 14, 2021): Add issue number, change almost_equal to assert_equal
pandas/core/groupby/ops.py (2 changes: 1 addition, 1 deletion)
@@ -738,7 +738,7 @@ def agg_series(self, obj: Series, func: F):
             # TODO: can we get a performant workaround for EAs backed by ndarray?
             return self._aggregate_series_pure_python(obj, func)
 
-        elif obj.index._has_complex_internals:
+        elif obj.index._has_complex_internals or obj.index.dtype == "object":
             # Preempt TypeError in _aggregate_series_fast
             return self._aggregate_series_pure_python(obj, func)
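The patched condition keys on the index dtype: `df.index.astype("O")` turns the default integer index into one with dtype `object`, which is exactly what the added `obj.index.dtype == "object"` check matches. A minimal sketch using only public pandas API; `takes_pure_python_path` is a hypothetical helper that mirrors the patched `elif`, with the private `_has_complex_internals` flag replaced by an explicit argument.

```python
import pandas as pd


def takes_pure_python_path(index: pd.Index, has_complex_internals: bool) -> bool:
    # Hypothetical helper mirroring the patched elif in agg_series.
    # pandas reads the private obj.index._has_complex_internals here;
    # this sketch takes that flag as an argument instead.
    return has_complex_internals or index.dtype == "object"


df = pd.DataFrame({"c0": ["x", "x", "x"], "c1": ["x", "x", "y"], "p": [0, 1, 2]})

# The default RangeIndex has an int64 dtype, so the new check does not fire.
default_is_object = df.index.dtype == "object"

# Casting to "O" gives an object-dtype index holding the same integers,
# which the patched condition now routes to the pure-Python path.
df.index = df.index.astype("O")
cast_is_object = df.index.dtype == "object"

print(default_is_object, cast_is_object)  # False True
print(takes_pure_python_path(df.index, has_complex_internals=False))  # True
```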

pandas/tests/groupby/test_function.py (8 changes: 8 additions, 0 deletions)
@@ -1087,3 +1087,11 @@ def test_groupby_sum_below_mincount_nullable_integer():
     result = grouped.sum(min_count=2)
     expected = DataFrame({"b": [pd.NA] * 3, "c": [pd.NA] * 3}, dtype="Int64", index=idx)
     tm.assert_frame_equal(result, expected)
+
+
+def test_groupby_index_object_dtype():
+    # GH 40014
+    df = DataFrame({"c0": ["x", "x", "x"], "c1": ["x", "x", "y"], "p": [0, 1, 2]})
+    df.index = df.index.astype("O")
+    grouped = df.groupby(["c0", "c1"])
+    grouped.p.agg(lambda x: all(x > 0))
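The test in this diff only checks that the aggregation runs without raising. Working through the groups gives the result one would expect once the fix is in: group ("x", "x") has p = [0, 1], so `all(x > 0)` is False; group ("x", "y") has p = [2], so it is True. That gives one row per group under a two-level index. A sketch of that check, assuming a pandas version that includes this fix and using the public `pandas.testing` module rather than the test suite's `tm` alias:

```python
import pandas as pd

df = pd.DataFrame({"c0": ["x", "x", "x"], "c1": ["x", "x", "y"], "p": [0, 1, 2]})
df.index = df.index.astype("O")  # object-dtype index, as in the test
res = df.groupby(["c0", "c1"]).p.agg(lambda x: all(x > 0))

# One row per (c0, c1) group: ("x", "x") -> p = [0, 1] -> False,
# ("x", "y") -> p = [2] -> True.
expected = pd.Series(
    [False, True],
    index=pd.MultiIndex.from_tuples([("x", "x"), ("x", "y")], names=["c0", "c1"]),
    name="p",
)
pd.testing.assert_series_equal(res, expected)
```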