Closed
Labels
Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Groupby, Needs Tests (unit test(s) needed to prevent regressions), good first issue
Description
`groupby` with `rank` and non-unique groupers that include NaN raises an odd error (the second one below). In reality, the operation cannot reindex properly, and that is the actual error (the first one).
```
In [103]: df = DataFrame({'A': [1., 2., 3., np.nan], 'value': 1.}, index=[pd.Timestamp('20170101', tz='US/Eastern')] * 4)
In [104]: df.groupby([df.index, 'A']).value.rank(ascending=True, pct=True)
ValueError: cannot reindex from a duplicate axis
AttributeError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'
```
But it works when the timestamps are a column (and not the index):
```
In [105]: df.reset_index().groupby([df.index, 'A']).value.rank(ascending=True, pct=True)
Out[105]:
0 1.0
1 1.0
2 1.0
3 NaN
Name: value, dtype: float64
```
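As a hedged illustration of why the working call returns `1.0, 1.0, 1.0, NaN`, here is a plain-Python sketch (no pandas) of an average-method percentage rank computed within groups. `group_pct_rank` is a hypothetical helper, not a pandas API. It assumes that rows whose grouping key contains NaN belong to no group and come back as NaN, which matches Out[105]. Because every non-NaN key forms a one-row group, each of those ranks is 1/1 = 1.0.

```python
import math
from collections import defaultdict


def group_pct_rank(keys, values):
    """Hypothetical helper: average-method pct rank of `values` within `keys` groups.

    Rows whose key tuple contains NaN are excluded from grouping and get NaN.
    NaN entries in `values` also stay NaN (similar to na_option='keep').
    """
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        if any(isinstance(k, float) and math.isnan(k) for k in key):
            continue  # NaN key: the row is not part of any group
        groups[key].append(pos)

    out = [math.nan] * len(values)
    for positions in groups.values():
        valid = [p for p in positions if not math.isnan(values[p])]
        n = len(valid)
        ordered = sorted(valid, key=lambda p: values[p])
        i = 0
        while i < n:
            j = i
            # Extend j over a run of tied values so they share one average rank.
            while j + 1 < n and values[ordered[j + 1]] == values[ordered[i]]:
                j += 1
            avg_rank = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
            for k in range(i, j + 1):
                out[ordered[k]] = avg_rank / n  # pct: divide by group size
            i = j + 1
    return out


# Keys mimic (index timestamp, A) from the report; the last key holds NaN.
ts = "2017-01-01 00:00:00-05:00"
keys = [(ts, 1.0), (ts, 2.0), (ts, 3.0), (ts, math.nan)]
ranks = group_pct_rank(keys, [1.0, 1.0, 1.0, 1.0])
print(ranks)  # [1.0, 1.0, 1.0, nan]
```

The duplicate timestamp never matters here because grouping is by position. Only the keys decide group membership, which is why a unique-index result can be built without any reindexing.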
So there are two interrelated bugs here.
xref to #11759
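Since the issue is labelled "Needs Tests", a regression test might look roughly like the sketch below. The test name is hypothetical. The expected values assume the fixed behaviour on the duplicate-index frame matches the `reset_index()` output above: 1.0 for each one-row group and NaN for the row whose key is NaN. The comparison uses `pd.testing.assert_series_equal`.

```python
import numpy as np
import pandas as pd


def test_groupby_rank_duplicate_index_with_nan_key():
    # Frame from the report: a tz-aware index where all four rows share one
    # timestamp, and a grouping column that contains a single NaN.
    df = pd.DataFrame(
        {"A": [1.0, 2.0, 3.0, np.nan], "value": 1.0},
        index=[pd.Timestamp("20170101", tz="US/Eastern")] * 4,
    )

    result = df.groupby([df.index, "A"])["value"].rank(ascending=True, pct=True)

    # Assumed expectation: one-row groups rank 1.0; the NaN-key row is NaN,
    # and the original (duplicate) index is preserved.
    expected = pd.Series([1.0, 1.0, 1.0, np.nan], index=df.index, name="value")
    pd.testing.assert_series_equal(result, expected)
    return result


result = test_groupby_rank_duplicate_index_with_nan_key()
```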