Closed
Labels: Algos, Bug, Groupby, Missing-data, Regression
Description
Closely related to #48476
As far as I can tell this only occurs when the input dtype to groupby is object.
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, pd.NA, None], 'b': [1, 2, 3]})
gb = df.groupby('a', sort=True, dropna=False)
print(gb.sum())
#      b
# a
# NaN  6
but with sort=False:
df = pd.DataFrame({'a': [np.nan, pd.NA, None], 'b': [1, 2, 3]})
gb = df.groupby('a', sort=False, dropna=False)
print(gb.sum())
#       b
# a
# NaN   1
# <NA>  2
# None  3
I think we should prefer the sort=True behavior as that is the default value for now, but prefer sort=False in the long run.
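In the meantime, one possible workaround is to collapse every null-like value in the key column to a single sentinel before grouping, so that both sort settings see only one kind of missing value. This is a minimal sketch under that assumption, not something proposed in the issue itself; the names `normalized`, `by_sorted`, and `by_unsorted` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Object-dtype key column holding three different null-like values.
df = pd.DataFrame({
    'a': pd.Series([np.nan, pd.NA, None], dtype=object),
    'b': [1, 2, 3],
})

# Replace every missing key with the same sentinel (np.nan).
# Series.where keeps values where the condition is True and
# substitutes `other` everywhere else.
normalized = df.assign(a=df['a'].where(df['a'].notna(), np.nan))

# With a single kind of null in the key, both settings should
# produce one null group.
by_sorted = normalized.groupby('a', sort=True, dropna=False).sum()
by_unsorted = normalized.groupby('a', sort=False, dropna=False).sum()

print(by_sorted)
print(by_unsorted)
```

This only sidesteps the inconsistency on the caller's side; it does not change how groupby itself treats mixed nulls.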