ENH: groupby.apply for Categorical groupers should preserve categories (like .agg) #10138

Closed
jreback opened this issue May 14, 2015 · 1 comment · Fixed by #10142

@jreback
Contributor

jreback commented May 14, 2015

From Stack Overflow:

```python
import numpy as np
import pandas as pd

# 'missing' has an unobserved category 'b'; 'dense' uses all its categories
missing = pd.Categorical(list('aaa'), categories=['a', 'b'])
dense = pd.Categorical(list('abc'))
values = np.arange(len(dense))
df = pd.DataFrame({'missing': missing, 'dense': dense, 'values': values})

grouped = df.groupby(['missing', 'dense'])

# these reindex the output to include the missing categories
grouped.mean()
grouped.agg(np.mean)

# this does NOT reindex the output for the missing categories
grouped.apply(lambda chunk: np.mean(chunk))
```

So `_wrap_applied_output` needs a call to `_reindex_output` as a post-processing step.
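A minimal sketch of what "reindexing the output for the missing categories" means, written against the public API of a recent pandas rather than the 2015 internals. `reindex_to_all_categories` is a hypothetical helper for illustration, not pandas's internal `_reindex_output`, and it assumes every grouping key is a Categorical column. The `observed=True` keyword (added to `groupby` in 0.23, long after this issue) is used only to reproduce the "not reindexed" shape described above.

```python
import numpy as np
import pandas as pd

# Same toy data as in the report above.
missing = pd.Categorical(list('aaa'), categories=['a', 'b'])
dense = pd.Categorical(list('abc'))
values = np.arange(len(dense))
df = pd.DataFrame({'missing': missing, 'dense': dense, 'values': values})


def reindex_to_all_categories(result, frame, keys):
    """Hypothetical helper (not a pandas API): reindex a groupby result so
    that every combination of the grouping Categoricals' categories is
    present, with NaN for combinations that were never observed."""
    levels = [frame[key].cat.categories for key in keys]
    full_index = pd.MultiIndex.from_product(levels, names=keys)
    return result.reindex(full_index)


keys = ['missing', 'dense']
# observed=True keeps only the key combinations present in the data,
# i.e. the un-reindexed shape the issue describes for apply.
applied = df.groupby(keys, observed=True)['values'].apply(lambda s: s.mean())
full = reindex_to_all_categories(applied, df, keys)
print(full)
```

Here `applied` has one row per observed combination, while `full` has one row per element of the Cartesian product of categories (2 × 3), with NaN wherever `missing == 'b'`.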

jreback added this to the Next Major Release milestone May 14, 2015
@mortada
Contributor

mortada commented May 15, 2015

Seems straightforward; I'll submit a PR for this.

jreback modified the milestones: 0.17.0, Next Major Release May 16, 2015
jreback modified the milestones: 0.16.2, 0.17.0 Jun 2, 2015
cgevans added a commit to cgevans/pandas that referenced this issue Jun 5, 2015
* https://github.com/pydata/pandas: (26 commits)
  disable some deps on 3.2 build
  Fix meantim typo
  DOC: use current ipython in doc build
  PERF: write basic datetimes faster pandas-dev#10271
  TST: fix for bottleneck >= 1.0 nansum behavior, xref pandas-dev#9422
  add numba example to enhancingperf.rst
  BUG: SparseSeries constructor ignores input data name
  BUG: Raise TypeError only if key DataFrame is not empty pandas-dev#10126
  ENH: groupby.apply for Categorical should preserve categories (closes pandas-dev#10138)
  DOC: add in whatsnew/0.17.0.txt
  DOC: move whatsnew from 0.17.0 -> 0.16.2
  BUG:  Holiday(..) with both offset and observance raises NotImplementedError pandas-dev#10217
  BUG: Index.union cannot handle array-likes
  BUG: SparseSeries.abs() resets name
  BUG: Series arithmetic methods incorrectly hold name
  ENH: Don't infer WOM-5MON if we don't support it (pandas-dev#9425)
  BUG: Series.align resets name when fill_value is specified
  BUG: GroupBy.get_group raises ValueError when group key contains NaT
  Close mysql connection in TestXMySQL to prevent tests freezing
  BUG: plot doesnt default to matplotlib axes.grid setting (pandas-dev#9792)
  ...
yarikoptic added a commit to neurodebian/pandas that referenced this issue Jul 2, 2015
* commit 'v0.16.1-97-gbc7d48f': (56 commits)
  ...