@@ -396,6 +396,58 @@ documentation. If you build an extension array, publicize it on our
396396
397397.. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest/
398398
399+ .. _whatsnew_0230.enhancements.categorical_grouping:
400+
401+ Categorical Groupers have gained an observed keyword
402+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
403+
404+ In previous versions, grouping by one or more categorical columns would result in an index that was the Cartesian product of all of the categories for
405+ each grouper, not just the observed values. ``.groupby()`` has gained the ``observed`` keyword to toggle this behavior. The default, ``observed=False``, remains backward
406+ compatible (generate a Cartesian product). (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`)
407+
408+
409+ .. ipython:: python
410+
411+ cat1 = pd.Categorical(["a", "a", "b", "b"],
412+ categories=["a", "b", "z"], ordered=True)
413+ cat2 = pd.Categorical(["c", "d", "c", "d"],
414+ categories=["c", "d", "y"], ordered=True)
415+ df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
416+ df['C'] = ['foo', 'bar'] * 2
417+ df
418+
419+ To show all values (the previous behavior):
420+
421+ .. ipython:: python
422+
423+ df.groupby(['A', 'B', 'C'], observed=False).count()
424+
425+
426+ To show only observed values:
427+
428+ .. ipython:: python
429+
430+ df.groupby(['A', 'B', 'C'], observed=True).count()
431+
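The difference between the two settings is easiest to see by comparing result sizes. The following is a minimal, self-contained sketch and not part of the changeset. It groups by the two categorical columns only (omitting ``C``) so that the row counts are easy to verify, and the names ``all_combos`` and ``observed_only`` are illustrative.

```python
import pandas as pd

# Same categoricals as in the example above: each has one category
# ("z" and "y" respectively) that never occurs in the data.
cat1 = pd.Categorical(["a", "a", "b", "b"],
                      categories=["a", "b", "z"], ordered=True)
cat2 = pd.Categorical(["c", "d", "c", "d"],
                      categories=["c", "d", "y"], ordered=True)
df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

# observed=False: the index is the Cartesian product of the categories,
# so 3 x 3 = 9 entries, with a count of 0 for unobserved combinations.
all_combos = df.groupby(["A", "B"], observed=False)["values"].count()

# observed=True: only the 4 combinations present in the data are kept.
observed_only = df.groupby(["A", "B"], observed=True)["values"].count()

print(len(all_combos), len(observed_only))
```

Passing ``observed`` explicitly in both calls keeps the sketch independent of whatever default a given pandas version uses.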
432+ For pivoting operations, this behavior is *already* controlled by the ``dropna`` keyword:
433+
434+ .. ipython:: python
435+
436+ cat1 = pd.Categorical(["a", "a", "b", "b"],
437+ categories=["a", "b", "z"], ordered=True)
438+ cat2 = pd.Categorical(["c", "d", "c", "d"],
439+ categories=["c", "d", "y"], ordered=True)
440+ df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
441+ df
442+
443+ .. ipython:: python
444+
445+ pd.pivot_table(df, values='values', index=['A', 'B'],
446+ dropna=True)
447+ pd.pivot_table(df, values='values', index=['A', 'B'],
448+ dropna=False)
449+
450+
399451.. _whatsnew_0230.enhancements.other:
400452
401453Other Enhancements
@@ -527,68 +579,6 @@ If you wish to retain the old behavior while using Python >= 3.6, you can use
527579 'Taxes': -200,
528580 'Net result': 300}).sort_index()
529581
530- .. _whatsnew_0230.api_breaking.categorical_grouping:
531-
532- Categorical Groupers will now require passing the observed keyword
533- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
534-
535- In previous versions, grouping by 1 or more categorical columns would result in an index that was the cartesian product of all of the categories for
536- each grouper, not just the observed values.``.groupby()`` has gained the ``observed`` keyword to toggle this behavior. The default remains backward
537- compatible (generate a cartesian product). Pandas will show a ``FutureWarning`` if the ``observed`` keyword is not passed; the default will
538- change to ``observed=True`` in the future. (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`)
539-
540-
541- .. ipython:: python
542-
543- cat1 = pd.Categorical(["a", "a", "b", "b"],
544- categories=["a", "b", "z"], ordered=True)
545- cat2 = pd.Categorical(["c", "d", "c", "d"],
546- categories=["c", "d", "y"], ordered=True)
547- df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
548- df['C'] = ['foo', 'bar'] * 2
549- df
550-
551- ``observed`` must now be passed when grouping by categoricals, or a
552- ``FutureWarning`` will show:
553-
554- .. ipython:: python
555- :okwarning:
556-
557- df.groupby(['A', 'B', 'C']).count()
558-
559-
560- To suppress the warning, with previous Behavior (show all values):
561-
562- .. ipython:: python
563-
564- df.groupby(['A', 'B', 'C'], observed=False).count()
565-
566-
567- Future Behavior (show only observed values):
568-
569- .. ipython:: python
570-
571- df.groupby(['A', 'B', 'C'], observed=True).count()
572-
573- For pivotting operations, this behavior is *already* controlled by the ``dropna`` keyword:
574-
575- .. ipython:: python
576-
577- cat1 = pd.Categorical(["a", "a", "b", "b"],
578- categories=["a", "b", "z"], ordered=True)
579- cat2 = pd.Categorical(["c", "d", "c", "d"],
580- categories=["c", "d", "y"], ordered=True)
581- df = DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
582- df
583-
584- .. ipython:: python
585-
586- pd.pivot_table(df, values='values', index=['A', 'B'],
587- dropna=True)
588- pd.pivot_table(df, values='values', index=['A', 'B'],
589- dropna=False)
590-
591-
592582.. _whatsnew_0230.api_breaking.deprecate_panel:
593583
594584Deprecate Panel