@@ -91,10 +91,10 @@ The mapping can be specified many different ways:
9191 - A Python function, to be called on each of the axis labels.
9292 - A list or NumPy array of the same length as the selected axis.
9393 - A dict or ``Series ``, providing a ``label -> group name `` mapping.
94- - For ``DataFrame `` objects, a string indicating a column to be used to group.
94+ - For ``DataFrame `` objects, a string indicating a column to be used to group.
9595 Of course ``df.groupby('A') `` is just syntactic sugar for
9696 ``df.groupby(df['A']) ``, but it makes life simpler.
97- - For ``DataFrame `` objects, a string indicating an index level to be used to
97+ - For ``DataFrame `` objects, a string indicating an index level to be used to
9898 group.
9999 - A list of any of the above things.
100100
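As a rough illustration of the mapping styles listed above, the following sketch groups the same small frame in four ways. The frame, its labels, and the group names are made up for this example and are not part of the original documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: index labels start with "a" or "b".
df = pd.DataFrame(
    {"A": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]},
    index=["a1", "a2", "b1", "b2"],
)

# A function, called on each index label (here: the first character).
by_function = df["value"].groupby(lambda label: label[0]).sum()

# A NumPy array of the same length as the selected axis.
by_array = df["value"].groupby(np.array(["g1", "g1", "g2", "g2"])).sum()

# A dict providing a label -> group name mapping.
mapping = {"a1": "left", "a2": "left", "b1": "right", "b2": "right"}
by_dict = df["value"].groupby(mapping).sum()

# A string naming a column to group by.
by_column = df.groupby("A")["value"].sum()

print(by_function, by_array, by_dict, by_column, sep="\n\n")
```

Each call produces one row per group; only the way the group keys are derived differs.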
@@ -120,7 +120,7 @@ consider the following ``DataFrame``:
120120 'D': np.random.randn(8)})
121121 df
122122
123- On a DataFrame, we obtain a GroupBy object by calling :meth: `~DataFrame.groupby `.
123+ On a DataFrame, we obtain a GroupBy object by calling :meth: `~DataFrame.groupby `.
124124We could naturally group by either the ``A `` or ``B `` columns, or both:
125125
126126.. ipython :: python
@@ -360,8 +360,8 @@ Index level names may be specified as keys directly to ``groupby``.
360360 DataFrame column selection in GroupBy
361361~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
362362
363- Once you have created the GroupBy object from a DataFrame, you might want to do
364- something different for each of the columns. Thus, using ``[] `` similar to
363+ Once you have created the GroupBy object from a DataFrame, you might want to do
364+ something different for each of the columns. Thus, using ``[] `` similar to
365365getting a column from a DataFrame, you can do:
366366
367367.. ipython :: python
@@ -421,7 +421,7 @@ statement if you wish: ``for (k1, k2), group in grouped:``.
421421Selecting a group
422422-----------------
423423
424- A single group can be selected using
424+ A single group can be selected using
425425:meth: `~pandas.core.groupby.DataFrameGroupBy.get_group `:
426426
427427.. ipython :: python
@@ -444,8 +444,8 @@ perform a computation on the grouped data. These operations are similar to the
444444:ref:`aggregating API <basics.aggregate>`, :ref:`window functions API <stats.aggregate>`,
445445and :ref:`resample API <timeseries.aggregate>`.
446446
447- An obvious one is aggregation via the
448- :meth:`~pandas.core.groupby.DataFrameGroupBy.aggregate` or equivalently
447+ An obvious one is aggregation via the
448+ :meth:`~pandas.core.groupby.DataFrameGroupBy.aggregate` or equivalently
449449:meth:`~pandas.core.groupby.DataFrameGroupBy.agg` method:
450450
451451.. ipython :: python
@@ -517,12 +517,12 @@ Some common aggregating functions are tabulated below:
517517 :meth: `~pd.core.groupby.DataFrameGroupBy.nth `;Take nth value, or a subset if n is a list
518518 :meth: `~pd.core.groupby.DataFrameGroupBy.min `;Compute min of group values
519519 :meth: `~pd.core.groupby.DataFrameGroupBy.max `;Compute max of group values
520-
521520
522- The aggregating functions above will exclude NA values. Any function which
521+
522+ The aggregating functions above will exclude NA values. Any function which
523523reduces a :class:`Series` to a scalar value is an aggregation function and will work,
524524a trivial example is ``df.groupby('A').agg(lambda ser: 1)``. Note that
525- :meth:`~pd.core.groupby.DataFrameGroupBy.nth` can act as a reducer *or* a
525+ :meth:`~pd.core.groupby.DataFrameGroupBy.nth` can act as a reducer *or* a
526526filter, see :ref:`here <groupby.nth>`.
527527
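To make the "any function that reduces a Series to a scalar" remark concrete, here is a minimal sketch that passes a custom reducer to ``agg``. The frame and the ``value_range`` helper are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1.0, 4.0, 2.0]})


def value_range(ser):
    """Reduce a Series to a single scalar: max minus min."""
    return ser.max() - ser.min()


# The reducer is called once per group and must return a scalar.
spread = df.groupby("A")["B"].agg(value_range)
print(spread)
```

Because ``value_range`` returns one scalar per group, the result has one entry per distinct value of ``A``.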
528528.. _groupby.aggregate.multifunc :
@@ -732,7 +732,7 @@ and that the transformed data contains no NAs.
732732 .. note ::
733733
734734 Some functions will automatically transform the input when applied to a
735- GroupBy object, but returning an object of the same shape as the original.
735+ GroupBy object, but returning an object of the same shape as the original.
736736 Passing ``as_index=False `` will not affect these transformation methods.
737737
738738 For example: ``fillna, ffill, bfill, shift. ``.
@@ -926,7 +926,7 @@ The dimension of the returned result can also change:
926926
927927 In [11]: grouped.apply(f)
928928
929- ``apply `` on a Series can operate on a returned value from the applied function,
929+ ``apply `` on a Series can operate on a returned value from the applied function,
930930that is itself a series, and possibly upcast the result to a DataFrame:
931931
932932.. ipython :: python
@@ -984,20 +984,48 @@ will be (silently) dropped. Thus, this does not pose any problems:
984984
985985 df.groupby('A').std()
986986
987- Note that ``df.groupby('A').colname.std()`` is more efficient than
987+ Note that ``df.groupby('A').colname.std()`` is more efficient than
988988``df.groupby('A').std().colname``, so if the result of an aggregation function
989- is only interesting over one column (here ``colname``), it may be filtered
989+ is only interesting over one column (here ``colname``), it may be filtered
990990*before* applying the aggregation function.
991991
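The two spellings compared above can be checked against each other with a small sketch. The frame below is invented for this example; it only demonstrates that both orders give the same values, not the performance difference.

```python
import pandas as pd

# Hypothetical numeric frame; "colname" is the only column of interest.
df = pd.DataFrame(
    {
        "A": ["x", "y", "x", "y"],
        "colname": [1.0, 2.0, 3.0, 5.0],
        "other": [10.0, 20.0, 30.0, 40.0],
    }
)

# Select the column first, then aggregate: only "colname" is processed.
selected_first = df.groupby("A")["colname"].std()

# Aggregate every column, then select: "other" is computed and discarded.
aggregated_first = df.groupby("A").std()["colname"]

print(selected_first)
print(aggregated_first)
```

Selecting before aggregating avoids computing results for columns that are thrown away afterwards.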
992+ .. _groupby.observed :
993+
994+ Handling of (un)observed Categorical values
995+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
996+
997+ When using a ``Categorical`` grouper (either alone or as one of multiple groupers), the ``observed`` keyword
998+ controls whether to return the Cartesian product of all possible grouper values (``observed=False``) or only those
999+ combinations that are actually observed (``observed=True``).
1000+
1001+ Show all values:
1002+
1003+ .. ipython :: python
1004+
1005+ pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
1006+
1007+ Show only the observed values:
1008+
1009+ .. ipython :: python
1010+
1011+ pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()
1012+
1013+ The returned dtype of the grouped result will *always* include *all* of the categories that were grouped.
1014+
1015+ .. ipython :: python
1016+
1017+ s = pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
1018+ s.index.dtype
1019+
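The Cartesian-product behaviour is easier to see with more than one categorical grouper. The following is a minimal sketch, assuming a small made-up frame with two categorical key columns (``k1``, ``k2``) and a value column (``v``); none of these names come from the original text.

```python
import pandas as pd

# Hypothetical frame with two categorical keys.
df = pd.DataFrame(
    {
        "k1": pd.Categorical(["a", "a", "b"], categories=["a", "b"]),
        "k2": pd.Categorical(["x", "y", "x"], categories=["x", "y"]),
        "v": [1, 2, 3],
    }
)

# observed=False: one row for every (k1, k2) combination, seen or not.
all_combos = df.groupby(["k1", "k2"], observed=False)["v"].count()

# observed=True: only the combinations that occur in the data.
observed_only = df.groupby(["k1", "k2"], observed=True)["v"].count()

print(all_combos)
print(observed_only)
```

Here the combination ``("b", "y")`` never occurs, so it appears with a count of zero in the first result and is absent from the second.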
9921020 .. _groupby.missing :
9931021
9941022NA and NaT group handling
9951023~~~~~~~~~~~~~~~~~~~~~~~~~
9961024
997- If there are any NaN or NaT values in the grouping key, these will be
998- automatically excluded. In other words, there will never be an "NA group" or
999- "NaT group". This was not the case in older versions of pandas, but users were
1000- generally discarding the NA group anyway (and supporting it was an
1025+ If there are any NaN or NaT values in the grouping key, these will be
1026+ automatically excluded. In other words, there will never be an "NA group" or
1027+ "NaT group". This was not the case in older versions of pandas, but users were
1028+ generally discarding the NA group anyway (and supporting it was an
10011029implementation headache).
10021030
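A short sketch of the exclusion described above, using an invented frame whose grouping key contains one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing grouping key.
df = pd.DataFrame({"key": ["a", np.nan, "b", "a"], "value": [1, 2, 3, 4]})

# The row with the NaN key does not contribute to any group.
sums = df.groupby("key")["value"].sum()
print(sums)
```

The result contains only the groups ``a`` and ``b``; the value paired with the missing key is left out of both.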
10031031Grouping with ordered factors
@@ -1084,8 +1112,8 @@ This shows the first or last n rows from each group.
10841112Taking the nth row of each group
10851113~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10861114
1087- To select from a DataFrame or Series the nth item, use
1088- :meth: `~pd.core.groupby.DataFrameGroupBy.nth `. This is a reduction method, and
1115+ To select from a DataFrame or Series the nth item, use
1116+ :meth: `~pd.core.groupby.DataFrameGroupBy.nth `. This is a reduction method, and
10891117will return a single row (or no row) per group if you pass an int for n:
10901118
10911119.. ipython :: python
@@ -1153,7 +1181,7 @@ Enumerate groups
11531181.. versionadded :: 0.20.2
11541182
11551183To see the ordering of the groups (as opposed to the order of rows
1156- within a group given by ``cumcount ``) you can use
1184+ within a group given by ``cumcount ``) you can use
11571185:meth: `~pandas.core.groupby.DataFrameGroupBy.ngroup `.
11581186
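The difference between the two numberings can be sketched on a small, made-up column of labels:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"A": list("aaabba")})

# ngroup: the number of the group each row belongs to.
group_number = df.groupby("A").ngroup()

# cumcount: the position of each row within its own group.
row_in_group = df.groupby("A").cumcount()

print(pd.DataFrame({"A": df["A"], "ngroup": group_number, "cumcount": row_in_group}))
```

``ngroup`` repeats the same number for every row of a group, while ``cumcount`` restarts from zero inside each group.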
11591187
@@ -1273,7 +1301,7 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
12731301Multi-column factorization
12741302~~~~~~~~~~~~~~~~~~~~~~~~~~
12751303
1276- By using :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`, we can extract
1304+ By using :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`, we can extract
12771305information about the groups in a way similar to :func:`factorize` (as described
12781306further in the :ref:`reshaping API <reshaping.factorize>`) but which applies
12791307naturally to multiple columns of mixed type and different