API: Consolidate groupby as_index and group_keys #49543


Open
rhshadrach opened this issue Nov 5, 2022 · 12 comments
Labels
API Design Groupby Needs Discussion Requires discussion from core team before further action

Comments

@rhshadrach
Member

Everything in this issue also applies to Series.groupby and SeriesGroupBy; I will just be writing it for DataFrame.

Currently, DataFrame.groupby has two arguments that essentially serve the same purpose:

  • as_index: Whether to include the group keys in the index or, when the groupby is done on column labels (see #49519), in the columns.
  • group_keys: Whether to include the group keys in the index when calling DataFrameGroupBy.apply.

as_index only applies to reductions; group_keys only applies to apply. I think this is confusing and unnecessarily restrictive.
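To make the current asymmetry concrete, here is a short sketch (using only today's API) showing that as_index only affects reductions while group_keys only affects apply:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})

# as_index controls where the keys go for reductions...
by_index = df.groupby("a", as_index=True).sum()   # keys in the index
by_col = df.groupby("a", as_index=False).sum()    # keys as a column

# ...but is ignored by transforms:
t1 = df.groupby("a", as_index=True)["b"].cumsum()
t2 = df.groupby("a", as_index=False)["b"].cumsum()
assert t1.equals(t2)

# group_keys, on the other hand, only matters for apply:
a1 = df.groupby("a", group_keys=True).apply(lambda g: g["b"].cumsum())
a2 = df.groupby("a", group_keys=False).apply(lambda g: g["b"].cumsum())
assert a1.index.nlevels == 2  # group keys prepended as an index level
assert a2.index.nlevels == 1  # input's index comes through unchanged
```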

I propose we

  • Deprecate both as_index and group_keys
  • Add keys_axis to both DataFrame.groupby and DataFrameGroupBy.apply; these take the same arguments. The only difference is that the value in DataFrameGroupBy.apply, if specified, overrides the value in DataFrame.groupby.

keys_axis can accept the following values:

  • "infer" (the default): One of the following behaviors, inferred from the computation depending on whether it is a reduction, transform, or filter.
  • "index" or 0: Add the keys to the index (similar to as_index=True or group_keys=False)
  • "columns" or 1: Add the keys to the columns (similar to as_index=False)
  • "none": Don't add the keys to either the index or the columns. For pandas methods (e.g. sum, cumsum, head), reductions will return a RangeIndex; transforms and filters will behave as they do today, returning the input's index (or a subset of it for a filter). For apply, this will behave the same as group_keys=False does today.

Unlike as_index, this argument will be respected in all groupby functions whether they be reductions, transforms, or filters.
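As a rough illustration, the proposed values could be emulated for a reduction with a small shim over today's API. The function name and keyword here are this sketch's assumptions, mirroring the proposal, not an implemented pandas feature:

```python
import pandas as pd

def sum_with_keys_axis(df, by, keys_axis="index"):
    """Hypothetical shim: emulate the proposed keys_axis values for a
    reduction (sum) using the current as_index machinery."""
    if keys_axis in ("index", 0):
        return df.groupby(by).sum()                  # like as_index=True
    if keys_axis in ("columns", 1):
        return df.groupby(by, as_index=False).sum()  # like as_index=False
    if keys_axis == "none":
        # keys dropped entirely; the reduction gets a RangeIndex
        return df.groupby(by).sum().reset_index(drop=True)
    raise ValueError(f"invalid keys_axis: {keys_axis!r}")

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
print(sum_with_keys_axis(df, "a", keys_axis="none"))
```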

Path to implementation:

  • Add keys_axis in 2.0, and either add a PendingDeprecationWarning or a DeprecationWarning to as_index / group_keys
  • Change warnings for as_index / group_keys to a FutureWarning in 2.1
  • Enforce the deprecations in 3.0

A few natural questions come to mind:

  1. Why introduce a new argument, why not keep either as_index or group_keys?

Currently these arguments are Boolean; the new argument needs to accept more than two values, and its name should reflect that it accepts an axis. Also, adding a new argument provides a cleaner and more gradual deprecation path.

  2. Why add keys_axis to DataFrameGroupBy.apply?

In other groupby methods, we can reliably use keys_axis="infer" to determine the correct placement of the keys. However in apply, it is inferred from the output, and various cases can coincide - e.g. a reduction and transformation on a DataFrame with a single row. We want the user to be able to use "infer" on other groupby methods, but be able to specify how their UDF in apply acts. E.g.

gb = df.groupby(["a", "b"], keys_axis="infer")
print(gb.sum())  # Act as a reduction
print(gb.head())  # Act as a filter
print(gb.cumsum())  # Act as a transform
print(gb.apply(my_udf, keys_axis="index"))  # inference from the groupby call is not reliable here; allow the user to specify how apply should act
  3. Why should keys_axis accept the value "none"?

This is currently how transforms and filters work - where the keys are added to neither the index nor the columns. We need to keep the ability to specify to groupby(...).apply that the UDF they are provided acts as a transform or filter.
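For reference, the "none" placement matches what pandas transforms and filters already do today: the keys land in neither axis, and the input's index comes through:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1], "b": [1, 2, 3]}, index=[10, 11, 12])

transformed = df.groupby("a").cumsum()  # transform: input index preserved
filtered = df.groupby("a").head(1)      # filter: subset of input index

assert list(transformed.index) == [10, 11, 12]
assert list(filtered.index) == [10, 11]
assert "a" not in transformed.columns   # keys added to neither axis
```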

  4. Why not name the argument group_keys_axis?

I find "group" here redundant, but would be fine with this name too, and happy to consider other potential names.

cc @pandas-dev/pandas-core @pandas-dev/pandas-triage

@rhshadrach rhshadrach added Groupby API Design Needs Discussion Requires discussion from core team before further action labels Nov 5, 2022
@mroeschke
Member

Would it be feasible to instead add the new keys_axis to all the DataFrameGroupBy methods (with a sensible default & restrictions per function)? This would include deprecating as_index/group_keys

I remember having a similar API decision when deciding df.groupby(engine="numba").mean() vs df.groupby().mean(engine="numba") (current), and some pros were

  1. Easier switching between arguments. If a user wants to switch from keys_axis="columns" to keys_axis="infer" between 2 methods, they would otherwise have to create a new DataFrameGroupBy object.
  2. Less state for the DataFrameGroupBy object. Semantically, keys_axis would only impact the result of the method and not the DataFrameGroupBy object itself.

@bashtage
Contributor

bashtage commented Nov 7, 2022

Overall plan seems reasonable. I'm not sure keys_axis is very intuitive. Would axis_keys or maybe axis_labels be clearer (AFAIK the generic word, at least in typing, for the values in an index is labels)?

@WillAyd
Member

WillAyd commented Nov 7, 2022

Love the overall idea. An axis feels a little strange to me because in theory I guess you could toss an apply against any axis and then separately have a keys_axis keyword. Not sure there is a need for those two to be separate?

Stepping back we have the question of:

  1. Which axis should the computation be performed across
  2. How should we align the output of the computation

I think 1. should just continue to use the axis keyword we have throughout the library, but maybe the keyword for 2. becomes something like align_on_index = {True / False} ?

@rhshadrach
Member Author

rhshadrach commented Nov 8, 2022

@mroeschke - I definitely see the benefits of your proposal. The downside would be having to specify keys_axis for each method. I would be happy either way.

@bashtage - the argument would be specifying where the groupby keys go. I don't think this is conveyed by e.g. axis_labels="columns". Similarly, axis_keys=1 seems odd to me - as if it's stating "the axis keys are 1". On the other hand, I think saying the "keys axis is 1" makes sense.

@WillAyd

An axis feels a little strange to me because in theory I guess you could toss an apply against any axis and then separately have a key_axis keyword.

I don't follow this - can you explain in more detail?

How should we align the output of the computation

I'm not sure I see how this question is related - currently there is no option in groupby to choose whether to align or not. Maybe this is another enhancement you're thinking of?

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 8, 2022

Thanks for thinking this through and the proposal, @rhshadrach ! I added some thoughts below, but to be clear, while those might seem "critical" of the proposal, I certainly agree that we should try to clean this part of the API (those different keywords that do very similar things but not exactly and for different methods is not ideal ..)

To summarize the context for myself, the underlying behaviour (not current API) we want to control somehow is the following aspect of the groupby behaviour:

  • What to do with the group key labels: add them to the resulting dataframe?
    • Yes, add them to the output -> Then, where to add the labels?
      • As index level(s)
      • As column(s)
    • No, don't add them to the output

Some unstructured thoughts:

  • I find the "axis" terminology in keys_axis a bit strange / potentially confusing. "axis" is used here differently than in most other places where we use that term (eg reducing over a certain axis, or set_axis and similar methods) (I think this is also the confusion of @WillAyd, this issue is not related to the existing axis keyword of groupby).
    keys_axis="column" doesn't mean to set the group keys as the "columns" axis (which would only make sense when actually grouping by that axis), but only to add one entry (per group key) to that axis. The group key labels are basically always added to "axis 0" (i.e. one value for each row); the only question is whether to add them as an index level or a column (both along the same axis).
    (but I also don't directly have a better suggestion for a keyword name ..)

  • Would we allow df.groupby(.., keys_axis="none").sum() not having any key values in the result? (basically giving the result of currently doing df.groupby(..).sum().reset_index(drop=True)) (i.e. the combo of a true reducing method with group_keys=False/keys_axis="none")
    This doesn't really feel like a needed new feature (since it's already easily possible, and I don't think this is often the desired result), and so it would be expanding the behavioural options unnecessarily? (at the same time, not allowing this combo also makes the interpretation of the new keyword more complex and dependent on the actual operation that follows. And it might actually be one of the gotchas with the current group_keys keyword that it doesn't apply to reductions)

  • What would a transform or filter function (and strict transform or filter, like transform() or head(), not inferred in apply) do with keys_axis="columns"|"index"? Or would we disallow this combo?
    Those methods are currently documented to preserve the original index and order.

  • Do we need the keys_axis="none" option at all? (i.e. the current group_keys=False) You mention:

    This is currently how transforms and filters work - where the keys are added to neither the index nor the columns. We need to keep the ability to specify to groupby(...).apply that the UDF they are provided acts as a transform or filter.

    We don't necessarily "need" to keep this ability for apply. The fact that apply tries to infer whether it got a transforming function, and in that case (and only if group_keys=False was specified) will preserve the original index order, is also something that we could deprecate and remove from apply (as has come up a few times in the #34998 group_keys discussion) -> BUG: values-dependent reindex behavior in Groupby.apply #35278
    (also note that this is only for transforming functions, not filters. For filters, the keys_axis="none" would be the same as a reset_index(drop=True) afterwards, there is no reindexing of the output)
    And also related to API: Should apply be smart? #39209 on the question whether we should actually try to infer what kind of function is being applied to start with.

  • (mostly for apply()) Currently the logic for this new (and existing) keyword is "try to infer the type of function (reducer, transform, filter) -> based on this decide on how to handle group keys in the output -> but allow the user to override how to handle the group keys". Another possible logic could be to let the user override the first part, i.e. allow it to specify what kind of function is being passed, instead of allowing to override the consequence of it. Of course, that would only work if we are fine with having a fixed way how to handle the group keys depending on the type of function.


Rereading my comments, maybe my current thoughts could be summarized as: do we actually need to keep the group_keys=False optional behavior? If it is a reset_index(drop=True) away? (the only thing that doesn't make this last sentence true is #35278, but there can be other ways to provide the "transform" behaviour)
And if we wouldn't have this option, then the only remaining question is whether to set the group keys as index or as columns, and that probably would influence the API we would design.

@rhshadrach
Member Author

rhshadrach commented Nov 12, 2022

Thanks for thinking this through and the proposal, @rhshadrach ! I added some thoughts below, but to be clear, while those might seem "critical" of the proposal, I certainly agree that we should try to clean this part of the API (those different keywords that do very similar things but not exactly and for different methods is not ideal ..)

Thanks @jorisvandenbossche! Any and all thoughts (critical or not) are much appreciated. apply is a particularly hard method to reason about.

Agreed on your objection to the name (and I think @WillAyd's as well). I'm focusing on the other aspects here first, and if we find something we want to move forward with, we can workshop the name. I'll keep calling it keys_axis for now for consistency.

This doesn't really feel like a needed new feature (since it's already easily possible, and I don't think this often the desired result), and so it would be expanding the behavioural options unnecessarily? (at the same time, not allowing this combo also makes the interpretation of the new keyword more complex and dependent on the actual operation that follows. And it might actually be one of the gotchas with the current group_keys keyword that it doesn't apply)

Your parenthetical comment is indeed the reason I think it should be included, and so I disagree that it's "expanding the behavioural options unnecessarily".

What would a transform or filter function (and strict transform or filter, like transform() or head(), not inferred in apply) do with keys_axis="columns"|"index"? Or would we disallow this combo?
Those methods are currently documented to preserve the original index and order.

Users would be able to get the keys in the index or column if they so desired; the documentation would be changed. The default behavior keys_axis="infer" would remain unchanged. This is an expansion of the current features.

Rereading my comments, maybe my current thoughts could be summarized as: do we actually need to keep the group_keys=False optional behavior? If it is a reset_index(drop=True) away?

I'm not sure what this means; apply does different things based on whether the function is e.g. a reduction or transform. When you say "need to keep" here, what does removing it entail? To make this more explicit, I think it would be helpful to consider the following example:

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=[6, 7, 8])
print(df.groupby(['x', 'y', 'z'], group_keys=False).apply(lambda x: x.sum()))
print(df.groupby(['x', 'y', 'z'], group_keys=False).apply(lambda x: x.cumsum()))
print(df.groupby(['x', 'y', 'z'], group_keys=True).apply(lambda x: x.sum()))
print(df.groupby(['x', 'y', 'z'], group_keys=True).apply(lambda x: x.cumsum()))

@rhshadrach
Member Author

rhshadrach commented Nov 12, 2022

@jorisvandenbossche

Rereading my comments, maybe my current thoughts could be summarized as: do we actually need to keep the group_keys=False optional behavior? If it is a reset_index(drop=True) away?

I'm not sure what this means;

Reflecting on your comments more, I'm not sure this is what you're necessarily getting at, but it occurs to me that we have agg for aggregations and transform for transformers, so we shouldn't restrict apply to having to support either of those. If users want those behaviors, there are functions readily available for them. apply should be for functions that don't fit these molds.

To reason about this, it would be useful to have examples of usages where the UDF is neither a reducer nor a transform. @MarcoGorelli - you mentioned in #49497 (comment) that you scraped Kaggle notebooks. I'm wondering if something similar can be done here. Do you have code for this that can be shared?

@MarcoGorelli
Member

You'll need to set up an account on Kaggle, and download a token (https://github.com/Kaggle/kaggle-api#api-credentials), and then

 for i in $(kaggle kernels list --page-size 100 --sort-by 'voteCount' -p 1 --language python | awk '{ print $1 }' | tail -n +3); do kaggle kernels pull $i; done;

should work

Then you can just remove all outputs with nbstripout and grep for patterns you want

@WillAyd
Member

WillAyd commented Nov 15, 2022

Does keys_location make things any clearer? Yes I was confused before but @jorisvandenbossche did a great write up. Not sold on that as a name either but axis is definitely confusing

I also think it would be easier to borrow the terminology for accepted argument values from pivot_table (index and values rather than index and columns), as columns to me implies axis=1. I guess index and values would still be confusing if you groupby axis=1 and the keys actually do get put in the column index, but I think that still has more of a naming precedent than the current proposal
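For comparison, this is the pivot_table vocabulary being referenced: index names where the grouping labels land, and values names what gets aggregated:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

# "index"/"values" terminology as used by pivot_table today:
# the key column "a" becomes the index; "b" is aggregated per group
pt = df.pivot_table(index="a", values="b", aggfunc="sum")
assert list(pt.index) == [1, 2]
assert pt.loc[1, "b"] == 7  # 3 + 4 for the a == 1 group
```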

@WillAyd
Member

WillAyd commented Nov 15, 2022

key_placement might also be an option

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 15, 2022

On the transform case

What would a transform or filter function (and strict transform or filter, like transform() or head(), not inferred in apply) do with keys_axis="columns"|"index"? Or would we disallow this combo?
Those methods are currently documented to preserve the original index and order.

Users would be able to get the keys in the index or column if they so desired; the documentation would be changed. The default behavior keys_axis="infer" would remain unchanged. This is an expansion of the current features.

Just to be very specific about the exact behaviour you are thinking of: would it be adding the original column (used as key) from the calling dataframe to the result as an index level / column, thus keeping the original order? (which is not what groupby(...).apply with group_keys=True does)
So using the following example dataframe (and using current API to mimic the result):

>>> df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 2, 3]})
# with setting key as index
>>> df.groupby("a").transform(lambda x: x.cumsum()).set_index(df['a'], append=True).reorder_levels([1, 0], 0)
     b
a     
1 0  1
2 1  2
1 2  4
# with setting key as column
>>> df.groupby("a").transform(lambda x: x.cumsum()).assign(a=df['a'])[['a', 'b']]
   a  b
0  1  1
1  2  2
2  1  4

I assume it is the above, but to be explicit, the current group_keys=True in apply is slightly different (and I would also assume that apply with this new keyword would keep the below behaviour?)

>>> df.groupby("a", group_keys=True).apply(lambda x: x[['b']].cumsum())
     b
a     
1 0  1
  2  4
2 1  2

That said, I am still a bit skeptical whether this functionality is really "needed" for transform. Of course it would only be optional (you keep the existing behaviour by default), and so a similar argument as for the reductions can be given ("not allowing the option makes the new keyword dependent on the type of method", which can also be confusing, i.e. "why doesn't this keyword have any effect?"). But we are still expanding functionality, which needs tests, documentation, etc

I think the output shown in the "with setting key as column" example above is certainly an interesting output, repeating it here:

>>> df.groupby("a").transform(lambda x: x.cumsum()).assign(a=df['a'])[['a', 'b']]
   a  b
0  1  1
1  2  2
2  1  4

But I think the way this is typically achieved right now with transform is with something like:

>>> df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 2, 3]})
>>> df["b"] = df.groupby("a")["b"].transform(lambda x: x.cumsum())
>>> df
   a  b
0  1  1
1  2  2
2  1  4

for which you want the current behaviour of having the result of transform() exactly indexed as the input (so not adding the key as an index level, or not as a column), otherwise the assignment won't work.

If we want to enable the above in a method-chaining manner, I think we should rather explore other options, such as adding an assign method that mimics DataFrame.assign (this is mentioned in #43902 as well). Hypothetical code example:

>>> df.groupby("a").assign(b=lambda x: x["b"].cumsum())
   a  b
0  1  1
1  2  2
2  1  4

Here, it gives the same output as a potential df.groupby("a", keys_axis="column").transform(lambda x: x.cumsum()). But that is only because there are no other columns in this tiny example.

I know this is getting a bit off-topic, but I do think we should consider both the consistency aspect (i.e. how can we make this keys-as-index/column handling consistent across the different groupby methods) as well as the actual use cases, so we are not adding API surface only for the sake of consistency but also because it is useful.
(but, I have to say that I don't have a very good idea of how transform() is typically being used in practice)

@jorisvandenbossche
Member

Rereading my comments, maybe my current thoughts could be summarized as: do we actually need to keep the group_keys=False optional behavior? If it is a reset_index(drop=True) away?

I'm not sure what this means;

Reflecting on your comments more, I'm not sure this is what you're necessarily getting at, but it occurs to me that we have agg for aggregations, transform for transformers, so we shouldn't restrict apply to having to support either one of those. If users want those behaviors, there are functions readily for them. apply should be for functions that don't fit these molds.

@rhshadrach yes, that is certainly related.
To illustrate what I meant with "group_keys=False behaviour is only a reset_index(drop=True) away", using the example dataframe you used for the non-reduction case:

>>> df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=[6, 7, 8])
>>> df.groupby(['x', 'y', 'z'], group_keys=True).apply(lambda x: x.cumsum())
     a  b
x 6  0  3
y 7  1  4
z 8  2  5
>>> df.groupby(['x', 'y', 'z'], group_keys=False).apply(lambda x: x.cumsum())
   a  b
6  0  3
7  1  4
8  2  5

This behaviour of group_keys=False (or keys_axis="none") can in this case be achieved with a reset_index as well:

>>> df.groupby(['x', 'y', 'z'], group_keys=True).apply(lambda x: x.cumsum()).reset_index(level=0, drop=True)
   a  b
6  0  3
7  1  4
8  2  5

Given that this is easy to do if you want this specific output, is it worth having a group_keys keyword (or keys_axis="none" option) to start with?
(one reason is that it will be a bit more efficient to not first set it as the index and then drop it, but not sure that will be a bottleneck / is worth it)

Another reason to have current group_keys for apply is because the group_keys=False is actually different than a simple reset_index (and so you won't be able to mimic that behaviour with reset_index as I did in the example above) for certain cases, when doing a transform. This is tracked in this issue: #35278
The reason that doesn't show up in the example above is because this reindexing behaviour doesn't matter in that case (the original dataframe is already sorted). But using the example I used in the previous comment:

>>> df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 2, 3]})
>>> df.groupby("a", group_keys=True).apply(lambda x: x[['b']].cumsum())
     b
a     
1 0  1
  2  4
2 1  2
>>> df.groupby("a", group_keys=False).apply(lambda x: x[['b']].cumsum())
   b
0  1
1  2
2  4
# mimic group_keys=False result with reset_index gives different order
>>> df.groupby("a", group_keys=True).apply(lambda x: x[['b']].cumsum()).reset_index(level=0, drop=True)
   b
0  1
2  4
1  2

But personally I would rather get rid of this special case in the group_keys=False behaviour of apply, and so would prefer to not replicate this special case in a keys_axis="none". And at the same time if we get rid of that special case, it makes the use case of group_keys=False less useful to start with. And that's indeed related to "apply should be for functions that don't fit these molds." -> i.e. if you want the original-order-preserving behaviour, use transform and not apply.
