DEPR: Some dropna behaviors in DataFrame.pivot_table #53521

Currently `dropna` is used in four places within `DataFrame.pivot_table`:

 1. With `dropna=False`, the result is reindexed to the cartesian product of all index/column levels when there are multiple levels; this was [the original use](https://github.com/pandas-dev/pandas/commit/2d63a71d1526e6f325c64cd432e046c7532d19ed)
 2. [It is passed through to groupby](https://github.com/pandas-dev/pandas/commit/3cfd8685ee7fddba819932c92884427ee9a78866)
 3. After the groupby aggregation, [any rows that are all null are dropped](https://github.com/pandas-dev/pandas/issues/21133)
 4. When computing the margins, rows in the original data where the keys and values are all null [are dropped](https://github.com/pandas-dev/pandas/commit/f7faee0865d682c81f07134570b863e7e1d75f85#diff-40ee21c396f3aa4952234c175ecd5bad97097b08e9c4b0ba3fa212216446bc42)
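
To make (1) and (3) concrete, here is a minimal sketch on a made-up frame (the column names `a`, `b`, `v` are only for illustration). The exact rows kept under the default may differ between pandas versions, so no output is asserted here; the point is just that one Boolean flips several things at once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": ["x", "x", "y"],
        "b": ["p", "q", "p"],
        "v": [1.0, 2.0, np.nan],  # the only ("y", "p") value is null
    }
)

# Default dropna=True: a group whose aggregate is entirely null
# may be dropped from the result (behavior 3).
default = df.pivot_table(index=["a", "b"], values="v", aggfunc="mean")

# dropna=False: the index is expanded to every (a, b) combination
# (behavior 1), so ("y", "q") shows up even though it never occurs in df.
full = df.pivot_table(index=["a", "b"], values="v", aggfunc="mean", dropna=False)

print(default)
print(full)
```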

Behaviors (1), (2), and (4) were all implemented for `crosstab`, which is essentially a call to `pivot_table`.

The API docs for crosstab document the `dropna` argument as:

> Do not include columns whose entries are all NaN.

The only other documentation in the API and User Guide mentions using `dropna=False` to include rows/columns for categorical data with missing categorical values.

I think this is too much for a single Boolean argument to handle. I propose the following:

  a. Add `cartesian_product=[True|False]` to pivot_table and crosstab
  b. Add `observed=[True|False]` to crosstab for use with categoricals
  c. Deprecate behaviors (1) (when triggered via `dropna`), (3), and (4) above. Users can get each of these by dropping null values from the input data if they so desire.
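
As a rough sketch of what users could do themselves instead of relying on `dropna` (this is not a proposed implementation, and the frame and column names are made up): drop nulls from the input before pivoting, and build the cartesian product explicitly when it is wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": ["x", "x", "y"],
        "b": ["p", "q", "p"],
        "v": [1.0, 2.0, np.nan],
    }
)

# Instead of relying on (3): drop null values from the input before pivoting.
clean = df.dropna(subset=["v"])
table = clean.pivot_table(index=["a", "b"], values="v", aggfunc="mean")

# Instead of relying on (1): build the cartesian product of the keys
# explicitly and reindex the result onto it.
product = pd.MultiIndex.from_product(
    [sorted(df["a"].unique()), sorted(df["b"].unique())], names=["a", "b"]
)
expanded = table.reindex(product)

print(table)
print(expanded)
```

Here `table` only has the two groups with data, while `expanded` has all four `(a, b)` combinations with nulls for the missing ones.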

We can implement (c) without affecting the behavior of crosstab by changing the data there to be a mixture of null/non-null values depending on the input and using the aggregation `count` instead of `len`.
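
One way to read the `count` vs `len` idea, sketched with a plain groupby rather than the actual crosstab internals (the `dummy` column and how it is filled are my assumption, not existing pandas code): if the dummy value is null wherever a key is null, then `count` skips those rows while a length-based aggregation still counts them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": ["p", None, "p"]})

# Hypothetical dummy column: non-null only where every key is present.
keys_present = df[["a", "b"]].notna().all(axis=1)
df["dummy"] = np.where(keys_present, 0.0, np.nan)

# "count" ignores null dummy values; "size" (like len) counts every row.
summary = df.groupby("a")["dummy"].agg(["count", "size"])
print(summary)
```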


