DEPR: Some dropna behaviors in DataFrame.pivot_table #53521
Labels
Deprecate
Functionality to remove in pandas
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Needs Discussion
Requires discussion from core team before further action
Reshaping
Concat, Merge/Join, Stack/Unstack, Explode
Currently `dropna` is used in four places within `DataFrame.pivot_table`:

1, 2, and 4 were all implemented for `crosstab`, which is essentially a call to `pivot_table`.
The API docs for `crosstab` document the `dropna` argument as:

The only other documentation in the API and User Guide mentions using `dropna=False` to include rows/columns for categorical data with missing categorical values.

I think this is too much for a single Boolean argument to handle. I propose the following:
a. Add `cartesian_product=[True|False]` to `pivot_table` and `crosstab`
b. Add `observed=[True|False]` to `crosstab` for use with categoricals
c. Deprecate behavior (1) (with dropna), (3), and (4) above. The user may do each of these by dropping null values from the input data if they so desire.
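As a rough illustration of the workaround mentioned in (c), here is a minimal sketch of dropping null values from the input before pivoting. The frame and its column names (`a`, `b`, `v`) are made up for this example, and which columns to include in `subset` depends on which of the behaviors the user wants to reproduce.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one null in a grouping column, one null in the values.
df = pd.DataFrame(
    {
        "a": ["x", "x", "y", None],
        "b": ["p", "q", "p", "q"],
        "v": [1.0, 2.0, np.nan, 4.0],
    }
)

# Drop the rows with nulls explicitly instead of relying on dropna= inside
# pivot_table. Only the two fully populated rows survive.
clean = df.dropna(subset=["a", "b", "v"])

result = clean.pivot_table(index="a", columns="b", values="v", aggfunc="sum")
print(result)
```

Doing the drop on the input makes it explicit which nulls are being removed, rather than having a single Boolean control several different steps.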
We can implement (c) without affecting the behavior of `crosstab` by changing the data there to be a mixture of null/non-null values depending on the input and using the aggregation `count` instead of `len`.
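To show why the choice between the two aggregations matters, here is a small sketch using a plain groupby. It only demonstrates the difference between `len` and `count` on data containing nulls; it is not the `crosstab` implementation, and the column names are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "x" has one null value, group "y" has none.
df = pd.DataFrame({"key": ["x", "x", "y"], "v": [1.0, np.nan, 3.0]})

# len counts every row in the group, null or not.
by_len = df.groupby("key")["v"].agg(len)

# count only counts the non-null entries.
by_count = df.groupby("key")["v"].agg("count")

print(by_len)
print(by_count)
```

With `count`, entries that should not contribute can be encoded as nulls in the data itself, which is the idea behind the proposal above.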