Reductions

The reductions over numerical types defined in pandas are listed below. They can be applied:

- To `Series`
- To N columns of a `DataFrame`
- To group by operations
- As window functions (window, rolling, expanding or ewm)
- In resample operations

pandas is not consistent about which reductions can be applied in each of these contexts. Each method is
implemented independently (`Series.sum`, `GroupBy.sum`, `Window.sum`...), some reductions are not implemented
for some of the classes, and the signatures can differ (e.g. `Series.var(ddof)` vs `EWM.var(bias)`).
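As an illustration of the signature mismatch, here is a minimal sketch using current pandas. The data and variable names are made up for the example; only the method calls come from pandas.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 7.0])

# Series.var takes `ddof` (delta degrees of freedom, default 1).
sample_var = s.var(ddof=1)
population_var = s.var(ddof=0)

# The exponentially weighted window takes `bias` instead of `ddof`.
ewm_var = s.ewm(alpha=0.5).var(bias=False)

# Group-by and rolling reductions are separate methods again,
# each with its own signature.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": s})
group_var = df.groupby("key")["value"].var(ddof=1)
rolling_var = s.rolling(window=2).var(ddof=1)
```

The same statistical quantity is reached through four different methods, and the one on the exponentially weighted window uses a different parameter name.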

I propose standard signatures for the reductions, with all reductions available to all classes.

## Reductions for numerical data types and proposed signatures

- `all()`
- `any()`
- `count()` 
- `nunique()`  # maybe the name could be `count_unique`, `count_distinct`...?
- `mode()`  # what to do if there is more than one mode? Ideally we would like all reductions to return a scalar
- `min()`
- `max()`
- `median()`
- `quantile(q, interpolation='linear')`  # in pandas `q` defaults to `0.5`, but I think it's better to require it; interpolation can be {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
- `sum()`
- `prod()`
- `mean()`
- `var(ddof=1)`  # delta degrees of freedom (for some classes `bias` is used)
- `std(ddof=1)`
- `skew()`
- `kurt()`  # pandas also has the alias `kurtosis`
- `sem(ddof=1)`  # standard error of the mean
- `mad()`  # mean absolute deviation
- `autocorr(lag=1)`
- `is_unique()`  # a property in pandas
- `is_monotonic()`  # a property in pandas
- `is_monotonic_decreasing()`  # a property in pandas
- `is_monotonic_increasing()`  # a property in pandas
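A short sketch of how some of the items above behave in pandas today, to make the comments concrete. The sample values are made up; the proposal itself is not implemented here.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2, 3])

# `mode` returns a Series, not a scalar, because ties are possible.
modes = s.mode()

# `nunique` counts distinct values.
n_distinct = s.nunique()

# These are properties in pandas, so they are accessed without parentheses.
unique_flag = s.is_unique
increasing = s.is_monotonic_increasing

# `quantile` can be called without `q` in pandas; it is passed explicitly here,
# as the proposed signature would require.
median_via_quantile = s.quantile(0.5, interpolation="linear")
```

Here `modes` holds both `1` and `2`, which is the case the comment on `mode()` asks about.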

Reductions that may depend on row labels (and could potentially return a list, like `mode`):

- `idxmax()` / `argmax()`
- `idxmin()` / `argmin()`
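The difference between the label-based and position-based variants, and the tie case, can be sketched as follows. The index labels and values are made up for the example.

```python
import pandas as pd

s = pd.Series([3, 9, 9, 1], index=["a", "b", "c", "d"])

# `idxmax` returns the row label of the first maximum.
label = s.idxmax()

# `argmax` returns the integer position of the first maximum.
position = s.argmax()

# With ties, both return only the first occurrence. Collecting every
# label that reaches the maximum yields a list, as with `mode`.
all_labels = s[s == s.max()].index.tolist()
```

The maximum `9` appears twice, so a reduction that reported every match would have to return `["b", "c"]` rather than a scalar.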

These need an extra column `other`:

- `cov(other, ddof=1)`
- `corr(other, method='pearson')`  # method can be {'pearson', 'kendall', 'spearman'}
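For reference, a minimal sketch of these two-column reductions as they exist on `Series` in pandas, using made-up data where `y` is exactly twice `x`:

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])
y = pd.Series([2.0, 4.0, 6.0, 8.0])

# Sample covariance between the two columns (pandas normalises by N - 1 by default).
covariance = x.cov(y)

# Pearson correlation; `y` is a linear function of `x`, so it is 1.
correlation = x.corr(y, method="pearson")
```

Both calls take the second column as the positional argument `other`, which is the shape the proposed signatures keep.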


## Questions

- Allow reductions over rows, or only over columns?
- What to do with NA?
- pandas has parameters (`bool_only`, `numeric_only`) to apply the operation only to columns of certain types. Do we want them?
  - I think something like `df.select_columns_by_dtype(int).sum()` would be preferable to adding a parameter to all or some reductions
- pandas has a `level` parameter in many reductions, for MultiIndex. If Indexing/MultiIndexing is part of the API, do we want to have it?
- pandas has a `min_count`/`min_periods` parameter in some reductions (e.g. `sum`, `min`), to return `NA` if fewer than `min_count` values are present. Do we want to keep it?
- How should reductions be applied?
  - In the top-level namespace, as in pandas (e.g. `df[col].sum()`)
  - Using an accessor (e.g. `df[col].reduce.sum()`)
  - Having a `reduce` function, and passing the specific functions as a parameter (e.g. `df[col].reduce(sum)`)
  - Other ideas
- Would it make sense to have a third-party package implementing reductions that can be reused by projects?
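To ground some of these questions, here is a sketch of how pandas currently behaves for NA handling, `min_count`, dtype selection, row-wise reductions, and function-passing. The frame and variable names are made up. `select_dtypes` is the existing pandas method closest to the hypothetical `select_columns_by_dtype` mentioned above, and `agg` is the closest existing analogue of a `reduce` function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"n": [1, 2, 3], "x": [1.5, np.nan, 2.5], "label": ["a", "b", "c"]}
)

# NA handling: missing values are skipped by default and
# propagate only when `skipna=False` is passed.
total_skipping_na = df["x"].sum()
total_with_na = df["x"].sum(skipna=False)

# `min_count`: the sum of an all-NA column is 0 unless at least
# `min_count` non-NA values are required.
all_na = pd.Series([np.nan, np.nan])
sum_default = all_na.sum()
sum_min_count = all_na.sum(min_count=1)

# Restricting a reduction to numeric columns: parameter vs. prior selection.
by_parameter = df.sum(numeric_only=True)
by_selection = df.select_dtypes(include="number").sum()

# Reduction over rows instead of columns.
row_sums = df.select_dtypes(include="number").sum(axis=1)

# Passing the reduction as an argument, via the existing `agg` method.
via_agg = df["x"].agg("sum")
```

Selecting columns first gives the same result as `numeric_only=True` here, without adding a parameter to the reduction itself.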

## Frequency of usage

![pandas_reductions](https://user-images.githubusercontent.com/10058240/83822656-8c421600-a6c9-11ea-93e9-db074b8f5755.png)

