Improve docs on what the axis= kwarg does in individual functions/methods

### `axis=0` or `axis=1`, which is it?

I've always found it hard to remember which axis (`0/"index"` vs. `1/"columns"`) does what for various operations. I suppose some people find it intuitive, while others (like me) find it confusing and inconsistent.

Case in point, `DataFrame.sum` vs. `DataFrame.drop`: if I want **column sums**, I need `axis=0`...

```python
>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
>>> df
   a  b
0  1  4
1  2  5
2  3  6
>>> df.sum(axis=0)
a     6
b    15
dtype: int64
```

... but if I want to **drop a column**, I need `axis=1`:

```python
>>> df.drop("a", axis=1)
   b
0  4
1  5
2  6
```

There's an analogous discrepancy in NumPy (which is probably where pandas inherited it from):

```python
>>> import numpy as np
>>> a = df.to_numpy()
>>> a
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> np.sum(a, axis=0)  # sum columns
array([ 6, 15])
>>> np.delete(a, 0, axis=1)  # delete first column
array([[4],
       [5],
       [6]])
```

I just intuitively conceptualize these operations as working along the same axis, so it's hard for me to internalize that the value of the axis parameter is different in each case. Apparently, [I'm not the only person to find this confusing](https://www.sharpsightlabs.com/blog/numpy-axes-explained/) (quoting from the article: "For example, in the np.sum() function, the axis parameter behaves in a way that many people think is counter intuitive").
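
One way to reconcile the two cases (this is just my reading, not how the NumPy or pandas docs phrase it) is to look at which dimension of the shape changes: in both calls, `axis` names the dimension that shrinks or disappears. A minimal sketch, rebuilding the array from above so that it runs on its own:

```python
import numpy as np

# Same values as df.to_numpy() in the example above
a = np.array([[1, 4], [2, 5], [3, 6]])

summed = np.sum(a, axis=0)          # axis 0 (length 3) is collapsed away
deleted = np.delete(a, 0, axis=1)   # axis 1 shrinks from length 2 to 1

print(a.shape)        # (3, 2)
print(summed.shape)   # (2,)
print(deleted.shape)  # (3, 1)
```

Seen this way, "sum the columns" with `axis=0` and "delete a column" with `axis=1` are at least consistent about which dimension they touch, even if the wording "along an axis" doesn't make that obvious.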

At the same time, I can imagine that some people find this behavior completely natural (at the very least those who designed the API). And I understand that changing this in pandas while keeping the status quo in numpy would introduce a (probably) worse inconsistency, so I'm not suggesting that.

### Suggestion for improvement

What I am suggesting is **reviewing the documentation of functions/methods using the `axis=` keyword argument** and (where applicable) **improving the description of what it controls in each case**. Pandas is typically used interactively, so documentation is easily accessible. If it contains useful hints on what each `axis` value does (and possibly why), it's not such a big problem if this behavior goes against some people's expectations.

### Examples

For example, based on the [current master docs](https://pandas-docs.github.io/pandas-docs-travis/), the [description of the `axis` parameter for `drop`](https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.DataFrame.drop.html#pandas.DataFrame.drop) does a good job at this:

> axis : {0 or ‘index’, 1 or ‘columns’}, default 0
> Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

This makes it reasonably clear to me that if I specify `0`, I'll be removing rows, whereas `1` will result in removing columns.

By contrast, the [description of the `axis` parameter for `sum`](https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.DataFrame.sum.html#pandas.DataFrame.sum) is somewhat too generic:

> axis : {index (0), columns (1)}
> Axis for the function to be applied on.

Based on this, I could conclude (and have repeatedly done so) that if I want column sums, I need to "apply the function on columns", hence `axis=1`. That is wrong: as the `df.sum(axis=0)` example at the top shows, column sums require `axis=0`.
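
To make the mix-up concrete, here is a small sketch using the same data as the `df` at the top of this issue. With `axis=1`, the result has one value per row, not one per column:

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))

row_sums = df.sum(axis=1)  # collapses the columns: one value per row
col_sums = df.sum(axis=0)  # collapses the index: one value per column

print(row_sums.tolist())  # [5, 7, 9]
print(col_sums.tolist())  # [6, 15]
```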

A revised description could look something like the following:

> axis : {index (0), columns (1)}
> Whether to collapse the index (0 or ‘index’), resulting in column sums, or the columns (1 or ‘columns’), resulting in row sums.
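
For what it's worth, the string aliases fit the "collapse" wording quite naturally, and `drop` has a `columns=` keyword that avoids `axis` altogether. A short sketch, assuming a reasonably recent pandas release (very old versions don't accept `columns=` in `drop`):

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))

# "index" is an alias for 0: collapse the index, giving column sums
assert df.sum(axis="index").equals(df.sum(axis=0))

# "columns" is an alias for 1: collapse the columns, giving row sums
assert df.sum(axis="columns").equals(df.sum(axis=1))

# drop() also accepts columns=..., which sidesteps the axis question
dropped = df.drop(columns="a")
assert dropped.equals(df.drop("a", axis=1))
```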
