Improve docs on what the axis= kwarg does in individual functions/methods #29203
If you can break the request up into clear, actionable items, we generally accept PRs to improve docs, so sure.
Maybe it's helpful for you to think of it as the "sum of rows by column".
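The "sum of rows by column" reading can be checked directly. This is a minimal sketch assuming a small made-up DataFrame `df`. Summing with `axis=0` gives the same numbers as adding the rows together by hand: the rows are combined, and one total per column remains.

```python
import pandas as pd

# Three rows (index labels "a", "b", "c") and two columns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["a", "b", "c"])

# "Sum of rows by column": axis=0 (alias "index") adds the rows together,
# leaving one total per column label.
by_column = df.sum(axis=0)

# The same result written out by hand: add the row Series element-wise.
rows_added = df.loc["a"] + df.loc["b"] + df.loc["c"]

print(by_column.tolist())   # [6, 60]
print(rows_added.tolist())  # [6, 60]
```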
Sure, I just wanted to provide a broader rationale first and see if changes along these lines feel acceptable at all :) I'll comb through the source code for places where the descriptions could be improved. Should I:
I hope I'll remember it now; that's one of the reasons I decided to type up the issue: to etch this into my brain once and for all, hopefully :)
@dlukes You're definitely not the only one; I would love to see these changes in the docs. Have you created PRs for this issue yet? If there are still any open items, I'm happy to work on them!
@marielledado I wasn't quite sure how to go forward with it (cf. the bullet points with questions in my previous post), and then life got in the way, so no, unfortunately, there's no active PR... If you have time to pick this up, though, it would be great!
@dlukes Life always gets in the way 😌 but yes, I'm happy to pick this up and make suggestions on this issue page based on those bullet points. For good measure: Take!
Hi @simonjayhawkins, first-time contributor here, and relatively new to programming and open-source contribution. I'd like to work on this issue, but it's not clear to me from the contributing guidelines where I should edit the documentation. Should I edit the docstring of the function (
The answer is yes, but I think the docstring for `df.drop` is clear on the usage. Maybe there are other methods where it is less clear.
Some other methods use templates, inherit docstrings, or inherit templates.
Taking a look at the code, the functions like … I guess the way forward is to define a way to put different text into the inherited templates for different methods? That way, the description could be written for each method based on what makes sense.
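One way to read this suggestion, sketched in plain Python: a shared docstring template with a placeholder that each method fills with its own `axis` wording. This is not pandas' internal docstring machinery. `AXIS_TEMPLATE`, `fill_axis_doc`, and the `Frame` class are hypothetical names used only to illustrate the idea.

```python
# Hypothetical sketch: one shared template, per-method axis wording.
# NOT pandas' real mechanism; all names here are made up for illustration.

AXIS_TEMPLATE = """{summary}

Parameters
----------
axis : {{0 or 'index', 1 or 'columns'}}, default 0
    {axis_desc}
"""


def fill_axis_doc(summary, axis_desc):
    """Return a decorator that sets a method's docstring from the template."""
    def decorator(func):
        func.__doc__ = AXIS_TEMPLATE.format(summary=summary, axis_desc=axis_desc)
        return func
    return decorator


class Frame:
    @fill_axis_doc(
        summary="Return the sum of the values over the requested axis.",
        axis_desc="0 or 'index' sums down the rows, giving one result per column; "
        "1 or 'columns' sums across each row, giving one result per row.",
    )
    def sum(self, axis=0):
        ...

    @fill_axis_doc(
        summary="Drop specified labels from rows or columns.",
        axis_desc="0 or 'index' drops rows; 1 or 'columns' drops columns.",
    )
    def drop(self, labels, axis=0):
        ...
```

The template keeps the shared parts (type line, default) in one place, while each method states in concrete terms what each `axis` value does for that operation.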
take |
`axis=0` or `axis=1`, which is it?

I've always found it hard to remember which axis (`0`/`"index"` vs. `1`/`"columns"`) does what for various operations. I suppose some people find it intuitive, while others (like me) find it confusing and inconsistent.

Case in point, `DataFrame.sum` vs. `DataFrame.drop`: if I want column sums, I need `axis=0`...
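Here is a minimal sketch of the `sum` half, using a small hypothetical DataFrame. `axis=0` (alias `"index"`) aggregates over the rows and returns one value per column. `axis=1` (alias `"columns"`) returns one value per row.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis=0 / "index": collapse the rows -> result is indexed by column label.
column_sums = df.sum(axis=0)
# axis=1 / "columns": collapse the columns -> result is indexed by row label.
row_sums = df.sum(axis=1)

print(column_sums.index.tolist(), column_sums.tolist())  # ['x', 'y'] [6, 60]
print(row_sums.index.tolist(), row_sums.tolist())        # [0, 1, 2] [11, 22, 33]
```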
... but if I want to drop a column, I need `axis=1`.

There's an analogous discrepancy in NumPy (which is probably where pandas inherited it from?):
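The `drop` half and its NumPy counterpart can be sketched as follows. Treating `np.delete` as NumPy's analogue of `drop` is an assumption made for illustration. In both libraries, `axis=0` yields per-column sums, while removing a column takes `axis=1`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
arr = df.to_numpy()  # the same values as a 3x2 NumPy array

# pandas: dropping the column "y" requires axis=1 (alias "columns").
without_y = df.drop("y", axis=1)

# NumPy follows the same conventions.
np_col_sums = arr.sum(axis=0)             # axis=0 -> one sum per column
np_without_y = np.delete(arr, 1, axis=1)  # axis=1 -> remove the column at position 1

print(without_y.columns.tolist())  # ['x']
print(np_col_sums.tolist())        # [6, 60]
print(np_without_y.tolist())       # [[1], [2], [3]]
```

When the goal is just to drop labels, `df.drop(columns="y")` (or `df.drop(index=...)` for rows) avoids the `axis` question entirely.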
I just intuitively conceptualize these operations as working along the same axis, so it's hard for me to internalize that the value of the axis parameter is different in each case. Apparently, I'm not the only person to find this confusing (quoting from the article: "For example, in the np.sum() function, the axis parameter behaves in a way that many people think is counter intuitive").
At the same time, I can imagine that some people find this behavior completely natural (at the very least those who designed the API). And I understand that changing this in pandas while keeping the status quo in numpy would introduce a (probably) worse inconsistency, so I'm not suggesting that.
Suggestion for improvement
What I am suggesting is reviewing the documentation of functions/methods that use the `axis=` keyword argument and, where applicable, improving the description of what it controls in each case. Pandas is typically used interactively, so documentation is easily accessible. If it contains useful hints on what each `axis` value does (and possibly why), it's not such a big problem if this behavior goes against some people's expectations.

Examples
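For instance, you can pull up the `axis` entry of any method's docstring in a session. IPython's `?` and `help()` show the same text. This small sketch uses the standard-library `inspect.getdoc`. It assumes the docstring follows the numpydoc layout, with a Parameters line starting with `axis`. The exact wording printed varies by pandas version.

```python
import inspect

import pandas as pd

# Same text that help(pd.DataFrame.sum) or `pd.DataFrame.sum?` displays.
doc = inspect.getdoc(pd.DataFrame.sum)
lines = doc.splitlines()

# Find the numpydoc entry for the axis parameter and show it with its first description line.
start = next(i for i, line in enumerate(lines) if line.startswith("axis"))
axis_entry = "\n".join(lines[start:start + 2])
print(axis_entry)
```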
For example, based on the current master docs, the description of the `axis` parameter for `drop` does a good job at this:

This makes it reasonably clear to me that if I specify `0`, I'll be removing rows, whereas `1` will result in removing columns.

By contrast, the description of the `axis` parameter for `sum` is somewhat too generic:

Based on this, I could conclude (and repeatedly have) that if I want column sums, I need to "apply the function on columns", hence `axis=1` (which is wrong, cf. above).

A revised description could look something like the following: