Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups
Documentation problem
The user guide says:
A string passed to groupby may refer to either a column or an index level. If a string matches both a column name and an index level name, a ValueError will be raised.
So I can pass either a column or an index level.
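A minimal sketch of what that user guide passage describes, assuming a small made-up frame (the names `key`, `value`, and `level` are hypothetical and only for illustration):

```python
import pandas as pd

# Hypothetical frame: one column called "key" and an index level called "level".
df = pd.DataFrame(
    {"key": ["x", "y", "x"], "value": [1, 2, 3]},
    index=pd.Index(["a", "b", "a"], name="level"),
)

# "key" matches a column name, so the column values define the groups.
by_column = df.groupby("key")["value"].sum()

# "level" matches the index name, so the index values define the groups.
by_level = df.groupby("level")["value"].sum()

# Renaming the index to "key" makes the string match both a column and an
# index level, which the user guide says raises a ValueError.
ambiguous = df.rename_axis("key")
try:
    ambiguous.groupby("key")
    raised = False
except ValueError:
    raised = True
```

In both successful calls the rows are what gets split. The string only decides where the grouping values come from, a column or an index level.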
But there seems to be no mention of this in the API reference:
If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups.
It talks about the "selected axis", and the axis is selected with the axis parameter.
So it looks like if I pass a column name with axis=0, pandas will "change" that into axis=1?
Although it is a deprecated parameter, until it is removed the documentation of groupby should make clear what is happening, and not talk about the "selected axis".
In fact, the groupby user guide says:
To split by columns, first do a transpose:
def get_letter_type(letter):
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'
grouped = df.T.groupby(get_letter_type)
And if I try:
grouped = df.groupby(get_letter_type)
It doesn't work.
So it looks like pandas "understands" the axis automatically when given a list of strings, but not when given a function.
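For reference, here is a small sketch of the difference, assuming a made-up frame with columns `A`, `B`, and `E` and a default integer index (not the frame from the user guide). As far as I can tell, the function is called on the row index labels, so on the untransposed frame it receives integers rather than column names:

```python
import pandas as pd

def get_letter_type(letter):
    if letter.lower() in "aeiou":
        return "vowel"
    return "consonant"

# Hypothetical frame: string column labels, default integer row index.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "E": [5, 6]})

# After the transpose, "A", "B", "E" are row labels, so the function
# receives strings and the original columns are grouped by letter type.
by_columns = df.T.groupby(get_letter_type).sum()

# Without the transpose, the function receives the integer row labels
# 0 and 1, which have no .lower() method.
try:
    df.groupby(get_letter_type).sum()
    failed = False
except AttributeError:
    failed = True
```

If this reading is right, a function is never matched against column names at all, which would explain why the transpose is needed.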
Suggested fix for documentation
Make clear how pandas is selecting the axis on which to split until the axis parameter is removed.