Skip to content

DOC: add basic documentation to some of the GroupBy properties and methods #5459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 27, 2013

Conversation

dwiel
Copy link
Contributor

@dwiel dwiel commented Nov 7, 2013

I added some simple documentation to the GroupBy and Grouper classes

@rockg
Copy link
Contributor

rockg commented Nov 7, 2013

@dwiel I think you should add this to the API documentation as well.

@@ -693,6 +711,7 @@ def apply(self, f, data, axis=0):

@cache_readonly
def indices(self):
""" dict {group name -> group indices """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing the second }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh, looks like the same thing happened on the GroupBy.indicies docstring as well. Just pushed a fix for both of them

@dwiel
Copy link
Contributor Author

dwiel commented Nov 7, 2013

@rockg, is there documentation somewhere about how to do that? I've looked around a little bit, but I'm not finding anything.

@jtratner
Copy link
Contributor

jtratner commented Nov 7, 2013

look at api.rst

.. autosummary::
:toctree: generated/

GroupBy.__iter__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the GroupBy object is available in the pandas namespace. I think you will have to add something like .. currentmodule:: pandas.core.groupby

@cancan101
Copy link
Contributor

This might be a related issue: #4500

@dwiel
Copy link
Contributor Author

dwiel commented Nov 7, 2013

yeah, #4500 deffinitely looks related, though the bulk of the conversation is more about automatic missing documentation finding rather than the original topic of lacking GroupBy documentation.

I left GroupBy.ngroups undocumented because it always gives the same result as len, but the code paths are different, and it wasn't clear to me why.

@jorisvandenbossche
Copy link
Member

Super that you wan't to work on this! Docs on these groupby things are really a gap in the documentation at the moment IMHO..

I also have been thinking about it some time ago, and I think there are some issues to pay attention to:

  • Not all methods are implemented as methods from the Groupby class, also some in DataFrameGroupby and SeriesGroupby (eg aggregate) or NDFrameGroupby. This can make it complex to the user.
  • These methods appear as methods of the GroupBy (or other as DataFrameGrouBy etc) object. I think it has to be clearly stated that such an object is the result of a df.groupby() call and that these methods are applicable on that. Maybe some lines can be added to api.rst to explain that.
  • Another thing that makes it complicated, is that actually also other Series/DataFrame methods can be applied on a groupby method. Eg. the 'missing' doc on the nunique function as mentioned in Docs for lurking groupby methods #4500 is actually not really missing, as this is just a Series method (so adding these all to the docs is not really possible, but a better explanation of this in the narrative groupby docs can of course help a lot).

But it is certainly a good start!

Something else (but that is a broader discussion), the mentioning of NDFrame is also somewhat problematic I think, as nowhere in the docs it is really explained what this is (as far as I know?). But this is something that maybe needs some seperate discussion, as it also appears in other parts of the docs, certainly now more functions are made 'generic' and not seperately implemented for Series and DataFrame.

@jreback
Copy link
Contributor

jreback commented Nov 7, 2013

@dwiel #4887 is the issue which esentially whitelisted methods that can be used on groupby
(what @jorisvandenbossche mentioned above). These are created as wrapper methods so they don't explicity live in groupby (except the cythonized versions of some of these), others are delegated to the underlying object. This is why its 'magical' why some things just work!

@jorisvandenbossche
Copy link
Member

@dwiel Do you have time to finish this?
My main comment on this is to add some explanation in api.rst that the GroupBy object is the result of a df.groupby() for clarity, and you should also squash your commits.

If you have some more time, the GroupBy.aggregate and GroupBy.transform methods you added to api.rst don't have a docstring yet. But that could also be something for a another PR.

@dwiel
Copy link
Contributor Author

dwiel commented Nov 16, 2013

I've added an explanation to api.rst and squashed the commits. I'm not entirely sure how to get the comments into the api.rst since the doc string is there, its just loaded with the Appender decorator. The docstring shows up in ipython help for example. If you would rather I remove them from the api.rst for now since it isn't picking them up, that is fine.

I haven't made many contributions to Open Source projects yet, but your previous comment @jorisvandenbossche was the first appreciative comment I've received, and it was very welcome. Thanks!

-------
.. currentmodule:: pandas.core.groupby

**GroupBy objects are returned by groupby calls: DataFrame.groupby, Series.groupby, etc.**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make DataFrame.groupby link to the rendered docstring of groupby by using :func:pandas.DataFrame.groupby``. However, this will not work in the bold text environment ** ... **.

@jorisvandenbossche
Copy link
Member

And you are thanked for your contribution!

I added one more comment.
The problem with aggregate and transform docstrings not appearing in the html builds, while you see them in ipython help, is because it are methods of DataFrameGroupBy instead of the generic GroupBy (like I already commented earlier as a difficulty to document these functions).

…thods

DOC: add GroupBy class to api.rst
DOC: add import for GroupBy object to api.rst
DOC: add GroupBy.aggregate and GroupBy.transform to api.rst
DOC: add meta comment to api.rst about when GroupBy objects are created
DOC: add links to api.rst
@jreback
Copy link
Contributor

jreback commented Nov 27, 2013

@jorisvandenbossche can you review?

@jorisvandenbossche
Copy link
Member

@jreback I think my most important comments are adressed, and the others can wait for later PRs (I will turn them in some issues).
The only thing I am not certain about is that now GroupBy.aggregate/transform is listed in api.rst, but if you click on it to go the docstring page, it is empty (as it is actually DataFrame/SeriesGroupBy.aggregate that has a docstring in the code).

@jreback
Copy link
Contributor

jreback commented Nov 27, 2013

ok.....will merge this then

jreback added a commit that referenced this pull request Nov 27, 2013
DOC: add basic documentation to some of the GroupBy properties and methods
@jreback jreback merged commit 6e86c2c into pandas-dev:master Nov 27, 2013
@jtratner
Copy link
Contributor

@dwiel thanks for this! (and what a bummer that you didn't get positive
feedback on other contributions!)

@dwiel
Copy link
Contributor Author

dwiel commented Nov 27, 2013

The feedback is appreciated, I was getting worried that I wasn't actually helping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants