-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Adding guide for the pandas documentation sprint #19704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Adding guide for the pandas documentation sprint #19704
Conversation
numpydoc recommends avoiding "obvious" imports and importing them with | ||
aliases, so for example `import numpy as np`. While this is now an standard | ||
in the data ecosystem of Python, it doesn't seem a good practise, for the | ||
next reasons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully acknowledging that I'm bike-shedding...but I really disagree with this. Personally think aliasing is a great compromise between import *
(like R, etc), while still having enough brevity for interactive workflows. Also conflicts with majority of pandas/numpy code in the wild.
I also agree with the numpydoc suggestion to avoid obvious imports - I suspect the first most common use of docstrings is inside a repl/notebook/etc - showing the imports adds noise in that context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the aliasing (import pandas as pd
, import numpy as np
), I agree with @chris-b1. It's something new users will have to learn indeed, but it's something they will need to learn anyhow, as almost any code you will see online uses those aliases.
Also conflicts with majority of pandas/numpy code in the wild.
That's not my perception, but maybe I am a bit biased by my environment :)
About whether we want to show the imports or not, here I am more open to be convinced otherwise, although I also find having those two imports everywhere adds noise.
Apart from my one comment, I think this is great and appreciate what you've done to pull it together! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together.
|
||
The short summary must start with a verb infinitive, end with a dot, and fit in | ||
a single line. It needs to express what the function does without providing | ||
details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want a length limit here, so the lines in http://pandas-docs.github.io/pandas-docs-travis/api.html don't wrap. Though it looks like we can't set a hard limit, since the width available depends on the section...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't really care about the "verb infinitive" part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't solve the wrapping issue you mention but I wonder if either here or in the General section we should make note that comments still need to wrap at 79 characters for PEP-8 compliance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the line length, and seems that for the rendering in autosummaries something around 60 to 75 characters is the maximum length to avoid wrapping in. The document currently states to fit in a single line, which with PEP-8 line length means 76 characters for functions and 72 for methods. Unless you have a better idea, I'll leave like it is, as I think it's much easier for people to write a single line, than to count the characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the infinitive verb, it's a standard used in Python projects in investment banking. I think it helps people write concise summaries. All functions/methods do things, so starting with a verb always should make sense, and it avoids starts like "This function [...]", "Method to [...]". Being infinitive is just to standardize "Generates [...]", "Generating [...]" and "Generate [...]" to always have the same form. Not sure if any other reason was used besides the infinitive being shorter, but unless you really want to get rid of this rule, I'll keep it as it's used in investment banks.
- tuple of (str, int, int) | ||
- set of {str} | ||
|
||
In case there are just a set of values allowed, list them in curly brackets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specify that the default value, if any, is listed first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to require the curly braces? At the moment we don't really use it that much I think.
I also like a more explicit "default value" than just relying on the fact it is listed first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the default value, and that it goes first in a list of options.
Regarding curly brackets, didn't realize they're not being used. But that's part of the numpy convention "When a parameter can only assume one of a fixed set of values, those values can be listed in braces, with the default appearing first". Leaving that, unless you want to change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the default, I think the rule that the default is the first is rather obscure to readers. I am also not sure if the first place is necessarily the best. Eg for the hypothetical example below of {0, 10, 25}
, I would rather list them in numerical order even if 0 is not the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jorisvandenbossche : again, we document the default value quite consistently already
- pandas.DataFrame | ||
|
||
If more than one type is accepted, separate them by commas, except the | ||
last two types, that need to be separated by the word 'or': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh do we get to bikeshed about using serial commas? :)
Section 5: See also | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
This is an optional section, used to let users know about pandas functionality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily just pandas. We do "See Also" to other packages.
Specify that if you're referring to a method in another package you need the package name, like numpy.where
, not np.where
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's strictly optional, but I would maybe add "optional but strongly recommended section"
|
||
The way to present examples is as follows: | ||
|
||
1. Import required libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to import, since we import them in our doctest setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally we assume the imports
import numpy as np
import pandas as pd
other imports should be done explicitly
|
||
3. Show a very basic example that gives an idea of the most common use case | ||
|
||
4. Add commented examples that illustrate how the parameters can be used for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a different word than "commented", since people may interpret that as lines starting with #
example in the head method, where it requires to be higher than 5, to show | ||
the example with the default values. | ||
|
||
Avoid using data without interpretation, like a matrix of random numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an issue about using common, meaningful datasets for these. Maybe we can make a decision on that before the sprint (will try to find link later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also remind such a discussion, but didn't find an open issue, only a discussion in a PR (with mention of gitter), so opened a new issue: #19710
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also make a kind of "summary checklist"
documents that explain this convention: | ||
|
||
- `Guide to NumPy/SciPy documentation <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_ | ||
- `numpydoc docstring guide <http://numpydoc.readthedocs.io/en/latest/format.html>`_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is enough to only link to this last one, normally it should contain everything that is in the first (they only recently made that doc page)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the last one should contain the same as the first, but with a better presentation, I think it's good that people is aware of the first document. People is not really expected to read or follow them, they're presented here just for reference. I'll leave both unless you really feel we should get rid of the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think it will just confuse people by giving 2 links. If there are two, I expect that somehow it is useful that I look at both of them, but then left wondering what is the difference, because the content is almost exactly the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I left just the numpy doc one in the list, and kept the other just in a comment. Let me know if you think it's still worthless to have it this way.
- tuple of (str, int, int) | ||
- set of {str} | ||
|
||
In case there are just a set of values allowed, list them in curly brackets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to require the curly braces? At the moment we don't really use it that much I think.
I also like a more explicit "default value" than just relying on the fact it is listed first
If the type is a pandas type, also specify pandas: | ||
|
||
- pandas.Series | ||
- pandas.DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that for those two just "Series" and "DataFrame" is enough? (otherwise it can become quite lengthy)
|
||
If the type is in a package, the module must be also specified: | ||
|
||
- numpy.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, we now often say something like "array-like", when both lists and arrays are allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we actually have many cases in the user facing functions where we require a numpy array (I mean, where we don't accept a list as well, and thus we would not use 'array' or 'array-like' in general)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, although we could maybe have some place in the docs where we clearly define what an "array-like" is (e.g. tuples aren't)... and maybe even refer to it with a footnote?
- str or list of str | ||
|
||
If None is one of the accepted values, it always needs to be the last in | ||
the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should discuss how we want to have the "notion" of optional:
- int or float, optional
- in, float or None
- int or float, default None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote option 3. I'd give a second place nod to option 1, but optional
and default None
are essentially the same thing. It seems like an unnatural rule to enforce that default 'foo'
is the rule when a keyword has a non-None
default argument but optional
is the route for None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like an unnatural rule to enforce that default 'foo' is the rule when a keyword has a non-None default argument but optional is the route for None
Having a different way to describe it, can also signal that it actually is something different in practice. In (many, but not all) cases, a value of None really means that it is not specified and is optional (like method=None
in fillna
, because the default is to fill with a fixed value and not to use a forward or backward filling method), in constrast with other keywords that have a default value (like skipna=True
. For users it 'feels' like optional because you typically don't need to specify it, but it is not optional)
|
||
Examples in docstrings are also unit tests, and besides illustrating the | ||
usage of the function or method, they need to be valid Python code, that in a | ||
deterministic way returns the presented output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe not explicitly say "unit tests" (strictly spoken they also aren't at the moment, as we don't run doctests), but we want it be correct python syntax because people can copy paste it to interact themselves with the example or to reproduce the example?
|
||
The way to present examples is as follows: | ||
|
||
1. Import required libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally we assume the imports
import numpy as np
import pandas as pd
other imports should be done explicitly
Examples | ||
-------- | ||
>>> import pandas | ||
>>> s = pandas.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas -> pd
numpydoc recommends avoiding "obvious" imports and importing them with | ||
aliases, so for example `import numpy as np`. While this is now an standard | ||
in the data ecosystem of Python, it doesn't seem a good practise, for the | ||
next reasons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the aliasing (import pandas as pd
, import numpy as np
), I agree with @chris-b1. It's something new users will have to learn indeed, but it's something they will need to learn anyhow, as almost any code you will see online uses those aliases.
Also conflicts with majority of pandas/numpy code in the wild.
That's not my perception, but maybe I am a bit biased by my environment :)
About whether we want to show the imports or not, here I am more open to be convinced otherwise, although I also find having those two imports everywhere adds noise.
in the data ecosystem of Python, it doesn't seem a good practise, for the | ||
next reasons: | ||
|
||
* The code is not executable anymore (as doctests for example) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running doctests with pytest, numpy and pandas will always be imported automatically
BTW, we should certainly keep this as a separate document, the contributing.rst is already long enough (we should rather start splitting more of our long doc pages in separate pieces IMO) |
so programmers can understand what it does without having to read the details | ||
of the implementation. | ||
|
||
Also, it is a commonn practice to generate online (html) documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common
All comments should be addressed in this new version, except in the cases where I added a comment to the review. And also, it's pending the part on the standard datasets (#19710). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some further comments.
Another question, with your docstring_validation script, would it be easy to print out all current type descriptions? (to have a quick overview of what we currently have, and to know which ones are useful to include here in the docs or which ones we need to discuss to have a consistent usage)
description in this case would be "Description of the arg (default is X).". In | ||
some cases it may be useful to explain what the default argument means, which | ||
can be added after a comma "Description of the arg (default is -1, which means | ||
all cpus).". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, a very frequently occurring pattern to list the default is in the types, like color:, str, default 'blue'
or copy : boolean, default True
(I think we even do it relatively consistently)
I personally think we would like to keep this. Often the description can be quite long. Having it at the end of the type description gives it a prominent and consistent place to find it.
See eg the how
keyword in https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html#pandas.DataFrame.join where the description is a list of the different possible values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also eg https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.take.html where it is done consistently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the default after the description (and not the type), and having the default first if it's a set like {0, 10, 25}, is in the numpy docstring convention. I agree with you in both cases, it can be clearer having the default after the type, and the options in a consistent order. Just pointing out where these come from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like the current way, and our users might be used to it (I certainly am)
|
||
- int | ||
- float | ||
- str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add dict, list and tuple as other often occurring types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and boolean (or bool), whichever of the two we converge on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding bool. dict, list and tuple are already documented in the next block.
- tuple of (str, int, int) | ||
- set of {str} | ||
|
||
In case there are just a set of values allowed, list them in curly brackets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the default, I think the rule that the default is the first is rather obscure to readers. I am also not sure if the first place is necessarily the best. Eg for the hypothetical example below of {0, 10, 25}
, I would rather list them in numerical order even if 0 is not the default.
|
||
If the type is in a package, the module must be also specified: | ||
|
||
- numpy.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we actually have many cases in the user facing functions where we require a numpy array (I mean, where we don't accept a list as well, and thus we would not use 'array' or 'array-like' in general)
|
||
For complex types, define the subtypes: | ||
|
||
- list of [int] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is also something we are currently not doing?
I am not sure if I find list of [int]
better than list of int
(for the dict and tuple it can be more illustrative)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to be consistent for list. I agree, in dict and tuple I think it adds value.
Probably not something it'd ever happen, but it could be clearer:
list of [dict of {int: str}]
than
list of dict of {int: str}
But happy with list of int
too.
- str or list of str | ||
|
||
If None is one of the accepted values, it always needs to be the last in | ||
the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like an unnatural rule to enforce that default 'foo' is the rule when a keyword has a non-None default argument but optional is the route for None
Having a different way to describe it, can also signal that it actually is something different in practice. In (many, but not all) cases, a value of None really means that it is not specified and is optional (like method=None
in fillna
, because the default is to fill with a fixed value and not to use a forward or backward filling method), in constrast with other keywords that have a default value (like skipna=True
. For users it 'feels' like optional because you typically don't need to specify it, but it is not optional)
think about what can be useful for the users reading the documentation, | ||
especially the less experienced ones. | ||
|
||
When relating to other methods (mainly `numpy`), use the name of the module |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"When relating to other methods" -> was this intended to be "other libraries" or "other modules"?
|
||
Return | ||
------ | ||
pandas.Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas.Series -> Series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this linked anywhere from the contributing docs?
0 | ||
""" | ||
return num1 + num2 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a See Also section
yield random.random() | ||
|
||
|
||
Section 5: See also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add refs to all of these sections, also capitalize as you would in the doc-string
... columns=('a', 'b', 'c')) | ||
""" | ||
pass | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe put the sections in the same order as we want in the doc-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this, sorry... There is just the Examples
section in this docstring. Do you mean adding the parameters, see also...?
…m the contributing page
@jorisvandenbossche, here you have the list of all parameters currently used in docstrings:
|
~~~~~~~~~~~~~ | ||
|
||
Docstrings must be defined with three double-quotes. No blank lines should be | ||
left before or after the docstring. The text starts immediately after the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means the vast majority of current pandas docstring do it wrong (and as a result... are more readable, I think). Everybody all right with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I personally also like how we mostly start on the following line (which alsi gives you 3 characters more for the summary line ... :-))
PEP257 also says both are ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got that from the numpy convention. I'm more used to start in the same line, but as far as we always use the same, I don't think it'll make a difference for anyone.
can have multiple lines. The description must start with a capital letter, and | ||
finish with a dot. | ||
|
||
Keyword arguments with a default value, the default will be listed in brackets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For keywords arguments with a default value
- pandas.SparseArray | ||
|
||
If the exact type is not relevant, but must be compatible with a numpy | ||
array, array-like can be specified. If Any type that can be iterated is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any (lower case)
accepted, iterable can be used: | ||
|
||
- array-like | ||
- iterable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be subtle. For instance, pd.Series(i for i in range(3))
works, but it is undocumented. However, I don't think we want to replace "array-like" with "iterable": probably have both, although they are theoretically redundant.
Where's this at? Are we going through another round of reviews or can we merge this and iterate if needed? |
@TomAugspurger I made some changes based on the points discussed in pandas-dev in the python-sprints version: I need to address couple of comments that @jorisvandenbossche pointed out (the default is incorrectly defined, the fillna is not good...) and then should be a good first version. I'll make the changes later today and update this PR with them. |
I copied the latest version of the sprints repo, and pushed that as a commit.So people can already give this another round of review here if needed. |
Updated the documentation with the last changes (mainly the points discussed in pandas-dev), and two new sections, one at the beginning about when to use backticks in the docstrings, and another at the end on how to add plots to the documentation. Any feedback welcome. If you prefer to read it in html, the exact same version as this PR is available here: https://python-sprints.github.io/pandas/guide/pandas_docstring.html |
@datapythonista thanks a lot for the updates. Really looking great. At the others, if you could do a final read of it before the sprint, that would be very welcome. |
|
||
def add_values(arr): | ||
""" | ||
Add the values in `arr`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we settle on single or double backticks here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, typing
`arr`
Uses sphinx's default role. That's currently None (no role) in our conf.py
, but it could be whatever. Does sphinx have a "parameter role"? I'm not finding one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single backticks is what numpydoc spec says to do. But I don't know how useful it is to make the distinction between single backticks for parameters but double backticks for code (other function names, parameter name combined with a value, ..).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy uses 'autolink' as their default role. Which makes that they also use single backticks for other functions, and then they automatically become links to the docstring page, which is also nice.
Look eg at the keepdims explanation in the parameter section: https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
All of keepdims, ndarray and sum are single backticks. The first is rendered as italic, while the other are links to their docstring page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But probably a bit late to change now before the sprint, without really trying out. I would propose to keep it as is?
Using it would however make the docs a bit more pleasant to read (or write) in plain text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche should we copy this then? https://github.com/numpy/numpy/blob/master/doc/source/conf.py#L65
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I think that'd be the best behavior. I'm not a huge fan of double backticks in docstrings, because they make the text version too noisy. For parameters, we get italics in the HTML (code might be better, but we at least have some formatting), or a link to the object without all the :ref:
noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry didn't see your last post before posting.
Agreed it's too late to change for the sprint. But let's leave the recommendation as is (use single backtick for parameters), since I think it's what we'll want in the future.
- numpy.ndarray | ||
- scipy.sparse.coo_matrix | ||
|
||
If the type is a pandas type, also specify pandas except for Series and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the exception here? IMO, it'd be clearer to follow the rules that numpydoc / sphinx uses for discovery (so anything in the top-level pandas namespace should be found). That way we have consistency with the See Also
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean everything with 'pandas', or everything without it?
I suppose without? I am fine with that, I think it was added mainly for being explicit for objects that maybe not everybody knows are coming from pandas.
Section 5: See Also | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
This section is used to let users know about pandas functionality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strike pandas, since we'll often link to numpy / python / other libraries as well.
related to the one being documented. In rare cases, if no related methods | ||
or functions can be found at all, this section can be skipped. | ||
|
||
An obvious example would be the `head()` and `tail()` methods. As `tail()` does |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change these to be links to the methods? So people can click it and see the rendered docstring?
followed by a space, a colon, another space, and a short description that | ||
illustrated what this method or function does, why is it relevant in this | ||
context, and what are the key differences between the documented function and | ||
the one referencing. The description must also finish with a dot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to require a description, do we? I think if you're writing read_csv
and want to link to DataFrame.to_csv
, just the link should be sufficient.
Yes, without. Not a big deal though.
…On Fri, Mar 9, 2018 at 11:31 AM, Joris Van den Bossche < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In doc/source/contributing_docstring.rst
<#19704 (comment)>:
> +- {'simple', 'advanced'}
+- {'low', 'medium', 'high'}
+- {'cat', 'dog', 'bird'}
+
+If the type is defined in a Python module, the module must be specified:
+
+- datetime.date
+- datetime.datetime
+- decimal.Decimal
+
+If the type is in a package, the module must be also specified:
+
+- numpy.ndarray
+- scipy.sparse.coo_matrix
+
+If the type is a pandas type, also specify pandas except for Series and
Do you mean everything with 'pandas', or everything without it?
I suppose without? I am fine with that, I think it was added mainly for
being explicit for objects that maybe not everybody knows are coming from
pandas.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19704 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHItBCAf3M2f_qXiogDyHdl5bqCfpGks5tcryDgaJpZM4SGDCd>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I think we should merge this, as we already have sprint PRs coming in. We revise as needed in followup PRs.
Un up to date version is hosted on the sprint website, so merging is not that urgent. But also no problem to merge |
Ah, OK. does it make sense to include the shared doc guide from
#20016 in the version on the
sprint website?
…On Fri, Mar 9, 2018 at 2:17 PM, Joris Van den Bossche < ***@***.***> wrote:
Un up to date version is hosted on the sprint website, so merging is not
that urgent. But also no problem to merge
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19704 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIopshw72kAS5t0PhwpcObcHh5DBQks5tcuM8gaJpZM4SGDCd>
.
|
@TomAugspurger yeah, if you would have time for that, that would be welcome |
OK, I updated it with the latest version from the sprint website and merged this ones, I will create another issue to discuss further needed clarifications. |
Opened issue here: #20309 |
This PR is to make it easier to review the proposal guide for the pandas documentation sprint, as discussed in pandas-dev.