
DOC: Adding guide for the pandas documentation sprint #19704


Merged: 9 commits merged into pandas-dev:master on Mar 12, 2018

Conversation

datapythonista (Member):

This PR is to make it easier to review the proposal guide for the pandas documentation sprint, as discussed in pandas-dev.

numpydoc recommends avoiding "obvious" imports and importing them with
aliases, so for example `import numpy as np`. While this is now a standard
in the Python data ecosystem, it doesn't seem a good practice, for the
following reasons:
Contributor:

Fully acknowledging that I'm bike-shedding... but I really disagree with this. I personally think aliasing is a great compromise: it avoids import * (like R, etc.) while still keeping enough brevity for interactive workflows. The recommendation also conflicts with the majority of pandas/numpy code in the wild.

I also agree with the numpydoc suggestion to avoid obvious imports - I suspect the first most common use of docstrings is inside a repl/notebook/etc - showing the imports adds noise in that context.

Member:

About the aliasing (import pandas as pd, import numpy as np), I agree with @chris-b1. It's something new users will have to learn indeed, but it's something they will need to learn anyhow, as almost any code you will see online uses those aliases.

Also conflicts with majority of pandas/numpy code in the wild.

That's not my perception, but maybe I am a bit biased by my environment :)

About whether we want to show the imports or not, here I am more open to being convinced otherwise, although I also find that having those two imports everywhere adds noise.

@chris-b1 (Contributor):

Apart from my one comment, I think this is great and appreciate what you've done to pull it together!

@TomAugspurger (Contributor) left a comment:

Thanks for putting this together.


The short summary must start with a verb infinitive, end with a dot, and fit in
a single line. It needs to express what the function does without providing
details.
Contributor:

I think we want a length limit here, so the lines in http://pandas-docs.github.io/pandas-docs-travis/api.html don't wrap. Though it looks like we can't set a hard limit, since the width available depends on the section...

Contributor:

I also don't really care about the "verb infinitive" part.

Member:

This doesn't solve the wrapping issue you mention, but I wonder if, either here or in the General section, we should note that comments still need to wrap at 79 characters for PEP-8 compliance.

Member Author:

I checked the line length, and it seems that for the rendering in autosummaries something around 60 to 75 characters is the maximum length to avoid wrapping. The document currently states to fit in a single line, which with the PEP-8 line length means 76 characters for functions and 72 for methods. Unless you have a better idea, I'll leave it as it is, as I think it's much easier for people to write a single line than to count the characters.

Member Author:

Regarding the infinitive verb, it's a standard used in Python projects in investment banking. I think it helps people write concise summaries. All functions/methods do things, so starting with a verb should always make sense, and it avoids openings like "This function [...]" or "Method to [...]". Requiring the infinitive is just to standardize "Generates [...]", "Generating [...]" and "Generate [...]" into a single form. Not sure if there was any other reason besides the infinitive being shorter, but unless you really want to get rid of this rule, I'll keep it as it's used in investment banks.
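To make the summary rule concrete, here is a minimal, invented sketch of the convention being discussed (infinitive verb, one line, ending with a dot); it is not text taken from the guide itself:

# Preferred: the short summary starts with a verb in infinitive form,
# fits on one line and ends with a dot.
def head(df, n=5):
    """
    Return the first n rows of the object.
    """

# Phrasings the rule tries to avoid: "Returns the first n rows." or
# "This method returns the first n rows of the object."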

- tuple of (str, int, int)
- set of {str}

In case there are just a set of values allowed, list them in curly brackets
Contributor:

Specify that the default value, if any, is listed first.

Member:

Do we want to require the curly braces? At the moment we don't really use it that much I think.

I also like a more explicit "default value" than just relying on the fact it is listed first

Member Author:

Adding the default value, and that it goes first in a list of options.

Regarding curly brackets, didn't realize they're not being used. But that's part of the numpy convention "When a parameter can only assume one of a fixed set of values, those values can be listed in braces, with the default appearing first". Leaving that, unless you want to change that.

Member:

Regarding the default, I think the rule that the default is the first is rather obscure to readers. I am also not sure if the first place is necessarily the best. Eg for the hypothetical example below of {0, 10, 25}, I would rather list them in numerical order even if 0 is not the default.

Member:

I agree with @jorisvandenbossche : again, we document the default value quite consistently already
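For illustration only, a hypothetical parameter description combining the braces notation with an explicit default, loosely modeled on pandas.to_numeric but not copied from the guide:

def to_numeric_example(arg, errors='raise'):
    """
    Convert the argument to a numeric type.

    Parameters
    ----------
    errors : {'raise', 'coerce', 'ignore'}, default 'raise'
        How invalid parsing is handled.
    """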

- pandas.DataFrame

If more than one type is accepted, separate them by commas, except the
last two types, which need to be separated by the word 'or':
Contributor:

Ohh do we get to bikeshed about using serial commas? :)

Section 5: See also
~~~~~~~~~~~~~~~~~~~

This is an optional section, used to let users know about pandas functionality
Contributor:

Not necessarily just pandas. We do "See Also" to other packages.

Specify that if you're referring to a method in another package you need the package name, like numpy.where, not np.where.

Member:

It's strictly optional, but I would maybe add "optional but strongly recommended section"
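As a rough sketch of the two points above (fully qualified names for other packages, and the section being recommended rather than strictly mandatory), with entries invented for illustration:

def head_example(df, n=5):
    """
    Return the first n rows of the object.

    See Also
    --------
    DataFrame.tail : Return the last n rows.
    numpy.where : Select elements from arrays depending on a condition.
    """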


The way to present examples is as follows:

1. Import required libraries
Contributor:

Don't need to import, since we import them in our doctest setup.

Member:

Normally we assume the imports

import numpy as np
import pandas as pd

other imports should be done explicitly
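A minimal sketch of how an Examples section might look under that assumption (numpy and pandas pre-imported as np and pd, anything else imported explicitly); the function and values are made up:

def example():
    """
    Examples
    --------
    >>> s = pd.Series([1, 2, 3])
    >>> s.sum()
    6

    Anything beyond numpy/pandas is imported explicitly:

    >>> import datetime
    >>> pd.Timestamp(datetime.datetime(2018, 3, 12))
    Timestamp('2018-03-12 00:00:00')
    """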


3. Show a very basic example that gives an idea of the most common use case

4. Add commented examples that illustrate how the parameters can be used for
Contributor:

Maybe a different word than "commented", since people may interpret that as lines starting with #

example in the head method, where it requires to be higher than 5, to show
the example with the default values.

Avoid using data without interpretation, like a matrix of random numbers
Contributor:

There's an issue about using common, meaningful datasets for these. Maybe we can make a decision on that before the sprint (will try to find link later).

Member:

I also remember such a discussion, but didn't find an open issue, only a discussion in a PR (with a mention of gitter), so I opened a new issue: #19710
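In the meantime, a small invented example of the kind of interpretable data the guide has in mind, as opposed to a matrix of random numbers (the values are made up, not taken from any agreed dataset):

import pandas as pd

# Data a reader can interpret at a glance: animals and their maximum speeds,
# rather than np.random output.
df = pd.DataFrame({'animal': ['falcon', 'parrot', 'lion'],
                   'max_speed': [389.0, 24.0, 80.5]})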

@jorisvandenbossche (Member) left a comment:

We should also make a kind of "summary checklist"

documents that explain this convention:

- `Guide to NumPy/SciPy documentation <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_
- `numpydoc docstring guide <http://numpydoc.readthedocs.io/en/latest/format.html>`_
Member:

I think it is enough to only link to this last one, normally it should contain everything that is in the first (they only recently made that doc page)

Member Author:

Even if the last one should contain the same as the first, with a better presentation, I think it's good that people are aware of the first document. People are not really expected to read or follow them; they're presented here just for reference. I'll leave both unless you really feel we should get rid of the first.

@jorisvandenbossche (Member), Feb 25, 2018:

I personally think it will just confuse people to give two links. If there are two, I expect that it is somehow useful to look at both of them, but I am then left wondering what the difference is, because the content is almost exactly the same.

Member Author:

That's a good point. I left just the numpy doc one in the list, and kept the other just in a comment. Let me know if you think it's still worthless to have it this way.


If the type is a pandas type, also specify pandas:

- pandas.Series
- pandas.DataFrame
Member:

I would say that for those two just "Series" and "DataFrame" is enough? (otherwise it can become quite lengthy)


If the type is in a package, the module must be also specified:

- numpy.ndarray
Member:

In practice, we now often say something like "array-like", when both lists and arrays are allowed.

Member:

I am not sure we actually have many cases in the user facing functions where we require a numpy array (I mean, where we don't accept a list as well, and thus we would not use 'array' or 'array-like' in general)

Member:

I agree, although we could maybe have some place in the docs where we clearly define what an "array-like" is (e.g. tuples aren't)... and maybe even refer to it with a footnote?

- str or list of str

If None is one of the accepted values, it always needs to be the last in
the list.
Member:

Maybe we should discuss how we want to express the "notion" of optional:

  • int or float, optional
  • int, float or None
  • int or float, default None

Member:

I vote option 3. I'd give a second place nod to option 1, but optional and default None are essentially the same thing. It seems like an unnatural rule to enforce that default 'foo' is the rule when a keyword has a non-None default argument but optional is the route for None

Member:

It seems like an unnatural rule to enforce that default 'foo' is the rule when a keyword has a non-None default argument but optional is the route for None

Having a different way to describe it can also signal that it actually is something different in practice. In (many, but not all) cases, a value of None really means that it is not specified and is optional (like method=None in fillna, because the default is to fill with a fixed value and not to use a forward or backward filling method), in contrast with other keywords that have a default value (like skipna=True; for users it 'feels' optional because you typically don't need to specify it, but it is not optional).
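To make that distinction concrete, a hypothetical Parameters block contrasting the two cases (keyword names borrowed from fillna-style methods, descriptions invented):

def fill_example(value=None, method=None, skipna=True):
    """
    Fill missing values (illustrative only).

    Parameters
    ----------
    method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
        Filling method to use. None means no method is used and a fixed
        value is filled instead, so the keyword is genuinely optional.
    skipna : bool, default True
        Whether to exclude NA values. It always has an effect, so it is
        not really "optional" even though it has a default.
    """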


Examples in docstrings are also unit tests, and besides illustrating the
usage of the function or method, they need to be valid Python code that
deterministically returns the presented output.
Member:

I would maybe not explicitly say "unit tests" (strictly speaking they also aren't at the moment, as we don't run doctests), but we want it to be correct Python syntax because people can copy-paste it to interact with the example themselves or to reproduce it?



Examples
--------
>>> import pandas
>>> s = pandas.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon',
Member:

pandas -> pd


in the Python data ecosystem, it doesn't seem a good practice, for the
following reasons:

* The code is not executable anymore (as doctests for example)
Member:

When running doctests with pytest, numpy and pandas will always be imported automatically

@jorisvandenbossche (Member):

BTW, we should certainly keep this as a separate document, the contributing.rst is already long enough (we should rather start splitting more of our long doc pages in separate pieces IMO)

so programmers can understand what it does without having to read the details
of the implementation.

Also, it is a commonn practice to generate online (html) documentation
Member:

common

@datapythonista (Member Author):

All comments should be addressed in this new version, except in the cases where I added a comment to the review. Also still pending is the part on the standard datasets (#19710).

@jorisvandenbossche (Member) left a comment:

Some further comments.

Another question, with your docstring_validation script, would it be easy to print out all current type descriptions? (to have a quick overview of what we currently have, and to know which ones are useful to include here in the docs or which ones we need to discuss to have a consistent usage)
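Not the actual docstring_validation script, but a rough sketch of how such a listing could be produced, assuming docstrings follow the numpydoc layout (a regex approximation, so it will miss or mis-parse some entries):

import inspect
import re

import pandas as pd

# A new numpydoc section header at column 0, e.g. "Returns" or "See Also".
SECTION = re.compile(r"^[A-Z][A-Za-z ]+$")
# A parameter line at column 0, "name : type description".
PARAM = re.compile(r"^\w[\w*, ]* : (.+)$")


def iter_param_types(obj):
    """Yield the type description of each parameter in a numpydoc docstring."""
    doc = inspect.getdoc(obj) or ""
    in_params = False
    for line in doc.splitlines():
        if line.strip() == "Parameters":
            in_params = True
            continue
        if in_params:
            if SECTION.match(line):
                break  # reached the next section
            match = PARAM.match(line)
            if match:
                yield match.group(1).strip()


types = set()
for name in dir(pd.DataFrame):
    if name.startswith("_"):
        continue
    attr = getattr(pd.DataFrame, name, None)
    if callable(attr):
        types.update(iter_param_types(attr))

for type_description in sorted(types):
    print(type_description)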

description in this case would be "Description of the arg (default is X).". In
some cases it may be useful to explain what the default argument means, which
can be added after a comma "Description of the arg (default is -1, which means
all cpus).".
Member:

Currently, a very frequently occurring pattern is to list the default in the type description, like color : str, default 'blue' or copy : boolean, default True (I think we even do it relatively consistently).

I personally think we would like to keep this. Often the description can be quite long. Having it at the end of the type description gives it a prominent and consistent place to find it.
See eg the how keyword in https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html#pandas.DataFrame.join where the description is a list of the different possible values.

Member Author:

Having the default after the description (and not the type), and having the default first if it's a set like {0, 10, 25}, is in the numpy docstring convention. I agree with you in both cases, it can be clearer having the default after the type, and the options in a consistent order. Just pointing out where these come from.

Member:

I also like the current way, and our users might be used to it (I certainly am)
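For reference, a hedged sketch of that currently common pattern (default stated at the end of the type line, with the longer explanation underneath); the keywords are illustrative, not taken from a real pandas signature:

def join_example(data, copy=True, how='left'):
    """
    Illustrate the "default in the type line" pattern.

    Parameters
    ----------
    copy : bool, default True
        Whether to copy the input data. Set to False to avoid a copy when
        the input is already in the desired form.
    how : {'left', 'right', 'outer', 'inner'}, default 'left'
        How to handle the operation, with the possible values explained
        in a list in the description when needed.
    """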


- int
- float
- str
Member:

let's add dict, list and tuple as other often occurring types

Member:

and boolean (or bool), whichever of the two we converge on

Member Author:

Adding bool. dict, list and tuple are already documented in the next block.



For complex types, define the subtypes:

- list of [int]
Member:

I think this is also something we are currently not doing?
I am not sure if I find list of [int] better than list of int

(for the dict and tuple it can be more illustrative)

Member Author:

Just trying to be consistent for list. I agree that for dict and tuple it adds value.

Probably not something that would ever happen, but it could be clearer as:
list of [dict of {int: str}]

than
list of dict of {int: str}

But happy with list of int too.


think about what can be useful for the users reading the documentation,
especially the less experienced ones.

When relating to other methods (mainly `numpy`), use the name of the module
Member:

"When relating to other methods" -> was this intended to be "other libraries" or "other modules"?


Return
------
pandas.Series
Member:

pandas.Series -> Series
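A small sketch of how the Returns section could then look (the section title is "Returns" in numpydoc; the function and description are invented):

def value_counts_example(values):
    """
    Count the occurrences of each unique value.

    Returns
    -------
    Series
        Counts of unique values, sorted in descending order.
    """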

@jreback (Contributor) left a comment:

is this linked anywhere from the contributing docs?

0
"""
return num1 + num2

Contributor:

can you add a See Also section

yield random.random()


Section 5: See also
Contributor:

I would add refs to all of these sections, also capitalize as you would in the doc-string

... columns=('a', 'b', 'c'))
"""
pass

Contributor:

maybe put the sections in the same order as we want in the doc-string

Member Author:

Not sure I understand this, sorry... There is just the Examples section in this docstring. Do you mean adding the parameters, see also...?

@datapythonista (Member Author):

@jorisvandenbossche, here you have the list of all parameters currently used in docstrings:

 {'any', 'all'}, default 'any'
'all', list-like of dtypes or None (default), optional
'fixed(f)|table(t)', default is 'fixed'
'infer', bool-ndarray, 'NaT', default 'raise'
'raise', 'coerce', default 'raise'
(float,float), optional
1d array-like
1d ndarray or Series
2-length sequence (tuple, list, ...)
A tuple (width, height) in inches
CategoricalDtype
DataFrame
DataFrame or Panel
DataFrame or Series
DataFrame or Series/dict-like object, or list of these
DataFrame, Series
DataFrame, Series with name field set, or list of DataFrame
DataFrame, or object coercible into a DataFrame
DateOffset object, or string
DateOffset, timedelta, or offset alias string, optional
DateOffset, timedelta, or time rule string, default None
DateOffset, timedelta, or time rule string, optional
DatetimeIndex or TimedeltaIndex
Drop groups that do not pass the filter. True by default;
Grouped DataFrame
How to join individual DataFrames
Index
Index or array-like
Index or list/tuple of indices
Index, optional
Index-like
Index-like (unique), optional
IndexSlice
Keyword Arguments
Matplotlib axes object, optional
Matplotlib axis object, default None
Matplotlib axis object, optional
MultiIndex or list of tuples
NDFrame, default None
Name of the column containing class names
None
None or float value, default None
None or float value, default None (NaN)
None or str, optional
None, integer or string axis name, optional
NumPy array or integer, optional
NumPy dtype (default: float64)
NumPy dtype (default: int64)
NumPy dtype (default: object)
NumPy dtype (default: uint64)
Number of points to plot in each curve
Object
Panel or list of Panels
Panel, or object coercible to Panel
Period frequency
Period or compat.string_types, default None
Python write mode, default 'w'
SQLAlchemy connectable (engine/connection) or database string URI
SQLAlchemy engine or DBAPI2 connection (legacy mode)
Series
Series or DataFrame
Series or list/tuple of Series
Series or scalar value
Series, DataFrame
Series, DataFrame, or constant
Series, DataFrame, or ndarray, optional
Setting this to True will show the grid
StringIO-like, optional
The number of decimal places to use when encoding
The object to check.
Timedelta, timedelta, np.timedelta64, string, or integer
Type name or dict of column -> type, default None
a list of columns that if not None, will limit the return
a sequence or mapping of Series, DataFrame, or Panel objects
a valid JSON string or file-like, default: None
alignment axis if needed, default None
alignment level if needed, default None
allowed axis of the other object, default None
an iterable
array or boolean, default None
array-like
array-like (1-dimensional)
array-like (1-dimensional), optional
array-like or Categorical, (1-dimensional)
array-like or Index (1d)
array-like or callable, default None
array-like, Series, or DataFrame
array-like, Series, or list of arrays/Series
array-like, default None
array-like, dict, or scalar value
array-like, integers
array-like, optional
array-like, optional (should be specified using keywords)
array_like
axes to direct sorting
axis to shift, default 0
bool
bool (default True)
bool (default: True)
bool or None, default True
bool or list of bool, default True
bool or same types as ``to_replace``, default False
bool, default False
bool, default NaN
bool, default None
bool, default True
bool, default True.
bool, defaults to False
bool, optional
boolean
boolean (default: False)
boolean NDFrame, array-like, or callable
boolean array-like with the same length as self
boolean or dict, default True
boolean or list of ints or names or list of lists or dict, default False
boolean or list of string, default True
boolean or string, default False
boolean or string, default True
boolean whether to append to an existing msgpack
boolean,
boolean, (default False)
boolean, True by default
boolean, default False
boolean, default False, do not write an ALL nan row to
boolean, default None
boolean, default True
boolean, default True if ax is None else False
boolean, default True, append the input data to the
boolean, default ``True``
boolean, default is True,
boolean, default to False
boolean, defaults to False
boolean, defaults to True
boolean, if True, return an iterator to the unpacker
boolean, optional
boolean, return an iterator, default False
boolean, should automatically close the store when
boolean, {'all', 'index', 'columns'}, or {0,1}, default False
boolean/string, default None
callable
callable or tuple of (callable, string)
callable(1d-array) -> 1d-array<boolean>, default None
callable, default None
callable, optional
callable, string, dictionary, or list of string/callables
category or list of categories
category or list-like of category
character, default ","
class, default dict
closed end of interval; 'left' or 'right'
column label or list of column labels / arrays
column label or sequence of labels, optional
column name or list of names, or vector
column to aggregate, optional
column, Grouper, array, or list of the previous
data type, or dict of column name -> data type
date or array of dates
date, string, int
datetime
datetime-like, str, int, float
datetime.time or string
datetime.time, str
default NaN, fill value for missing values.
default None, provide an encoding for strings
deprecated, use `expand`
dict
dict (python 3), str or None (python 2)
dict of column name to SQL type, default None
dict of columns that specify minimum string sizes
dict or list of dicts
dict, default None
dict, default is None
dict, optional
dict-like or function, optional
dtype or None, default None
dtype, default None
dtype, default np.uint8
end time, datetime-like, optional
end time, timedelta-like, optional
end value, period-like, optional
expected TOTAL row size of this table
float
float or array-like, default 0.5 (50% quantile)
float or array_like
float or array_like, default None
float, default NaN
float, default None
float, defaults to NaN (missing)
float, optional
force encoded string to be ASCII, default True.
freq string/object
frequency string
function
function, default None
function, dict, or Series
function, list of functions, dict, default numpy.mean
function, optional
hint to the hashtable sizer
identifier of index column, defaults to None
ignored
index to direct sorting
index, columns to direct sorting
index-like
int
int (can only be zero)
int (default: 0)
int (default: 0), or other RangeIndex instance.
int (default: 1)
int or None
int or array
int or axis name
int or basestring
int or csv.QUOTE_* instance, default 0
int or level name or list of ints or list of level names
int or level name, default None
int or list of ints
int or list of ints, default 'infer'
int or list, default None
int or name
int or numpy.random.RandomState, optional
int or sequence or False, default None
int or str
int or str, default 0
int or str, optional
int or str, optional, default None
int or string
int or string axis name
int or string axis name, optional
int or string, default 0
int or string, optional
int, Series, or array-like
int, array, or Series, default None
int, array-like
int, default -1
int, default -1 (all)
int, default 0
int, default 0 (no flags)
int, default 1
int, default 5
int, default None
int, default None.
int, defaults None
int, dict, Series
int, float, Interval
int, level name, or sequence of int/level names (default None)
int, level name, or sequence of such, default None
int, list of ints, default 0
int, list of ints, default None
int, optional
int, optional, > 0
int, optional, default 0
int, or offset
int, sequence of scalars, or IntervalIndex
int, str or None
int, str, default None
int, str, tuple, or list, default None
int, string (can be mixed)
int, string, or list of these, default -1 (last level)
int, string, or list of these, default last level
int/level name or list thereof
integer (defaults to None), row number to start selection
integer (defaults to None), row number to stop selection
integer or array of quantiles
integer or sequence, default 10
integer, default 1
integer, default None
integer, float, string, datetime, list, tuple, 1-d array, Series
integer, optional
interval boundary to use for labeling; 'left' or 'right'
item label (panel item)
iterable, Series, DataFrame or dictionary
iterable, optional
keyword arguments to pass on to the constructor
keyword arguments to pass on to the interpolating function.
keyword, value pairs
keywords
label
label or list
label or list, or array-like
label or position, optional
label or tuple of labels (one for each level)
label rotation angle
label, default None
list
list / sequence of array-likes
list / sequence of iterables
list / sequence of strings or None
list / sequence of tuple-likes
list of Index objects
list of Term (or convertible) objects, optional
list of columns to create as data columns, or True to
list of int or list of str
list of int representing new level order.
list of ints. optional
list of pairs (int, int) or 'infer'. optional
list of paths (string or list of strings), default None
list of sequences, default None
list or None
list or dict of one-parameter functions, optional
list or dict, default: None
list, default None
list, default: None
list, tuple or dict, optional, default: None
list, tuple, 1-d array, or Series
list-like
list-like of Categorical, CategoricalIndex,
list-like of dtypes or None (default), optional,
list-like of numbers, optional
list-like or None, default None
list-like or integer or callable, default None
list-like, default None
list-like, dict-like or callable
list-like, int or str, default 0
list-like, or list of list-likes
mapping, function, label, or list of labels
mapping, optional
matplotlib axes object, default None
matplotlib axis object
name, tuple/list of names, or array-like
name/number, defaults to None
ndarray
ndarray (1-d)
ndarray (items x major x minor), or dict of DataFrames
ndarray (structured dtype), list of tuples, dict, or DataFrame
ndarray or object value
nrows to include in iteration, return an iterator
number/name of the axis, defaults to 0
numeric or datetime-like, default None
numeric, optional
numeric, string, or DateOffset, default None
numpy dtype or pandas type
numpy ndarray (structured or homogeneous), dict, or DataFrame
numpy.dtype or None
object
object to be converted
object, default ''
object, default None
object, defaults to first n levels (n=1 or len(key))
object, optional
one-parameter function, optional
optional
optional int
optional sequence of objects
optional, 'infer' or None, defaults to None
optional, array-like
optional, defaults False.
optional, defaults to tab
other plotting keyword arguments
path (string), buffer or path object (pathlib.Path or
pytz.timezone or dateutil.tz.tzfile
raise on invalid input
replace NaN with this value if the unstack produces
scalar
scalar or array_like, optional
scalar or list-like
scalar value
scalar, NDFrame, or callable
scalar, default 'value'
scalar, default None
scalar, default is 'unix'
scalar, default np.NaN
scalar, dict, Series, or DataFrame
scalar, dict, list, str, regex, default None
scalar, hashable sequence, dict-like or function, optional
scalar, list-like, dict-like or function, optional
scalar, list-like, optional
scalar, or array-like
scalar, str, list-like, or dict, default None
scipy.sparse.coo_matrix
sequence
sequence of (key, value) pairs
sequence of arrays
sequence or list of sequence
sequence, default None
sequence, optional
set or list-like
single label or list-like
size to chunk the writing
sort by the remaining levels after level.
starting value, datetime-like, optional
starting value, period-like, optional
starting value, timedelta-like, optional
str
str (length 1), default None
str (length 1), optional
str or None
str or PeriodDtype, default None
str or buffer
str or csv.Dialect instance, default None
str or file-like
str or int, optional
str or list
str or list of str
str or list-like
str or matplotlib colormap object, default None
str or ndarray-like, optional
str or sequence
str or unicode
str {'E', 'S'}
str {'dict', 'list', 'series', 'split', 'records', 'index'}
str, default ""
str, default ','
str, default '.'
str, default '\\d+'
str, default '\s+'.
str, default 'pad'
str, default None
str, default \t (tab-stop)
str, default ``'	' + ' '``
str, default ``None``
str, default is 'utf-8'
str, method of resampling ('ffill', 'bfill')
str, optional
str, optional (python 2)
str, pathlib.Path, py._path.local.LocalPath or any \
str, regex, list, dict, Series, numeric, or None
str, tuple, datetime.timedelta, DateOffset or None
str, {'raise', 'ignore'}, default 'raise'
string
string (regular expression)
string / frequency object, defaults to None
string File path, BytesIO like or string
string File path, buffer-like, or None
string SQL query or SQLAlchemy Selectable (select or text object)
string file path or file handle / StringIO
string file path, or file-like object
string or DateOffset, default 'B' (business daily)
string or DateOffset, default 'D' (calendar daily)
string or DateOffset, optional
string or ExcelWriter object
string or None
string or None, default None
string or SQLAlchemy Selectable (select or text object)
string or callable
string or compiled regex
string or datetime-like, default None
string or file handle, default None
string or file-like object
string or int, optional
string or list of strings, default None
string or list of strings, optional, default: None
string or object
string or object, optional
string or pandas offset object, optional
string or period object, optional
string or period-like, default None
string or pytz.timezone object
string or sequence
string or sequence, default None
string or timedelta-like, default None
string to use as string nan represenation
string {'xport', 'sas7bdat'} or None
string,
string, DateOffset, dateutil.relativedelta
string, None or encoding
string, default
string, default "Pandas"
string, default "|"
string, default ''
string, default '.'
string, default 'All'
string, default 'Sheet1'
string, default '_'
string, default 'inf'
string, default 'ms' (milliseconds)
string, default 'ns'
string, default None
string, default frequency of PeriodIndex
string, default is None
string, default whitespace
string, defaults to None
string, int, mixed list of strings/ints, or None, default 0
string, list of fields, array-like
string, list of strings, or dict of strings, default None
string, number, or hashable object
string, optional
string, optional, default: None
string, optional, {'pad', 'ffill', 'bfill'}
string, path object (pathlib.Path or py._path.local.LocalPath),
string, pytz.timezone, dateutil.tz.tzfile or None
string, timedelta, list, tuple, 1-d array, or Series
string, valid regular expression
string, {'ns', 'us', 'ms', 's', 'm', 'h', 'D'}, optional
the axis to convert
the axis to localize
the pandas object holding the data
the path (string) or HDFStore object
the path or buffer to write the result string
three positional arguments: each one of
timedelta
tuple
tuple (optional)
tuple and dict
tuple of integer (length 2), default None
tuple, default None
tuple, list, or ndarray, optional
tuple, optional
tuple/list
type of compressor (zlib or blosc), default to None (no
type of object to recover (series or frame), default 'frame'
unit of the arg (D,h,m,s,ms,us,ns) denote the unit, which is an
value
where to reorder levels
writable buffer, defaults to sys.stdout
{'NFC', 'NFKC', 'NFD', 'NFKD'}
{'all', 'any'}, default 'any'
{'any', 'all'}
{'auto', 'pyarrow', 'fastparquet'}, default 'auto'
{'average', 'min', 'max', 'first', 'dense'}
{'average', 'min', 'max', 'first', 'dense'}, efault 'average'
{'backfill', 'bfill', 'pad', 'ffill', None}, default None
{'backfill'/'bfill', 'pad'/'ffill'}, default None
{'block', 'integer'}
{'c', 'python'}, optional
{'columns', 'index'}, default 'columns'
{'fail', 'replace', 'append'}, default 'fail'
{'first', 'last', False}, default 'first'
{'first', 'last'}, default 'first'
{'first', 'last'}, default 'last'
{'forward', 'backward', 'both'}, default 'forward'
{'hist', 'kde'}
{'ignore', 'raise', 'coerce'}, default 'raise'
{'ignore', 'raise'}, default 'raise'
{'infer', 'gzip', 'bz2', 'xz', 'zip', None}, default 'infer'
{'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
{'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
{'inner', 'outer'}, default 'outer'
{'inside', 'outside'}, default None
{'integer', 'signed', 'unsigned', 'float'} , default None
{'items', 'major', 'minor'}
{'items', 'major', 'minor'} or {0, 1, 2}
{'items', 'major', 'minor}, default 1/'major'
{'items', 'minor', 'major'}, or {0, 1, 2}, or a tuple with two
{'items', 'minor'}, default 'items'
{'ix', 'loc', 'getitem'}
{'ix', 'loc', 'getitem'} or None
{'keep', 'top', 'bottom'}
{'left', 'right', 'both', 'neither'}, default 'right'
{'left', 'right', 'both'}, default 'left'
{'left', 'right', 'inner', 'outer'}
{'left', 'right', 'outer', 'inner'}
{'left', 'right', 'outer', 'inner'}, default 'inner'
{'left', 'right', 'outer', 'inner'}, default: 'left'
{'left', 'right'}
{'left', 'zero',' mid'}, default 'left'
{'left'}, default 'left'
{'linear', 'lower', 'higher', 'midpoint', 'nearest'}
{'linear', 'time', 'index', 'values', 'nearest', 'zero',
{'major', 'minor', 'items'}, default 'major'
{'mergesort', 'quicksort', 'heapsort'}, default 'quicksort'
{'outer', 'inner', 'left', 'right'}, default 'outer'
{'pearson', 'kendall', 'spearman'}
{'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
{'raise', 'ignore'}, default 'raise'
{'raise', 'ignore'}, default 'raise'.
{'right', 'left'}
{'s', 'e', 'start', 'end'}
{'snappy', 'gzip', 'brotli', None}, default 'snappy'
{'start', 'end', 'e', 's'}
{'start', 'end', 's', 'e'}
{'start', 'end'}, default end
{0 or 'index', 1 or 'columns'}
{0 or 'index', 1 or 'columns'}, default 0
{0 or 'index', 1 or 'columns'}, default None
{0 or 'index', 1 or 'columns'}, or tuple/list thereof
{0, 'index'}
{0, 'index'}, default 0
{0, 'index'}, default None
{0, 1, 'index', 'columns'}
{0, 1, 'index', 'columns'} (default 0)
{0, 1}, default 0
{0/'index', 1/'columns'}, default 0
{None, 'axes', 'dict', 'both'}, default None
{None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
{None, 'epoch', 'iso'}
{None, 'gzip', 'bz2', 'xz'}
{None, 'ignore'}
{None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
{None, True, False}, optional
{Series, DataFrame, Panel}
{default 'raise', 'drop'}, optional
{index (0), columns (1)}
{index (0)}
{items (0), major_axis (1), minor_axis (2)}
{items, major_axis, minor_axis}

~~~~~~~~~~~~~

Docstrings must be defined with three double-quotes. No blank lines should be
left before or after the docstring. The text starts immediately after the
Member:

This means the vast majority of current pandas docstrings do it wrong (and as a result... are more readable, I think). Everybody all right with this?

Member:

Yes, I personally also like how we mostly start on the following line (which also gives you 3 characters more for the summary line ... :-))

PEP257 also says both are ok.

Member Author:

I think I got that from the numpy convention. I'm more used to starting on the same line, but as long as we always use the same one, I don't think it'll make a difference for anyone.
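For illustration, the two placements being compared (PEP 257 allows both; the docstring text is invented):

# Text immediately after the opening quotes, as the quoted guide text seems to require.
def same_line_style(arg):
    """Return the argument unchanged."""

# Text starting on the following line, as most current pandas docstrings do.
def next_line_style(arg):
    """
    Return the argument unchanged.
    """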

can have multiple lines. The description must start with a capital letter, and
finish with a dot.

Keyword arguments with a default value, the default will be listed in brackets
Member:

For keyword arguments with a default value

- pandas.SparseArray

If the exact type is not relevant, but must be compatible with a numpy
array, array-like can be specified. If Any type that can be iterated is
@toobaz (Member), Mar 2, 2018:

any (lower case)

accepted, iterable can be used:

- array-like
- iterable
@toobaz (Member), Mar 2, 2018:

This might be subtle. For instance, pd.Series(i for i in range(3)) works, but it is undocumented. However, I don't think we want to replace "array-like" with "iterable": probably have both, although they are theoretically redundant.

@TomAugspurger (Contributor):

Where's this at? Are we going through another round of reviews or can we merge this and iterate if needed?

@datapythonista (Member Author):

@TomAugspurger I made some changes based on the points discussed in pandas-dev in the python-sprints version:
python-sprints/python-sprints.github.io@0dc3c18

I need to address a couple of comments that @jorisvandenbossche pointed out (the default is incorrectly defined, the fillna is not good...) and then it should be a good first version. I'll make the changes later today and update this PR with them.

@jorisvandenbossche (Member):

I copied the latest version of the sprints repo and pushed that as a commit, so people can already give this another round of review here if needed.

@datapythonista (Member Author):

Updated the documentation with the last changes (mainly the points discussed in pandas-dev), and two new sections, one at the beginning about when to use backticks in the docstrings, and another at the end on how to add plots to the documentation.

Any feedback welcome. If you prefer to read it in html, the exact same version as this PR is available here: https://python-sprints.github.io/pandas/guide/pandas_docstring.html

@jorisvandenbossche (Member):

@datapythonista thanks a lot for the updates. Really looking great.

To the others: if you could do a final read of it before the sprint, that would be very welcome.


def add_values(arr):
"""
Add the values in `arr`.
Contributor:

Did we settle on single or double backticks here?

Contributor:

For reference, typing

`arr`

Uses sphinx's default role. That's currently None (no role) in our conf.py, but it could be whatever. Does sphinx have a "parameter role"? I'm not finding one.

Member:

Single backticks is what numpydoc spec says to do. But I don't know how useful it is to make the distinction between single backticks for parameters but double backticks for code (other function names, parameter name combined with a value, ..).

Member:

Numpy uses 'autolink' as their default role, which means they also use single backticks for other functions, and those automatically become links to the docstring page, which is also nice.

Look e.g. at the keepdims explanation in the parameter section: https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
All of keepdims, ndarray and sum use single backticks. The first is rendered as italic, while the others are links to their docstring pages.

Member:

But probably a bit late to change now before the sprint, without really trying it out. I would propose to keep it as is?

Using it would however make the docs a bit more pleasant to read (or write) in plain text.

Contributor:

FWIW, I think that'd be the best behavior. I'm not a huge fan of double backticks in docstrings, because they make the text version too noisy. For parameters, we get italics in the HTML (code might be better, but we at least have some formatting), or a link to the object without all the :ref: noise.

Contributor:

Sorry didn't see your last post before posting.

Agreed it's too late to change for the sprint. But let's leave the recommendation as is (use single backtick for parameters), since I think it's what we'll want in the future.
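If the default role were ever switched as floated above, a minimal conf.py sketch might look like this; 'autolink' is the role numpy reportedly uses, and whether the pandas Sphinx setup provides it out of the box is an assumption here, not something verified in this thread:

# doc/source/conf.py (sketch, not the actual pandas configuration)
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "numpydoc",
]

# With a default role set, `arr` in a docstring goes through that role:
# known objects can become links, plain names fall back to italics.
default_role = "autolink"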

- numpy.ndarray
- scipy.sparse.coo_matrix

If the type is a pandas type, also specify pandas except for Series and
Contributor:

Why the exception here? IMO, it'd be clearer to follow the rules that numpydoc / sphinx uses for discovery (so anything in the top-level pandas namespace should be found). That way we have consistency with the See Also section.

Member:

Do you mean everything with 'pandas', or everything without it?
I suppose without? I am fine with that, I think it was added mainly for being explicit for objects that maybe not everybody knows are coming from pandas.

Section 5: See Also
~~~~~~~~~~~~~~~~~~~

This section is used to let users know about pandas functionality
Contributor:

strike pandas, since we'll often link to numpy / python / other libraries as well.

related to the one being documented. In rare cases, if no related methods
or functions can be found at all, this section can be skipped.

An obvious example would be the `head()` and `tail()` methods. As `tail()` does
Contributor:

Change these to be links to the methods? So people can click it and see the rendered docstring?

followed by a space, a colon, another space, and a short description that
illustrates what this method or function does, why it is relevant in this
context, and what the key differences are between the documented function and
the one referencing it. The description must also finish with a dot.
Contributor:

I don't think we need to require a description, do we? I think if you're writing read_csv and want to link to DataFrame.to_csv, just the link should be sufficient.

@TomAugspurger (Contributor):

TomAugspurger commented Mar 9, 2018 via email

@TomAugspurger (Contributor) left a comment:

+1

I think we should merge this, as we already have sprint PRs coming in. We can revise as needed in follow-up PRs.

@jorisvandenbossche (Member):

An up-to-date version is hosted on the sprint website, so merging is not that urgent. But also no problem to merge.

@TomAugspurger (Contributor):

TomAugspurger commented Mar 9, 2018 via email

@jorisvandenbossche (Member):

@TomAugspurger yeah, if you would have time for that, that would be welcome

@jorisvandenbossche merged commit 7169830 into pandas-dev:master on Mar 12, 2018
@jorisvandenbossche (Member):

OK, I updated it with the latest version from the sprint website and merged this one. I will create another issue to discuss further needed clarifications.

@jorisvandenbossche (Member):

Opened issue here: #20309
