DOC: update the docstring for several functions and properties (Seoul). #20099

coffeedjimmy · 2018-03-10T08:31:36Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

`pd.DataFrame.get_dtype_counts`

################################################################################
################ Docstring (pandas.DataFrame.get_dtype_counts)  ################
################################################################################

Return counts of unique dtypes in this object.

Returns
-------
dtype   Number of dtype

See Also
--------
dtypes : Return the dtypes in this object.

Examples
--------
>>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
>>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
>>> df['int'].astype(int)
>>> df['float'].astype(float)
>>> df.get_dtype_counts()
float64    1
int64      1
object     1
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 16, in pandas.DataFrame.get_dtype_counts
Failed example:
    df['int'].astype(int)
Expected nothing
Got:
    0    1
    1    2
    2    3
    Name: int, dtype: int64
**********************************************************************
Line 17, in pandas.DataFrame.get_dtype_counts
Failed example:
    df['float'].astype(float)
Expected nothing
Got:
    0    1.0
    1    2.0
    2    3.0
    Name: float, dtype: float64

`pd.DataFrame.get_ftype_counts`

################################################################################
################ Docstring (pandas.DataFrame.get_ftype_counts)  ################
################################################################################

Return counts of unique ftypes in this object.

Returns
-------
dtype   Number of dtype:dense|sparse

See Also
--------
ftypes : Return
         ftypes (indication of sparse/dense and dtype) in this object.

Examples
--------
>>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
>>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
>>> df['int'].astype(int)
>>> df['float'].astype(float)
>>> df.get_dtype_counts()
float64:dense    1
int64:dense      1
object:dense     1
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 17, in pandas.DataFrame.get_ftype_counts
Failed example:
    df['int'].astype(int)
Expected nothing
Got:
    0    1
    1    2
    2    3
    Name: int, dtype: int64
**********************************************************************
Line 18, in pandas.DataFrame.get_ftype_counts
Failed example:
    df['float'].astype(float)
Expected nothing
Got:
    0    1.0
    1    2.0
    2    3.0
    Name: float, dtype: float64
**********************************************************************
Line 19, in pandas.DataFrame.get_ftype_counts
Failed example:
    df.get_dtype_counts()
Expected:
    float64:dense    1
    int64:dense      1
    object:dense     1
    dtype: int64
Got:
    float64    1
    int64      1
    object     1
    dtype: int64

`pd.DataFrame.select_dtypes`

################################################################################
################## Docstring (pandas.DataFrame.select_dtypes) ##################
################################################################################

Return a subset of a DataFrame including/excluding columns based on
their ``dtype``.

Parameters
----------
include, exclude : scalar or list-like
    A selection of dtypes or strings to be included/excluded. At least
    one of these parameters must be supplied.

Raises
------
ValueError
    * If both of ``include`` and ``exclude`` are empty
    * If ``include`` and ``exclude`` have overlapping elements
    * If any kind of string dtype is passed in.

Returns
-------
subset : DataFrame
    The subset of the frame including the dtypes in ``include`` and
    excluding the dtypes in ``exclude``.

Notes
-----
* To select all *numeric* types, use ``np.number`` or ``'number'``
* To select strings you must use the ``object`` dtype, but note that
  this will return *all* object dtype columns
* See the `numpy dtype hierarchy
  <http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html>`__
* To select datetimes, use ``np.datetime64``, ``'datetime'`` or
  ``'datetime64'``
* To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or
  ``'timedelta64'``
* To select Pandas categorical dtypes, use ``'category'``
* To select Pandas datetimetz dtypes, use ``'datetimetz'`` (new in
  0.20.0) or ``'datetime64[ns, tz]'``

Examples
--------
>>> df = pd.DataFrame({'a': np.random.randn(6).astype('f4'),
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
        a      b  c
0  0.3962   True  1.0
1  0.1459  False  2.0
2  0.2623   True  1.0
3  0.0764  False  2.0
4 -0.9703   True  1.0
5 -1.2094  False  2.0
>>> df.select_dtypes(include='bool')
   b
0  True
1  False
2  True
3  False
4  True
5  False
>>> df.select_dtypes(include=['float64'])
   c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=['floating'])
       b
0   True
1  False
2   True
3  False
4   True
5  False

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No summary found (a short summary in a single line should be present at the beginning of the docstring)
	Errors in parameters section
		Parameters {'exclude', 'include'} not documented
		Unknown parameters {'include, exclude'}
	See Also section not found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 44, in pandas.DataFrame.select_dtypes
Failed example:
    df
Expected:
            a      b  c
    0  0.3962   True  1.0
    1  0.1459  False  2.0
    2  0.2623   True  1.0
    3  0.0764  False  2.0
    4 -0.9703   True  1.0
    5 -1.2094  False  2.0
Got:
              a      b    c
    0  1.941085   True  1.0
    1  1.050210  False  2.0
    2  1.936395   True  1.0
    3 -1.503260  False  2.0
    4 -0.155825   True  1.0
    5  0.852338  False  2.0
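
The failure above comes from `np.random.randn` producing different values on every run. As a minimal sketch (the fixed values and variable names below are illustrative assumptions, not the data that was eventually merged), the same selections can be shown with deterministic input:

```python
import numpy as np
import pandas as pd

# Fixed values instead of np.random.randn, so every run builds the same frame.
df = pd.DataFrame({
    'a': np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5], dtype='f4'),  # float32
    'b': [True, False] * 3,                                      # bool
    'c': [1.0, 2.0] * 3,                                         # float64
})

bool_cols = df.select_dtypes(include='bool')
float64_cols = df.select_dtypes(include=['float64'])
numeric_cols = df.select_dtypes(include='number')
non_numeric_cols = df.select_dtypes(exclude=['number'])

print(list(bool_cols.columns))
print(list(float64_cols.columns))
print(list(numeric_cols.columns))
print(list(non_numeric_cols.columns))

# Supplying neither include nor exclude raises ValueError.
try:
    df.select_dtypes()
    raised = False
except ValueError:
    raised = True
print(raised)
```

Because the column values no longer change between runs, the printed frame in a docstring example built this way can be compared literally by the doctest runner.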

`pd.DataFrame.values`

################################################################################
##################### Docstring (pandas.DataFrame.values)  #####################
################################################################################

Return NDFrame as ndarray or ndarray-like depending on the dtype.

Notes
-----
The dtype will be a lower-common-denominator dtype (implicit
upcasting); that is to say if the dtypes (even of numeric types)
are mixed, the one that accommodates all will be chosen. Use this
with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to
float32.  If dtypes are int32 and uint8, dtype will be upcast to
int32. By numpy.find_common_type convention, mixing int64 and uint64
will result in a float64 dtype.

Examples
--------
>>> df = pd.DataFrame({'a': np.random.randn(2).astype('f4'),
...                    'b': [True, False], 'c': [1.0, 2.0]})
>>> type(df.values)
<class 'numpy.ndarray'>
>>> df.values
array([[0.25209328532218933, True, 1.0],
[0.35383567214012146, False, 2.0]], dtype=object)
################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	No returns section found
	Private classes (['NDFrame']) should not be mentioned in public docstring.
	See Also section not found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 22, in pandas.DataFrame.values
Failed example:
    df.values
Expected:
    array([[0.25209328532218933, True, 1.0],
    [0.35383567214012146, False, 2.0]], dtype=object)
Got:
    array([[-0.8504104018211365, True, 1.0],
           [-0.9855750203132629, False, 2.0]], dtype=object)
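
The upcasting rules in the Notes section can be checked directly. This is only a sketch: it uses `np.result_type` as a stand-in for `np.find_common_type` (which has been deprecated in newer NumPy releases), and the small frames and their column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy's promotion rules for the three pairs mentioned in the Notes.
print(np.result_type(np.float16, np.float32))  # common float type
print(np.result_type(np.int32, np.uint8))      # common integer type
print(np.result_type(np.int64, np.uint64))     # no common integer type exists

# The same promotion applied by .values on frames with mixed column dtypes.
mixed_float = pd.DataFrame({'x': np.array([1, 2], dtype='f2'),
                            'y': np.array([1, 2], dtype='f4')})
mixed_int = pd.DataFrame({'x': np.array([1, 2], dtype='i4'),
                          'y': np.array([1, 2], dtype='u1')})

float_result = mixed_float.values.dtype
int_result = mixed_int.values.dtype
print(float_result, int_result)
```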

`pd.DataFrame.get_values`

################################################################################
################### Docstring (pandas.DataFrame.get_values)  ###################
################################################################################

Same as values (but handles sparseness conversions).

Returns
-------
numpy.ndaray
    Numpy representation of NDFrame

Examples
--------
>>> df = pd.DataFrame({'a': np.random.randn(2).astype('f4'),
...                    'b': [True, False], 'c': [1.0, 2.0]})
>>> df.get_values()
array([[0.25209328532218933, True, 1.0],
[0.35383567214012146, False, 2.0]], dtype=object)
################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	Private classes (['NDFrame']) should not be mentioned in public docstring.
	See Also section not found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 13, in pandas.DataFrame.get_values
Failed example:
    df.get_values()
Expected:
    array([[0.25209328532218933, True, 1.0],
    [0.35383567214012146, False, 2.0]], dtype=object)
Got:
    array([[-1.3661248683929443, True, 1.0],
           [-0.5633015632629395, False, 2.0]], dtype=object)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

-> Most of the errors come from missing extended summaries. I think the functions and properties I documented are explained well enough without them.

Some of the examples fail because I used random functions in them, so the results differ on every run. I also omitted some outputs because they are simple and obvious.

Lastly, I left the errors that already existed in the previous version unchanged.

TomAugspurger · 2018-03-10T12:26:53Z

pandas/core/frame.py

@@ -2432,7 +2432,8 @@ def eval(self, expr, inplace=False, **kwargs):
        return _eval(expr, inplace=inplace, **kwargs)

    def select_dtypes(self, include=None, exclude=None):
-        """Return a subset of a DataFrame including/excluding columns based on
+        """
+        Return a subset of a DataFrame including/excluding columns based on


Let's shorten this a bit so that it fits on a line. How about

Return a subset of the DataFrame based on the column dtypes.

coffeedjimmy

Thanks for your comments. I have one question about the docstring guideline.

I looked through several comments on other PRs and found that this type of docstring is considered correct:

"""This type of docstring.

I ran the script mentioned in the guideline, but it said the docstring should start on a new line, like this:

"""
This type of docstring.

Which one is correct?

Thanks.

jorisvandenbossche

It's the second one, so on a new line (we changed that in the docstring guide rather late before the sprint).

coffeedjimmy

@jorisvandenbossche Thanks. I've written the docstrings based on the second one and changed the things you and other contributors pointed out.
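
For reference, a minimal sketch of the layout settled on in this thread: the summary starts on the line after the opening quotes and fits on one line, and the Returns entry puts a colon between the name and the type. The function `count_dtypes` is a hypothetical helper written for this illustration, not part of pandas.

```python
import inspect

import pandas as pd


def count_dtypes(df):
    """
    Return the number of columns of each dtype.

    Returns
    -------
    counts : Series
        Series with the count of columns with each dtype.
    """
    return df.dtypes.value_counts()


# inspect.getdoc strips the leading blank line and the indentation,
# so the first line of the cleaned docstring is the one-line summary.
doc = inspect.getdoc(count_dtypes)
summary = doc.splitlines()[0]
print(summary)

counts = count_dtypes(pd.DataFrame({'x': [1, 2], 'y': [1.0, 2.0]}))
print(int(counts.sum()))
```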

jorisvandenbossche · 2018-03-10T12:33:00Z

Can you try to fix the failing examples?

TomAugspurger

Thanks! FYI, best to do one docstring at a time. Makes it easier to review.

TomAugspurger · 2018-03-10T12:28:23Z

pandas/core/generic.py

@@ -4232,7 +4232,8 @@ def as_matrix(self, columns=None):

    @property
    def values(self):
-        """Numpy representation of NDFrame
+        """
+        Return NDFrame as ndarray or ndarray-like depending on the dtype.


I think #20065 was working on .values. That has more comments so maybe remove your changes here.

TomAugspurger · 2018-03-10T12:29:42Z

pandas/core/generic.py

@@ -4260,16 +4271,76 @@ def _get_values(self):
        return self.values

    def get_values(self):
-        """same as values (but handles sparseness conversions)"""
+        """
+        Same as values (but handles sparseness conversions).


Let's avoid reference to .values in the first line.

Return a NumPy representation of the data after converting sparse to dense.

And then we can use the See Also to mention .values

TomAugspurger · 2018-03-10T12:29:55Z

pandas/core/generic.py

+
+        Returns
+        -------
+        numpy.ndaray


ndarray (two rs)

TomAugspurger · 2018-03-10T12:30:38Z

pandas/core/generic.py

+        ...                    'b': [True, False], 'c': [1.0, 2.0]})
+        >>> df.get_values()
+        array([[0.25209328532218933, True, 1.0],
+        [0.35383567214012146, False, 2.0]], dtype=object)


I don't think this formatting is quite right. Can you try using non-random data and running the doctest on it?
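
One way to act on this suggestion, sketched with the standard-library `doctest` module rather than the sprint's `scripts/validate_docstrings.py`: write the example with fixed values and assert on stable properties of the result. The example text and the name `values_example` are assumptions made for this sketch.

```python
import doctest

import pandas as pd

# A docstring-style example with fixed values instead of np.random.randn.
EXAMPLE = """
>>> df = pd.DataFrame({'a': [0.5, 1.5],
...                    'b': [True, False],
...                    'c': [1.0, 2.0]})
>>> df.values.shape
(2, 3)
>>> df.values.dtype == object
True
"""

# Parse the text into a DocTest and run it with pd available in its globals.
parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLE, {'pd': pd}, 'values_example', None, 0)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

Checking the shape and dtype sidesteps the array-repr indentation issue as well, since the doctest runner compares output character by character.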

TomAugspurger · 2018-03-10T12:32:02Z

pandas/core/generic.py

+
+        Returns
+        -------
+        dtype   Number of dtype


colon between parameter name and the type.

Also I think it should be

dtype : Series
    Series with the count of columns with each dtype

TomAugspurger · 2018-03-10T12:32:44Z

pandas/core/generic.py

+        --------
+        >>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
+        >>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
+        >>> df['int'].astype(int)


If you build df from a dictionary, the types should be inferred correctly and you won't need the astype calls.
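
A short sketch of that suggestion. Two assumptions are worth flagging: `get_dtype_counts` was deprecated in later pandas releases, so `df.dtypes.value_counts()` is used here as a stand-in, and the dtype of the string column depends on the pandas version, so it is not asserted.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Each column is built from a homogeneous list, so its dtype is inferred
# per column and no astype call is needed.
df = pd.DataFrame({'str': ['a', 'b', 'c'],
                   'int': [1, 2, 3],
                   'float': [1.0, 2.0, 3.0]})

# astype returns a new Series and leaves df untouched. A bare
# `df['int'].astype(int)` line in a doctest therefore prints the converted
# Series, which is the "Expected nothing / Got" failure reported above.
converted = df['int'].astype(float)

counts = df.dtypes.value_counts()
print(df.dtypes)
print(counts)
```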

TomAugspurger · 2018-03-10T12:32:52Z

pandas/core/generic.py

+
+        Returns
+        -------
+        dtype   Number of dtype:dense|sparse


colon. Same comment as above.

jorisvandenbossche · 2018-03-10T12:43:31Z

pandas/core/generic.py

+        Returns
+        -------
+        numpy.ndaray
+            Numpy representation of NDFrame


We should not mention "NDFrame" in public docstrings (see the output of the validation script). In this case, this can just be "DataFrame" (because Series has its own implementation and docstring)

jreback · 2018-03-10T14:48:11Z

This will also need a rebase on master

codecov · 2018-03-11T08:50:58Z

Codecov Report

❗ No coverage uploaded for pull request base (master@fb556ed).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #20099   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49149           
  Branches          ?        0           
=========================================
  Hits              ?    45071           
  Misses            ?     4078           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.08% <ø> (?)`
#single	`41.85% <ø> (?)`

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.18% <ø> (ø)`
pandas/core/generic.py	`95.84% <ø> (ø)`

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fb556ed...61851b2.

TomAugspurger · 2018-03-13T14:44:49Z

Thanks @coffeedjimmy !

TomAugspurger reviewed Mar 10, 2018

jreback added the Docs label Mar 10, 2018

jreback mentioned this pull request Mar 10, 2018

DOC: update the dtypes/ftypes docstring (Seoul) #20100

Merged

5 tasks

jorisvandenbossche reviewed Mar 10, 2018

coffeedjimmy force-pushed the master branch from d146e1a to 4f0c437 Compare March 11, 2018 08:50

coffeedjimmy force-pushed the master branch 4 times, most recently from ad2923c to 9ad55ec Compare March 11, 2018 09:32

revise docstring for several functions and properties

60e3dc2

coffeedjimmy force-pushed the master branch from 9ad55ec to 60e3dc2 Compare March 11, 2018 09:34

Updats

61851b2

TomAugspurger added this to the 0.23.0 milestone Mar 13, 2018

TomAugspurger merged commit 71e42a8 into pandas-dev:master Mar 13, 2018
