DOC: update the docstring for several functions and properties (Seoul). #20099

Merged (2 commits) on Mar 13, 2018

coffeedjimmy
Contributor

@coffeedjimmy coffeedjimmy commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

`pd.DataFrame.get_dtype_counts`

################################################################################
################ Docstring (pandas.DataFrame.get_dtype_counts)  ################
################################################################################

Return counts of unique dtypes in this object.

Returns
-------
dtype   Number of dtype

See Also
--------
dtypes : Return the dtypes in this object.

Examples
--------
>>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
>>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
>>> df['int'].astype(int)
>>> df['float'].astype(float)
>>> df.get_dtype_counts()
float64    1
int64      1
object     1
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 16, in pandas.DataFrame.get_dtype_counts
Failed example:
    df['int'].astype(int)
Expected nothing
Got:
    0    1
    1    2
    2    3
    Name: int, dtype: int64
**********************************************************************
Line 17, in pandas.DataFrame.get_dtype_counts
Failed example:
    df['float'].astype(float)
Expected nothing
Got:
    0    1.0
    1    2.0
    2    3.0
    Name: float, dtype: float64

`pd.DataFrame.get_ftype_counts`

################################################################################
################ Docstring (pandas.DataFrame.get_ftype_counts)  ################
################################################################################

Return counts of unique ftypes in this object.

Returns
-------
dtype   Number of dtype:dense|sparse

See Also
--------
ftypes : Return
         ftypes (indication of sparse/dense and dtype) in this object.

Examples
--------
>>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
>>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
>>> df['int'].astype(int)
>>> df['float'].astype(float)
>>> df.get_dtype_counts()
float64:dense    1
int64:dense      1
object:dense     1
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 17, in pandas.DataFrame.get_ftype_counts
Failed example:
    df['int'].astype(int)
Expected nothing
Got:
    0    1
    1    2
    2    3
    Name: int, dtype: int64
**********************************************************************
Line 18, in pandas.DataFrame.get_ftype_counts
Failed example:
    df['float'].astype(float)
Expected nothing
Got:
    0    1.0
    1    2.0
    2    3.0
    Name: float, dtype: float64
**********************************************************************
Line 19, in pandas.DataFrame.get_ftype_counts
Failed example:
    df.get_dtype_counts()
Expected:
    float64:dense    1
    int64:dense      1
    object:dense     1
    dtype: int64
Got:
    float64    1
    int64      1
    object     1
    dtype: int64

`pd.DataFrame.select_dtypes`

################################################################################
################## Docstring (pandas.DataFrame.select_dtypes) ##################
################################################################################

Return a subset of a DataFrame including/excluding columns based on
their ``dtype``.

Parameters
----------
include, exclude : scalar or list-like
    A selection of dtypes or strings to be included/excluded. At least
    one of these parameters must be supplied.

Raises
------
ValueError
    * If both of ``include`` and ``exclude`` are empty
    * If ``include`` and ``exclude`` have overlapping elements
    * If any kind of string dtype is passed in.

Returns
-------
subset : DataFrame
    The subset of the frame including the dtypes in ``include`` and
    excluding the dtypes in ``exclude``.

Notes
-----
* To select all *numeric* types, use ``np.number`` or ``'number'``
* To select strings you must use the ``object`` dtype, but note that
  this will return *all* object dtype columns
* See the `numpy dtype hierarchy
  <http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html>`__
* To select datetimes, use ``np.datetime64``, ``'datetime'`` or
  ``'datetime64'``
* To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or
  ``'timedelta64'``
* To select Pandas categorical dtypes, use ``'category'``
* To select Pandas datetimetz dtypes, use ``'datetimetz'`` (new in
  0.20.0) or ``'datetime64[ns, tz]'``

Examples
--------
>>> df = pd.DataFrame({'a': np.random.randn(6).astype('f4'),
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
        a      b  c
0  0.3962   True  1.0
1  0.1459  False  2.0
2  0.2623   True  1.0
3  0.0764  False  2.0
4 -0.9703   True  1.0
5 -1.2094  False  2.0
>>> df.select_dtypes(include='bool')
   b
0  True
1  False
2  True
3  False
4  True
5  False
>>> df.select_dtypes(include=['float64'])
   c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=['floating'])
       b
0   True
1  False
2   True
3  False
4   True
5  False

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No summary found (a short summary in a single line should be present at the beginning of the docstring)
	Errors in parameters section
		Parameters {'exclude', 'include'} not documented
		Unknown parameters {'include, exclude'}
	See Also section not found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 44, in pandas.DataFrame.select_dtypes
Failed example:
    df
Expected:
            a      b  c
    0  0.3962   True  1.0
    1  0.1459  False  2.0
    2  0.2623   True  1.0
    3  0.0764  False  2.0
    4 -0.9703   True  1.0
    5 -1.2094  False  2.0
Got:
              a      b    c
    0  1.941085   True  1.0
    1  1.050210  False  2.0
    2  1.936395   True  1.0
    3 -1.503260  False  2.0
    4 -0.155825   True  1.0
    5  0.852338  False  2.0

`pd.DataFrame.values`

################################################################################
##################### Docstring (pandas.DataFrame.values)  #####################
################################################################################

Return NDFrame as ndarray or ndarray-like depending on the dtype.

Notes
-----
The dtype will be a lower-common-denominator dtype (implicit
upcasting); that is to say if the dtypes (even of numeric types)
are mixed, the one that accommodates all will be chosen. Use this
with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to
float32.  If dtypes are int32 and uint8, dtype will be upcast to
int32. By numpy.find_common_type convention, mixing int64 and uint64
will result in a float64 dtype.

Examples
--------
>>> df = pd.DataFrame({'a': np.random.randn(2).astype('f4'),
...                    'b': [True, False], 'c': [1.0, 2.0]})
>>> type(df.values)
<class 'numpy.ndarray'>
>>> df.values
array([[0.25209328532218933, True, 1.0],
[0.35383567214012146, False, 2.0]], dtype=object)
################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	No returns section found
	Private classes (['NDFrame']) should not be mentioned in public docstring.
	See Also section not found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 22, in pandas.DataFrame.values
Failed example:
    df.values
Expected:
    array([[0.25209328532218933, True, 1.0],
    [0.35383567214012146, False, 2.0]], dtype=object)
Got:
    array([[-0.8504104018211365, True, 1.0],
           [-0.9855750203132629, False, 2.0]], dtype=object)

`pd.DataFrame.get_values`

################################################################################
################### Docstring (pandas.DataFrame.get_values)  ###################
################################################################################

Same as values (but handles sparseness conversions).

Returns
-------
numpy.ndaray
    Numpy representation of NDFrame

Examples
--------
>>> df = pd.DataFrame({'a': np.random.randn(2).astype('f4'),
...                    'b': [True, False], 'c': [1.0, 2.0]})
>>> df.get_values()
array([[0.25209328532218933, True, 1.0],
[0.35383567214012146, False, 2.0]], dtype=object)
################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found
	Private classes (['NDFrame']) should not be mentioned in public docstring.
	See Also section not found
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 13, in pandas.DataFrame.get_values
Failed example:
    df.get_values()
Expected:
    array([[0.25209328532218933, True, 1.0],
    [0.35383567214012146, False, 2.0]], dtype=object)
Got:
    array([[-1.3661248683929443, True, 1.0],
           [-0.5633015632629395, False, 2.0]], dtype=object)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

-> Most of the errors are due to missing extended summaries. I think the functions and properties I documented are explained well enough without them.

Some of the examples fail because they use random functions, which produce different results on each execution. I also omitted some outputs because the examples are simple and clear.

Lastly, I left the errors that already existed in the previous version unchanged.
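
For reference, the "Expected nothing" failures reported above come from doctest treating any displayed value as output that has to be matched. Below is a minimal, stdlib-only sketch of that behaviour. It is not the pandas validation script, and the names `unassigned`, `assigned`, and `count_failures` are hypothetical helpers used only for illustration.

```python
import doctest


def unassigned():
    """
    >>> sorted([3, 1, 2])
    """


def assigned():
    """
    >>> result = sorted([3, 1, 2])
    >>> result
    [1, 2, 3]
    """


def count_failures(func):
    """Run the doctest examples in func's docstring and return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, name=func.__name__, globs={}):
        # Discard the failure report; only the counts are needed here.
        results = runner.run(test, out=lambda text: None)
        failed += results.failed
    return failed


# The bare expression displays a value but the docstring expects nothing.
print(count_failures(unassigned))
# Assigning the result and then showing it with its expected output passes.
print(count_failures(assigned))
```

Assigning the result, or writing the expected output under the expression, is what lets the second docstring pass.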

@@ -2432,7 +2432,8 @@ def eval(self, expr, inplace=False, **kwargs):
return _eval(expr, inplace=inplace, **kwargs)

def select_dtypes(self, include=None, exclude=None):
-"""Return a subset of a DataFrame including/excluding columns based on
+"""
+Return a subset of a DataFrame including/excluding columns based on
Contributor

@TomAugspurger TomAugspurger Mar 10, 2018

Let's shorten this a bit so that it fits on a line. How about

Return a subset of the DataFrame based on the column dtypes.
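
As a rough illustration of what that one-line summary describes, here is a small sketch of `select_dtypes` on fixed (non-random) data. The column names and values are made up for the example, and it assumes a pandas version in which `include` and `exclude` accept the strings shown in the docstring under review.

```python
import pandas as pd

# Hypothetical fixed data: one integer, one boolean, and one float column.
df = pd.DataFrame({"a": [1, 2], "b": [True, False], "c": [1.0, 2.0]})

# Keep only boolean columns.
bools = df.select_dtypes(include="bool")

# Keep all numeric columns ('number' does not cover bool).
numeric = df.select_dtypes(include="number")

# Drop every floating-point column.
no_floats = df.select_dtypes(exclude=["floating"])

print(list(bools.columns))
print(list(numeric.columns))
print(list(no_floats.columns))
```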

Contributor Author

@coffeedjimmy coffeedjimmy Mar 11, 2018

Thanks for your comments. I have one question about the docstring guideline.

I looked through several comments left on other PRs and found that this type of docstring is considered correct.

"""This type of docstring.

I ran the script mentioned in the guideline, but it said the docstring should start on a new line, like this.

"""
This type of docstring.

Which one is correct?

Thanks.

Member

It's the second one, so on a new line (we changed that in the docstring guide rather late before the sprint)

Contributor Author

@jorisvandenbossche Thanks. I've written the docstrings following the second form and changed the things you and other contributors pointed out.

@jorisvandenbossche
Member

Can you try to fix the failing examples?

Contributor

@TomAugspurger TomAugspurger left a comment

Thanks! FYI, best to do one docstring at a time. Makes it easier to review.

@@ -4232,7 +4232,8 @@ def as_matrix(self, columns=None):

@property
def values(self):
-"""Numpy representation of NDFrame
+"""
+Return NDFrame as ndarray or ndarray-like depending on the dtype.
Contributor

I think #20065 was working on .values. That has more comments so maybe remove your changes here.

@@ -4260,16 +4271,76 @@ def _get_values(self):
return self.values

def get_values(self):
-"""same as values (but handles sparseness conversions)"""
+"""
+Same as values (but handles sparseness conversions).
Contributor

Let's avoid reference to .values in the first line.

Return a NumPy representation of the data after converting sparse to dense.

And then we can use the See Also to mention .values


Returns
-------
numpy.ndaray
Contributor

ndarray (two rs)

... 'b': [True, False], 'c': [1.0, 2.0]})
>>> df.get_values()
array([[0.25209328532218933, True, 1.0],
[0.35383567214012146, False, 2.0]], dtype=object)
Contributor

I don't think this formatting is quite right. Can you try using non-random data and running the doctest on it?
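
One way to follow this suggestion is sketched below: the frame is built from fixed values rather than `np.random.randn`, so the result is identical on every run. The column values are invented for the example. It uses `.values`, the property discussed in this PR, and checks dtypes rather than printing the array, since array reprs can differ between NumPy versions.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones, so the example is reproducible.
df = pd.DataFrame({"a": [1.5, 2.5], "b": [True, False], "c": [1.0, 2.0]})

# Mixing float and bool columns leaves no common numeric dtype -> object array.
mixed = df.values

# Selecting only the float columns keeps a float64 array.
floats_only = df[["a", "c"]].values

print(mixed.dtype)
print(floats_only.dtype)
```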


Returns
-------
dtype Number of dtype
Contributor

colon between parameter name and the type.

Also I think it should be

dtype : Series
    Series with the count of columns with each dtype
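
To show that layout in context, here is a small sketch of a docstring written in the style discussed in this thread: the summary starts on the line after the opening quotes, and the Returns entry uses `name : type` followed by an indented description. The function `count_by_type` is hypothetical and is not part of pandas.

```python
def count_by_type(values):
    """
    Count how many items of each type appear in ``values``.

    Returns
    -------
    counts : dict
        Mapping from type name to the number of items of that type.

    Examples
    --------
    >>> count_by_type([1, 2.0, 'a', 3])
    {'int': 2, 'float': 1, 'str': 1}
    """
    counts = {}
    for value in values:
        # Use the type's name as the key so the result is easy to read.
        name = type(value).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts
```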

--------
>>> a = [['a', 1, 1.0], ['b', 2, 2.0], ['c', 3, 3.0]]
>>> df = pd.DataFrame(a, columns=['str', 'int', 'float'])
>>> df['int'].astype(int)
Contributor

If you make df from a dictionary, the types should be correctly inferred and you won't need the astype calls
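
A minimal sketch of that approach, assuming a reasonably recent pandas: each column's dtype is inferred from the values passed in the dictionary, so no `astype` call is needed. It counts dtypes with `df.dtypes.value_counts()` rather than `get_dtype_counts`, because the latter may not exist in newer pandas releases. The dtype reported for the string column can vary by pandas version, so the sketch does not rely on it.

```python
import pandas as pd

# Each column's dtype is inferred from the dictionary values.
df = pd.DataFrame({"str": ["a", "b", "c"],
                   "int": [1, 2, 3],
                   "float": [1.0, 2.0, 3.0]})

# One entry per distinct dtype, with the number of columns that have it.
counts = df.dtypes.value_counts()

print(df.dtypes)
print(counts)
```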


Returns
-------
dtype Number of dtype:dense|sparse
Contributor

colon. Same comment as above.

Returns
-------
numpy.ndaray
Numpy representation of NDFrame
Member

We should not mention "NDFrame" in public docstrings (see the output of the validation script). In this case, this can just be "DataFrame" (because Series has its own implementation and docstring)

@jreback
Contributor

jreback commented Mar 10, 2018

This will also need a rebase on master

@codecov

codecov bot commented Mar 11, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@fb556ed).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #20099   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49149           
  Branches          ?        0           
=========================================
  Hits              ?    45071           
  Misses            ?     4078           
  Partials          ?        0
Flag        Coverage Δ
#multiple   90.08% <ø> (?)
#single     41.85% <ø> (?)

Impacted Files           Coverage Δ
pandas/core/frame.py     97.18% <ø> (ø)
pandas/core/generic.py   95.84% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb556ed...61851b2. Read the comment docs.

@coffeedjimmy coffeedjimmy force-pushed the master branch 4 times, most recently from ad2923c to 9ad55ec Compare March 11, 2018 09:32
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 13, 2018
@TomAugspurger TomAugspurger merged commit 71e42a8 into pandas-dev:master Mar 13, 2018
@TomAugspurger
Contributor

Thanks @coffeedjimmy !
