DOC: update pandas.core.resample.Resampler.nearest docstring #20381

thicorfon · 2018-03-16T15:18:01Z

Checklist for the pandas documentation sprint:

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py pandas.core.resample.Resampler.nearest
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single pandas.core.resample.Resampler.nearest
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
############## Docstring (pandas.core.resample.Resampler.nearest) ##############
################################################################################

Fill the new missing values with their nearest neighbor value, based
on index.

When resampling data, missing values may appear (e.g., when the
resampling frequency is higher than the original frequency).
The nearest fill will replace ``NaN`` values that appeared in
the resampled data with the value from the nearest member of the
sequence, based on the index value.
Missing values that existed in the original data will not be modified.
If `limit` is given, fill only `limit` values in each direction for
each of the original values.

Parameters
----------
limit : integer, optional
    Limit of how many values to fill.

    .. versionadded:: 0.21.0

Returns
-------
Series, DataFrame
    An upsampled Series or DataFrame with ``NaN`` values filled with
    their closest neighbor value.

See Also
--------
backfill : Backward fill the new missing values in the resampled data.
fillna : Fill ``NaN`` values using the specified method, which can be
    'backfill'.
pad : Forward fill ``NaN`` values.
pandas.Series.fillna : Fill ``NaN`` values in the Series using the
    specified method, which can be 'backfill'.
pandas.DataFrame.fillna : Fill ``NaN`` values in the DataFrame using
    the specified method, which can be 'backfill'.

Examples
--------

Resampling a Series:

>>> s = pd.Series([1, 2, 3],
...               index=pd.date_range('20180101', periods=3,
...                                   freq='1h'))
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
2018-01-01 02:00:00    3
Freq: H, dtype: int64

>>> s.resample('20min').nearest()
2018-01-01 00:00:00    1
2018-01-01 00:20:00    1
2018-01-01 00:40:00    2
2018-01-01 01:00:00    2
2018-01-01 01:20:00    2
2018-01-01 01:40:00    3
2018-01-01 02:00:00    3
Freq: 20T, dtype: int64

Resample in the middle:

>>> s.resample('30min').nearest()
2018-01-01 00:00:00    1
2018-01-01 00:30:00    2
2018-01-01 01:00:00    2
2018-01-01 01:30:00    3
2018-01-01 02:00:00    3
Freq: 30T, dtype: int64

Limited fill:

>>> s.resample('10min').nearest(limit=1)
2018-01-01 00:00:00    1.0
2018-01-01 00:10:00    1.0
2018-01-01 00:20:00    NaN
2018-01-01 00:30:00    NaN
2018-01-01 00:40:00    NaN
2018-01-01 00:50:00    2.0
2018-01-01 01:00:00    2.0
2018-01-01 01:10:00    2.0
2018-01-01 01:20:00    NaN
2018-01-01 01:30:00    NaN
2018-01-01 01:40:00    NaN
2018-01-01 01:50:00    3.0
2018-01-01 02:00:00    3.0
Freq: 10T, dtype: float64

Resampling a DataFrame that has missing values:

>>> df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
...                   index=pd.date_range('20180101', periods=3,
...                                       freq='h'))
>>> df
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 02:00:00  6.0  5

>>> df.resample('20min').nearest()
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:20:00  2.0  1
2018-01-01 00:40:00  NaN  3
2018-01-01 01:00:00  NaN  3
2018-01-01 01:20:00  NaN  3
2018-01-01 01:40:00  6.0  5
2018-01-01 02:00:00  6.0  5

Resampling a DataFrame with shuffled indexes:

>>> df = pd.DataFrame({'a': [2, 6, 4]},
...                   index=pd.date_range('20180101', periods=3,
...                                       freq='h'))
>>> df
                     a
2018-01-01 00:00:00  2
2018-01-01 01:00:00  6
2018-01-01 02:00:00  4

>>> sorted_df = df.sort_values(by=['a'])
>>> sorted_df
                     a
2018-01-01 00:00:00  2
2018-01-01 02:00:00  4
2018-01-01 01:00:00  6

>>> sorted_df.resample('20min').nearest()
                     a
2018-01-01 00:00:00  2
2018-01-01 00:20:00  2
2018-01-01 00:40:00  6
2018-01-01 01:00:00  6
2018-01-01 01:20:00  6
2018-01-01 01:40:00  4
2018-01-01 02:00:00  4

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "limit" description should finish with "."

The error is caused by `.. versionadded:: 0.21.0`, which has no period at the end.

TomAugspurger · 2018-03-16T15:55:01Z

pandas/core/resample.py

@@ -498,23 +498,142 @@ def pad(self, limit=None):

    def nearest(self, limit=None):
        """
-        Fill values with nearest neighbor starting from center
+        Fill the new missing values with their nearest neighbor value, based


This should fit on a single line. Can you try rephrasing?

TomAugspurger · 2018-03-16T15:57:40Z

pandas/core/resample.py

+        2018-01-01 02:00:00    3
+        Freq: 20T, dtype: int64
+
+        Resample in the middle:


What do you mean by "in the middle?"
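The '30min' example places the new labels 00:30 and 01:30 exactly halfway between two hourly observations, which seems to be what "in the middle" refers to. A quick check, assuming pandas is installed, suggests these ties go to the later observation. This is consistent with the pandas `Index.get_indexer` documentation, which says `method='nearest'` breaks ties by preferring the larger index value:

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='1h'))

# 00:30 and 01:30 are equidistant from two hourly observations
halfway = s.resample('30min').nearest()
print(halfway)

# The tied labels take the value of the later observation
print(halfway.loc[pd.Timestamp('2018-01-01 00:30')])
print(halfway.loc[pd.Timestamp('2018-01-01 01:30')])
```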

TomAugspurger · 2018-03-16T15:58:34Z

pandas/core/resample.py

+
+        Limited fill:
+
+        >>> s.resample('10min').nearest(limit=1)


This is a tad long. Can you change it to '20min'? I think that'll still make the point, as the first will be filled and the second will be NaN.
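On a '20min' grid, each new label lies only one step from an hourly observation, so `limit=1` appears to fill every slot there. A grid with an exact midpoint, such as '15min' on a two-point series, keeps the example short and still leaves one `NaN`. A minimal sketch, assuming pandas is installed (the shorter two-point series is illustrative and not taken from the PR):

```python
import pandas as pd

s = pd.Series([1, 2],
              index=pd.date_range('20180101', periods=2, freq='1h'))

# 00:15 and 00:45 are one step from an observation; 00:30 is two steps away
limited = s.resample('15min').nearest(limit=1)
print(limited)

# On a 20-minute grid every new label is within one step, so nothing stays NaN
print(s.resample('20min').nearest(limit=1))
```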

codecov · 2018-03-16T20:45:23Z

Codecov Report

Merging #20381 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20381      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51187    51197      +10     
==========================================
+ Hits        47209    47220      +11     
+ Misses       3978     3977       -1

Flag	Coverage Δ
#multiple	`90.61% <ø> (ø)`	⬆️
#single	`42.27% <ø> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/resample.py	`96.99% <ø> (ø)`	⬆️
pandas/core/arrays/timedeltas.py	`93.75% <0%> (-0.56%)`	⬇️
pandas/core/generic.py	`96.81% <0%> (ø)`	⬆️
pandas/core/series.py	`93.87% <0%> (ø)`	⬆️
pandas/core/indexes/timedeltas.py	`90.74% <0%> (+0.12%)`	⬆️
pandas/core/indexes/frozen.py	`92.1% <0%> (+0.43%)`	⬆️
pandas/core/arrays/period.py	`98.08% <0%> (+0.59%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4f71755...3ab559a.

jorisvandenbossche · 2018-03-17T10:11:17Z

pandas/core/resample.py

+        2018-01-01 01:40:00  6.0  5
+        2018-01-01 02:00:00  6.0  5
+
+        Resampling a DataFrame with shuffled indexes:


I am not fully sure this example is needed, as it is something general to resample and not specific to the nearest method.
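A quick way to check that point, assuming pandas is installed: resampling appears to sort a non-monotonic index before upsampling, so the frame sorted by values and the original frame produce the same result.

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 6, 4]},
                  index=pd.date_range('20180101', periods=3, freq='h'))

# Sorting by the values leaves the DatetimeIndex out of chronological order
sorted_df = df.sort_values(by=['a'])

from_original = df.resample('20min').nearest()
from_sorted = sorted_df.resample('20min').nearest()

# Both calls give the same upsampled frame
print(from_original.equals(from_sorted))
```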

jorisvandenbossche · 2018-03-17T10:11:52Z

pandas/core/resample.py

+        2018-01-01 02:00:00    3.0
+        Freq: 10T, dtype: float64
+
+        Resampling a DataFrame that has missing values:


I would add a small explanation that the initial NaN is preserved.
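A small illustration of that point, assuming pandas and NumPy are installed: a `NaN` that was already in the original data is not treated as a gap by `nearest`. It is kept, and it is also copied to the new labels whose nearest original timestamp holds it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
                  index=pd.date_range('20180101', periods=3, freq='h'))

result = df.resample('20min').nearest()
print(result)

# Column 'a': 00:40, 01:00 and 01:20 are nearest to the original 01:00 row,
# which was already NaN, so all three stay NaN
print(result['a'].isna().tolist())
```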

jreback · 2018-11-01T01:41:48Z

@datapythonista

pep8speaks · 2018-11-03T06:16:46Z

Hello @thicorfon! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/resample.py !

datapythonista

Rebased, added a couple of minor fixes, and made the examples shorter.

@jbrockmendel if you don't mind taking a look here too, and merging on green if it looks good. I'm trying to close all pending PRs from the docstring sprint. I'm happy if what we merge is correct; future improvements can be addressed later in separate PRs. Thanks!

jbrockmendel · 2018-11-03T22:00:41Z

pandas/core/resample.py

+
+        When resampling data, missing values may appear (e.g., when the
+        resampling frequency is higher than the original frequency).
+        The nearest fill will replace ``NaN`` values that appeared in


"nearest fill" refers to this method, right? Maybe make that explicit with "nearest fill method"?

jbrockmendel · 2018-11-03T22:01:51Z

pandas/core/resample.py

+        the resampled data with the value from the nearest member of the
+        sequence, based on the index value.
+        Missing values that existed in the original data will not be modified.
+        If `limit` is given, fill only `limit` values in each direction for


Maybe the second "limit" could be "this many"? Probably worthwhile to get a non-native speaker to weigh in on what is clearest.
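To make "fill only `limit` values in each direction" concrete, here is a simplified pure-Python model. The helper `nearest_fill` and its integer-position representation are hypothetical, for illustration only, and not the pandas implementation. It assumes the original timestamps lie on the new grid and are given in ascending order; with the positions below it reproduces the `limit=1` output from the docstring above.

```python
def nearest_fill(orig_pos, orig_vals, new_pos, limit=None):
    """Hypothetical, simplified model of nearest-fill upsampling.

    Positions are integer steps on the new (finer) grid, and ``orig_pos``
    must be sorted ascending. Each new slot takes the value of the closest
    original observation; ties go to the later one. With ``limit``, a slot
    more than ``limit`` steps away from every observation is left as None.
    """
    filled = []
    for p in new_pos:
        best = None  # (distance, value) of the best candidate so far
        for q, v in zip(orig_pos, orig_vals):
            dist = abs(p - q)
            if limit is not None and dist > limit:
                continue  # this observation is too far away to fill the slot
            # '<=' lets the later observation win a tie
            if best is None or dist <= best[0]:
                best = (dist, v)
        filled.append(None if best is None else best[1])
    return filled


# Hourly observations at 00:00, 01:00, 02:00 on a 10-minute grid (13 slots)
print(nearest_fill([0, 6, 12], [1, 2, 3], range(13), limit=1))
# → [1, 1, None, None, None, 2, 2, 2, None, None, None, 3, 3]
```

Each observation fills at most one slot on either side of itself here, which is the "in each direction" part of the wording.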

jbrockmendel · 2018-11-03T22:02:39Z

pandas/core/resample.py


            .. versionadded:: 0.21.0

        Returns
        -------
-        an upsampled Series
+        Series or DataFrame


@datapythonista do we have a convention for saying same-type-as-input?

jbrockmendel · 2018-11-03T22:03:39Z

pandas/core/resample.py

+        pad : Forward fill ``NaN`` values.
+        pandas.Series.fillna : Fill ``NaN`` values in the Series using the
+            specified method, which can be 'backfill'.
+        pandas.DataFrame.fillna : Fill ``NaN`` values in the DataFrame using


The thoroughness is good, but this seems excessive. @datapythonista what's the convention for how much to write in this section?


datapythonista · 2018-11-04T04:56:57Z

Thanks for another great review @jbrockmendel. Made the changes, I think this should be ready to be merged on green.

For the "same type as caller" case, we try to use only Python types for the return and parameter types. Ideally at some point we can parse those types and detect suspicious things or get statistics. In generic methods, so far we're using Series or DataFrame. At a later stage it'd be nice to write the corresponding class, but that requires a bit of thinking on how to do it, and will come in a separate PR.
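The "same type as caller" behaviour can be checked directly. A minimal sketch, assuming pandas is installed, showing that `nearest` returns the same container type it was called on:

```python
import pandas as pd

idx = pd.date_range('20180101', periods=2, freq='h')
s = pd.Series([1, 2], index=idx)
df = pd.DataFrame({'a': [1, 2]}, index=idx)

# nearest() returns the same container type it was called on
print(type(s.resample('30min').nearest()).__name__)   # Series
print(type(df.resample('30min').nearest()).__name__)  # DataFrame
```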

datapythonista · 2018-11-20T14:36:17Z

@jreback this should be ready to merge, if you want to have a look

jreback · 2018-11-20T15:21:14Z

thanks @thicorfon

…dev#20381)

DOC: Update pandas.core.resample.Resampler.nearest docstring

b4fa500

jorisvandenbossche added the Docs label Mar 16, 2018

TomAugspurger reviewed Mar 16, 2018


jorisvandenbossche reviewed Mar 17, 2018


datapythonista self-assigned this Jul 22, 2018

datapythonista added 2 commits November 3, 2018 06:10

Merge remote-tracking branch 'upstream/master' into nearest

5ec0228

Minor fixes and making examples shorter

14a3b3a

datapythonista approved these changes Nov 3, 2018


jbrockmendel reviewed Nov 3, 2018


Addressing comments from the review, and making the examples and see also section shorter

3ab559a

jreback added this to the 0.24.0 milestone Nov 20, 2018

jreback merged commit 1520047 into pandas-dev:master Nov 20, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: update pandas.core.resample.Resampler.nearest docstring (pandas-dev#20381)

07cc86a

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: update pandas.core.resample.Resampler.nearest docstring (pandas-dev#20381)

283ff61
