DOC: update pandas.core.resample.Resampler.nearest docstring #20381

Merged
jreback merged 4 commits into pandas-dev:master on Nov 20, 2018

Conversation

thicorfon
Contributor

@thicorfon thicorfon commented Mar 16, 2018

Checklist for the pandas documentation sprint:

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py pandas.core.resample.Resampler.nearest
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single pandas.core.resample.Resampler.nearest
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
############## Docstring (pandas.core.resample.Resampler.nearest) ##############
################################################################################

Fill the new missing values with their nearest neighbor value, based
on index.

When resampling data, missing values may appear (e.g., when the
resampling frequency is higher than the original frequency).
The nearest fill will replace ``NaN`` values that appeared in
the resampled data with the value from the nearest member of the
sequence, based on the index value.
Missing values that existed in the original data will not be modified.
If `limit` is given, fill only `limit` values in each direction for
each of the original values.

Parameters
----------
limit : integer, optional
    Limit of how many values to fill.

    .. versionadded:: 0.21.0

Returns
-------
Series, DataFrame
    An upsampled Series or DataFrame with ``NaN`` values filled with
    their closest neighbor value.

See Also
--------
backfill : Backward fill the new missing values in the resampled data.
fillna : Fill ``NaN`` values using the specified method, which can be
    'backfill'.
pad : Forward fill ``NaN`` values.
pandas.Series.fillna : Fill ``NaN`` values in the Series using the
    specified method, which can be 'backfill'.
pandas.DataFrame.fillna : Fill ``NaN`` values in the DataFrame using
    the specified method, which can be 'backfill'.

Examples
--------

Resampling a Series:

>>> s = pd.Series([1, 2, 3],
...               index=pd.date_range('20180101', periods=3,
...                                   freq='1h'))
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
2018-01-01 02:00:00    3
Freq: H, dtype: int64

>>> s.resample('20min').nearest()
2018-01-01 00:00:00    1
2018-01-01 00:20:00    1
2018-01-01 00:40:00    2
2018-01-01 01:00:00    2
2018-01-01 01:20:00    2
2018-01-01 01:40:00    3
2018-01-01 02:00:00    3
Freq: 20T, dtype: int64

Resample in the middle:

>>> s.resample('30min').nearest()
2018-01-01 00:00:00    1
2018-01-01 00:30:00    2
2018-01-01 01:00:00    2
2018-01-01 01:30:00    3
2018-01-01 02:00:00    3
Freq: 30T, dtype: int64

Limited fill:

>>> s.resample('10min').nearest(limit=1)
2018-01-01 00:00:00    1.0
2018-01-01 00:10:00    1.0
2018-01-01 00:20:00    NaN
2018-01-01 00:30:00    NaN
2018-01-01 00:40:00    NaN
2018-01-01 00:50:00    2.0
2018-01-01 01:00:00    2.0
2018-01-01 01:10:00    2.0
2018-01-01 01:20:00    NaN
2018-01-01 01:30:00    NaN
2018-01-01 01:40:00    NaN
2018-01-01 01:50:00    3.0
2018-01-01 02:00:00    3.0
Freq: 10T, dtype: float64

Resampling a DataFrame that has missing values:

>>> df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
...                   index=pd.date_range('20180101', periods=3,
...                                       freq='h'))
>>> df
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 02:00:00  6.0  5

>>> df.resample('20min').nearest()
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:20:00  2.0  1
2018-01-01 00:40:00  NaN  3
2018-01-01 01:00:00  NaN  3
2018-01-01 01:20:00  NaN  3
2018-01-01 01:40:00  6.0  5
2018-01-01 02:00:00  6.0  5

Resampling a DataFrame with shuffled indexes:

>>> df = pd.DataFrame({'a': [2, 6, 4]},
...                   index=pd.date_range('20180101', periods=3,
...                                       freq='h'))
>>> df
                     a
2018-01-01 00:00:00  2
2018-01-01 01:00:00  6
2018-01-01 02:00:00  4

>>> sorted_df = df.sort_values(by=['a'])
>>> sorted_df
                     a
2018-01-01 00:00:00  2
2018-01-01 02:00:00  4
2018-01-01 01:00:00  6

>>> sorted_df.resample('20min').nearest()
                     a
2018-01-01 00:00:00  2
2018-01-01 00:20:00  2
2018-01-01 00:40:00  6
2018-01-01 01:00:00  6
2018-01-01 01:20:00  6
2018-01-01 01:40:00  4
2018-01-01 02:00:00  4

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "limit" description should finish with "."

The error is caused by the `.. versionadded:: 0.21.0` directive, which does not end with a period.
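To illustrate why a trailing directive can trip a "should finish with a period" rule, here is a minimal sketch. The function `description_ends_with_period` is hypothetical and much simpler than whatever `scripts/validate_docstrings.py` actually does; it only assumes the check looks at the last non-empty line of the parameter description.

```python
def description_ends_with_period(description_lines):
    """Hypothetical, simplified check: the last non-empty line must end with '.'."""
    non_empty = [line.strip() for line in description_lines if line.strip()]
    return bool(non_empty) and non_empty[-1].endswith('.')


# Description of ``limit`` as it appears in the docstring above.
limit_description = [
    'Limit of how many values to fill.',
    '',
    '.. versionadded:: 0.21.0',
]

# The directive is the last non-empty line, so the naive check fails.
with_directive = description_ends_with_period(limit_description)
# Without the directive, the sentence itself ends with a period.
without_directive = description_ends_with_period(limit_description[:1])

print(with_directive, without_directive)
```

Under that assumption the description text itself is fine, and the report comes only from the directive line.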

@@ -498,23 +498,142 @@ def pad(self, limit=None):

 def nearest(self, limit=None):
 """
-Fill values with nearest neighbor starting from center
+Fill the new missing values with their nearest neighbor value, based
Contributor

This should fit on a single line. Can you try rephrasing?

2018-01-01 02:00:00 3
Freq: 20T, dtype: int64

Resample in the middle:
Contributor

What do you mean by "in the middle?"
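For context, the case this example seems to target is a new timestamp that lies exactly halfway between two original ones. A minimal sketch reusing the Series from the docstring; the variable name `halfway` is only illustrative:

```python
import pandas as pd

# Hourly series from the docstring example.
s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# 00:30 and 01:30 are equidistant from their two neighbours.
halfway = s.resample('30min').nearest()
print(halfway)
```

In the output pasted in the PR description, both halfway points take the later value (2 at 00:30 and 3 at 01:30), so a title along the lines of "equidistant timestamps" might describe the example more clearly.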


Limited fill:

>>> s.resample('10min').nearest(limit=1)
Contributor

This is a tad long. Can you change it to '20min'? I think that'll still make the point, as the first will be filled and the second will be NaN.

Member

+1
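One thing that may be worth checking before shortening the example: because `limit` applies in each direction, every slot on a '20min' grid over hourly data is within one step of some original value. A rough sketch to compare grids; the '15min' alternative is an assumption for illustration, not something proposed in this thread:

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# Every 20-minute slot is at most one step away from an original value,
# so limit=1 is expected to leave nothing unfilled here.
every_20 = s.resample('20min').nearest(limit=1)

# On a 15-minute grid the half-hour slots are two steps away from either
# neighbour, so they are expected to stay NaN.
every_15 = s.resample('15min').nearest(limit=1)

print(every_20.isna().sum(), every_15.isna().sum())
```

If that holds, a grid where some slot is at least two steps from both neighbours (such as '15min') keeps the example short while still showing a `NaN`.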

@codecov

codecov bot commented Mar 16, 2018

Codecov Report

Merging #20381 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20381      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51187    51197      +10     
==========================================
+ Hits        47209    47220      +11     
+ Misses       3978     3977       -1
| Flag | Coverage Δ | |
|---|---|---|
| #multiple | 90.61% <ø> (ø) | ⬆️ |
| #single | 42.27% <ø> (+0.01%) | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/resample.py | 96.99% <ø> (ø) | ⬆️ |
| pandas/core/arrays/timedeltas.py | 93.75% <0%> (-0.56%) | ⬇️ |
| pandas/core/generic.py | 96.81% <0%> (ø) | ⬆️ |
| pandas/core/series.py | 93.87% <0%> (ø) | ⬆️ |
| pandas/core/indexes/timedeltas.py | 90.74% <0%> (+0.12%) | ⬆️ |
| pandas/core/indexes/frozen.py | 92.1% <0%> (+0.43%) | ⬆️ |
| pandas/core/arrays/period.py | 98.08% <0%> (+0.59%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4f71755...3ab559a.

2018-01-01 01:40:00 6.0 5
2018-01-01 02:00:00 6.0 5

Resampling a DataFrame with shuffled indexes:
Member

I am not fully sure this example is needed, as it is something general to resample and not specific to the nearest method.
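A small sketch of the general behaviour referred to here, assuming (as the output in the PR description suggests) that `resample` orders a non-chronological index before upsampling. The names `shuffled`, `from_sorted` and `from_shuffled` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 6, 4]},
                  index=pd.date_range('20180101', periods=3, freq='h'))

# Sorting by value leaves the index in non-chronological order.
shuffled = df.sort_values(by=['a'])

from_sorted = df.resample('20min').nearest()
from_shuffled = shuffled.resample('20min').nearest()

# Values follow the timestamps, not the row order of the input.
print(from_shuffled['a'].tolist())
```

Both inputs give the same resampled values, which supports the point that the ordering is handled by `resample` itself rather than by `nearest`.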

2018-01-01 02:00:00 3.0
Freq: 10T, dtype: float64

Resampling a DataFrame that has missing values:
Member

I would add a small explanation that the initial NaN is preserved.
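A short sketch of what such an explanation could point at, using the DataFrame from the docstring; the variable name `filled` is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
                  index=pd.date_range('20180101', periods=3, freq='h'))

filled = df.resample('20min').nearest()

# Slots whose nearest original row is 01:00 copy that row as-is,
# including the NaN that was already present in column 'a'.
print(filled)
```

Column `b` has no gaps after the fill, while column `a` keeps `NaN` at 00:40, 01:00 and 01:20, because the nearest original row for those slots is the one that was already missing.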

@datapythonista datapythonista self-assigned this Jul 22, 2018
@jreback
Contributor

jreback commented Nov 1, 2018

@datapythonista

@pep8speaks

Hello @thicorfon! Thanks for updating the PR.

Member

@datapythonista datapythonista left a comment

Rebased, added a couple of minor fixes, and made the examples shorter.

@jbrockmendel if you don't mind taking a look here too, and merging on green if it looks good. I'm trying to close all pending PRs from the docstring sprint. I'm happy as long as what we merge is correct; future improvements can be addressed later in separate PRs. Thanks!


When resampling data, missing values may appear (e.g., when the
resampling frequency is higher than the original frequency).
The nearest fill will replace ``NaN`` values that appeared in
Member

"nearest fill" refers to this method, right? Maybe make that explicit with "nearest fill method"?

the resampled data with the value from the nearest member of the
sequence, based on the index value.
Missing values that existed in the original data will not be modified.
If `limit` is given, fill only `limit` values in each direction for
Member

Maybe the second "limit" could be "this many"? Probably worthwhile to get a non-native speaker to weigh in on what is clearest.


.. versionadded:: 0.21.0

Returns
-------
-an upsampled Series
+Series or DataFrame
Member

@datapythonista do we have a convention for saying same-type-as-input?

pad : Forward fill ``NaN`` values.
pandas.Series.fillna : Fill ``NaN`` values in the Series using the
specified method, which can be 'backfill'.
pandas.DataFrame.fillna : Fill ``NaN`` values in the DataFrame using
Member

The thoroughness is good, but this seems excessive. @datapythonista what's the convention for how much to write in this section?

@datapythonista
Member

Thanks for another great review @jbrockmendel. Made the changes, I think this should be ready to be merged on green.

For the "same type as caller", we try to use only Python types in the types of the return and parameters. Ideally at some point we can parse those types and detect suspicious things or get statistics. In generic methods, so far we're using Series or DataFrame. At a later stage it'd be nice to write the corresponding class, but that requires a bit of thinking on how to do it, and will come in a separate PR.
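For readers of the docstring, "Series or DataFrame" here amounts to "same container type as the object being resampled". A minimal sketch; the variable names are illustrative:

```python
import pandas as pd

idx = pd.date_range('20180101', periods=3, freq='h')
s = pd.Series([1, 2, 3], index=idx)
df = pd.DataFrame({'a': [2, 6, 4]}, index=idx)

s_out = s.resample('30min').nearest()
df_out = df.resample('30min').nearest()

# The result type follows the caller's type.
print(type(s_out).__name__, type(df_out).__name__)
```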

@datapythonista
Member

@jreback this should be ready to merge, if you want to have a look

@jreback jreback added this to the 0.24.0 milestone Nov 20, 2018
@jreback jreback merged commit 1520047 into pandas-dev:master Nov 20, 2018
@jreback
Contributor

jreback commented Nov 20, 2018

thanks @thicorfon

thoo added a commit to thoo/pandas that referenced this pull request Nov 20, 2018
…fixed

* upstream/master:
  DOC: Removing rpy2 dependencies, and converting examples using it to regular code blocks (pandas-dev#23737)
  BUG: Fix dtype=str converts NaN to 'n' (pandas-dev#22564)
  DOC: update pandas.core.resample.Resampler.nearest docstring (pandas-dev#20381)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23810)
  Added support for Fraction and Number (PEP 3141) to pandas.api.types.is_scalar (pandas-dev#22952)
  DOC: Updating to_timedelta docstring (pandas-dev#23259)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
7 participants