Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #10174. Add 'interpolation' keyword in DataFrame.quantile and Series.quantile #10204

Closed

Conversation

mayankasthana
Copy link
Contributor

Closes #10174.
I have added the new argument to the doc too.

@shoyer
Copy link
Member

shoyer commented May 24, 2015

This need some tests.

@mayankasthana
Copy link
Contributor Author

I'll add some tests too.
Meanwhile I am trying to fix the build fail due to a stray \xe2 character.

@mayankasthana
Copy link
Contributor Author

The interpolation argument was added to np.percentile() only in numpy version 1.9.0. The tests don't pass where numpy version < 1.9.0. How to handle this?

@@ -4479,6 +4479,14 @@ def quantile(self, q=0.5, axis=0, numeric_only=True):
0 <= q <= 1, the quantile(s) to compute
axis : {0, 1}
0 for row-wise, 1 for column-wise
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
Specifies the interpolation method to use, when the desired quantile lies between two data points i and j:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep here to PEP8 line length?

Further, better to make a bullet points list of the options (newlines are ignored when rendered to html)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, you can just copy it from the source from the numpy.percentile function (using np.percentile?? in IPython):

interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
    This optional parameter specifies the interpolation method to use,
    when the desired quantile lies between two data points `i` and `j`:
        * linear: `i + (j - i) * fraction`, where `fraction` is the
          fractional part of the index surrounded by `i` and `j`.
        * lower: `i`.
        * higher: `j`.
        * nearest: `i` or `j` whichever is nearest.
        * midpoint: (`i` + `j`) / 2.

    .. versionadded:: 1.9.0

@jorisvandenbossche
Copy link
Member

For the tests to handle the numpy version, you can do something like this:

if _np_version_under1p9:
    raise nose.SkipTest('numpy >= 1.9 required')

@mayankasthana
Copy link
Contributor Author

What should happen when numpy version is < 1.9.0?

  1. Should quantile() still work and ignore the interpolation argument? OR
  2. should it fail saying that pd > 1.9 is required?

if 1. , then I can put the if _np_version_under1p9: check inside quantile() and handle the different numpy versions transparently. We can throw a warning if the interpolation keyword is specified, but not being used (np < 1.9).

@shoyer
Copy link
Member

shoyer commented May 25, 2015

The default interpolation argument should work on all supported versions of NumPy.

For numpy < 1.9.0, it's OK not to support other interpolation options. In these cases, we should raise a ValueError with an informative error message. We should certainly not ignore the interpolation argument or merely raise a warning -- that could lead to unanticipated bugs.

@mayankasthana
Copy link
Contributor Author

Now the default interpolation argument works on all supported versions of NumPy.
And raised ValueError if an interpolation method other than linear is specified when numpy versions <1.9.

Will add tests soon.

@jreback jreback added API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels May 26, 2015
@jreback jreback added this to the 0.17.0 milestone May 26, 2015
@mayankasthana
Copy link
Contributor Author

I added tests for Dataframe.quantile . This is my first time writing tests, so please verify if it is done right.
If someone verifies these tests, I'll write the tests for Series.quantile too.

from numpy import percentile
#interpolation = linear (default case)
q = self.tsframe.quantile(0.1, axis=0,interpolation='linear')
self.assertEqual(q['A'], percentile(self.tsframe['A'], 10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also use np.percentile here, instead of importing percentile explicitely from numpy (np is already available)

@jorisvandenbossche
Copy link
Member

Tests look good! Most important things to be tested is that the keyword is passed to percentile correctly, that it generate an error for older numpy versions, and that the default is not changed. And those are included, so OK!

@mayankasthana
Copy link
Contributor Author

I added the explicit test that the result with interpolation='linear' and without specifying it is the same.
Also added the github issue number.

@jreback
Copy link
Contributor

jreback commented May 26, 2015

this is getting to be lots of duplicated code. I would prefer that this all be moved to core/generic.py, see #10207

@jreback
Copy link
Contributor

jreback commented Aug 5, 2015

can you rebase?

@mayankasthana
Copy link
Contributor Author

@jreback rebase done!

@jreback
Copy link
Contributor

jreback commented Aug 10, 2015

not really sure what you did. you should have 1 commit.

pls see contributing docs: http://pandas.pydata.org/pandas-docs/stable/contributing.html

@mayankasthana
Copy link
Contributor Author

I screwed up a lot. I fetched upstream and then rebased on that or something.
It's better I close this and create a new pull request following the contributing guide.
@jreback Is there any better way?

@jorisvandenbossche
Copy link
Member

@mayankasthana No, there is no need to close this and create a new PR.

Normally, doing this:

git fetch upstream
git rebase upstream/master
git push -f origin interpolationQuantile

should be all that is needed to clean this up

@mayankasthana mayankasthana force-pushed the interpolationQuantile branch 2 times, most recently from 175eafa to b365dcc Compare August 10, 2015 20:44
@mayankasthana
Copy link
Contributor Author

@jreback Thanks for the help with git. Here is the expected 1 commit.

@jreback
Copy link
Contributor

jreback commented Aug 10, 2015

looks pretty good. pls add a release note.

* lower: `i`.
* higher: `j`.
* nearest: `i` or `j` whichever is nearest.
* midpoint: (`i` + `j`) / 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here the line:

.. versionadded:: 0.17.0

(same indentation as 'This optional ...')

@mayankasthana
Copy link
Contributor Author

@jreback @jorisvandenbossche Where should I add the feature description in v0.17.0.txt?
Under Other API Changes or Other enhancements? And does it require a usage example?

@jorisvandenbossche
Copy link
Member

You can put it under 'Other enhancements'. A one line sentence explaining it is enough I think.

@mayankasthana
Copy link
Contributor Author

@jreback Rebased. Test is green.

@mayankasthana
Copy link
Contributor Author

@jreback Rebased. Test is green.

@jreback
Copy link
Contributor

jreback commented Nov 10, 2015

@mayankasthana I'd really like to see #10207 but that can be done after. can you rebase.

@mayankasthana
Copy link
Contributor Author

@jreback Rebased. Test is green.

@jreback
Copy link
Contributor

jreback commented Dec 6, 2015

can you rebase / update whats new to 0.18.0

@mayankasthana mayankasthana force-pushed the interpolationQuantile branch 2 times, most recently from 3f32fcf to 324d811 Compare December 10, 2015 20:18
@mayankasthana
Copy link
Contributor Author

@jreback Rebased and updated whats new to 0.18.0.

def multi(values, qs, interpolation):
if _np_version_under1p9:
if com.is_list_like(qs):
values = [_quantile(values, x*100) for x in qs]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just create a kwargs depending on the version and pass it to the functions, instead of duplicating all of this code

@mayankasthana mayankasthana force-pushed the interpolationQuantile branch 2 times, most recently from 2ff1ac5 to 7eeeb33 Compare December 16, 2015 20:11
#test with and without interpolation keyword
assert_series_equal(q,q1)

#interpolation method other than default linear
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this 2 tests, 1 for the version checking, and the other that would skip at the top if under verion 1.9

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated <np 1.9 tests and >1.9 tests. Is this what you meant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, make them 2 separately tests, skipping on each one respectively if the numpy version not what you need it to be. Its simpler / easier to read that way.

@jreback
Copy link
Contributor

jreback commented Dec 19, 2015

comments updated

@mayankasthana mayankasthana force-pushed the interpolationQuantile branch 2 times, most recently from 697ccae to 3cb69dd Compare December 20, 2015 22:03

#interpolation method other than default linear
if _np_version_under1p9:
expErrMsg = "Interpolation methods other than linear not supported in numpy < 1.9"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, make the numpy < 1.9 testsin a separate test

@jreback
Copy link
Contributor

jreback commented Jan 2, 2016

can you run: git diff master | flake8 --diff and fix anything. going to shortly be enforcing PEP8.

@mayankasthana mayankasthana force-pushed the interpolationQuantile branch from 217535f to f8e4b5a Compare January 6, 2016 19:49
@mayankasthana mayankasthana force-pushed the interpolationQuantile branch from f8e4b5a to 55836e6 Compare January 6, 2016 20:58
@mayankasthana
Copy link
Contributor Author

Created separate tests for different numpy versions.
Ran git diff master | flake8 --diff
Rebased.

@jreback jreback modified the milestones: 0.18.0, Next Major Release Jan 6, 2016
@jreback
Copy link
Contributor

jreback commented Jan 7, 2016

merged via e05f66a

thanks!

@jreback jreback closed this Jan 7, 2016
@mayankasthana
Copy link
Contributor Author

Thanks @jreback @shoyer @jorisvandenbossche

@mayankasthana mayankasthana deleted the interpolationQuantile branch September 18, 2016 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support 'interpolation' keyword in pandas.DataFrame.quantile and pandas.Series.quantile
5 participants