Issue #10174. Add 'interpolation' keyword in DataFrame.quantile and Series.quantile #10204
This needs some tests.
I'll add some tests too.
The interpolation argument was added to np.percentile() only in numpy version 1.9.0. The tests don't pass where the numpy version is < 1.9.0. How should this be handled?
@@ -4479,6 +4479,14 @@ def quantile(self, q=0.5, axis=0, numeric_only=True):
    0 <= q <= 1, the quantile(s) to compute
axis : {0, 1}
    0 for row-wise, 1 for column-wise
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
    Specifies the interpolation method to use, when the desired quantile lies between two data points i and j:
Can you keep to the PEP8 line length here?
Also, it would be better to make a bullet list of the options (newlines are ignored when rendered to HTML).
BTW, you can just copy it from the source of the numpy.percentile function (using np.percentile?? in IPython):
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
    This optional parameter specifies the interpolation method to use,
    when the desired quantile lies between two data points `i` and `j`:
        * linear: `i + (j - i) * fraction`, where `fraction` is the
          fractional part of the index surrounded by `i` and `j`.
        * lower: `i`.
        * higher: `j`.
        * nearest: `i` or `j` whichever is nearest.
        * midpoint: (`i` + `j`) / 2.
    .. versionadded:: 1.9.0
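To make the five rules in this docstring concrete, here is a minimal pure-Python sketch. It is not numpy's implementation: the function name `interpolated_quantile` is hypothetical, and the tie-breaking choice for `nearest` (a tie goes to `j`) is an assumption made only for illustration.

```python
import math


def interpolated_quantile(values, q, interpolation="linear"):
    """Illustrative quantile of `values` for 0 <= q <= 1 (not numpy's code)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)   # fractional index of the desired quantile
    lo = math.floor(pos)
    hi = math.ceil(pos)
    i, j = ordered[lo], ordered[hi]  # the two surrounding data points
    fraction = pos - lo

    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Assumption: an exact tie (fraction == 0.5) goes to j.
        return i if fraction < 0.5 else j
    raise ValueError("unknown interpolation: %r" % (interpolation,))
```

For example, with the data `[1, 2, 3, 4]` and `q=0.4` the fractional index is 1.2, so `i` is 2 and `j` is 3: `lower` returns 2, `higher` returns 3, `midpoint` returns 2.5, and `linear` returns approximately 2.2.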
For the tests to handle the numpy version, you can do something like this:
What should happen when numpy version is < 1.9.0?
if 1. , then I can put the |
The default interpolation argument should work on all supported versions of NumPy. For numpy < 1.9.0, it's OK not to support other interpolation options. In these cases, we should raise a |
Now the default interpolation argument works on all supported versions of NumPy. Will add tests soon.
I added tests for DataFrame.quantile. This is my first time writing tests, so please verify that it is done right.
from numpy import percentile
# interpolation = linear (default case)
q = self.tsframe.quantile(0.1, axis=0, interpolation='linear')
self.assertEqual(q['A'], percentile(self.tsframe['A'], 10))
you can also use np.percentile here, instead of importing percentile explicitly from numpy (np is already available)
Tests look good! The most important things to test are that the keyword is passed to percentile correctly, that it generates an error for older numpy versions, and that the default is not changed. And those are included, so OK!
I added an explicit test that the result with interpolation='linear' is the same as the result without specifying it.
this is getting to be lots of duplicated code. I would prefer that this all be moved to |
can you rebase?
@jreback rebase done!
not really sure what you did. you should have 1 commit. pls see contributing docs: http://pandas.pydata.org/pandas-docs/stable/contributing.html
I screwed up a lot. I fetched upstream and then rebased on that or something.
@mayankasthana No, there is no need to close this and create a new PR. Normally, doing this:
should be all that is needed to clean this up |
175eafa to b365dcc
@jreback Thanks for the help with git. Here is the expected 1 commit.
looks pretty good. pls add a release note.
* lower: `i`.
* higher: `j`.
* nearest: `i` or `j` whichever is nearest.
* midpoint: (`i` + `j`) / 2.
Can you add here the line:
.. versionadded:: 0.17.0
(same indentation as 'This optional ...')
@jreback @jorisvandenbossche Where should I add the feature description in |
You can put it under 'Other enhancements'. A one-line sentence explaining it is enough I think.
@jreback Rebased. Test is green.
715e787 to 6e03017
@jreback Rebased. Test is green.
@mayankasthana I'd really like to see #10207 but that can be done after. can you rebase.
6e03017 to 748a692
@jreback Rebased. Test is green.
can you rebase / update whats new to 0.18.0
3f32fcf to 324d811
@jreback Rebased and updated whats new to 0.18.0.
def multi(values, qs, interpolation):
    if _np_version_under1p9:
        if com.is_list_like(qs):
            values = [_quantile(values, x*100) for x in qs]
just create a kwargs depending on the version and pass it to the functions, instead of duplicating all of this code
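A rough sketch of what that suggestion could look like, to show the idea rather than the code that was eventually merged. The helper `percentile_kwargs`, the `percentile_func` parameter, and the choice of `ValueError` are assumptions for illustration; only the error message text is taken from the tests quoted in this thread.

```python
def percentile_kwargs(interpolation="linear", np_version_under1p9=False):
    """Build the extra keyword arguments to forward to the percentile call.

    Hypothetical helper: the version flag is passed in explicitly here so
    the sketch stays self-contained.
    """
    if np_version_under1p9:
        if interpolation != "linear":
            # Assumption: ValueError is used here only as a placeholder type.
            raise ValueError("Interpolation methods other than linear "
                             "not supported in numpy < 1.9")
        return {}  # old numpy: do not pass the keyword at all
    return {"interpolation": interpolation}


def multi(values, qs, percentile_func, **kwargs):
    """Single code path for scalar and list-like `qs`, whatever the version."""
    if isinstance(qs, (list, tuple)):
        return [percentile_func(values, q * 100, **kwargs) for q in qs]
    return percentile_func(values, qs * 100, **kwargs)
```

The version check then lives in one place, and the list-like and scalar branches are written once instead of once per numpy version.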
2ff1ac5 to 7eeeb33
# test with and without interpolation keyword
assert_series_equal(q, q1)

# interpolation method other than default linear
make this 2 tests, 1 for the version checking, and the other that would skip at the top if under version 1.9
I separated <np 1.9 tests and >1.9 tests. Is this what you meant?
no, make them 2 separate tests, each one skipping if the numpy version is not what it needs to be. It's simpler / easier to read that way.
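As an illustration of that layout, here is a self-contained sketch using the standard-library `unittest`. Everything in it is a stand-in: `NP_UNDER_1P9` plays the role of `_np_version_under1p9`, `toy_quantile` is a toy function rather than pandas code, and `ValueError` is an assumed exception type.

```python
import unittest

# Hypothetical stand-in for the `_np_version_under1p9` flag.
NP_UNDER_1P9 = False


def toy_quantile(values, q, interpolation="linear"):
    """Toy function under test; supports only 'linear' and 'lower'."""
    if NP_UNDER_1P9 and interpolation != "linear":
        raise ValueError("Interpolation methods other than linear "
                         "not supported in numpy < 1.9")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    if interpolation == "lower":
        return ordered[lo]
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class TestQuantileInterpolation(unittest.TestCase):

    def test_interpolation_new_numpy(self):
        # Skip at the top when the non-default methods are unavailable.
        if NP_UNDER_1P9:
            self.skipTest("requires numpy >= 1.9")
        result = toy_quantile([1, 2, 3, 4], 0.4, interpolation="lower")
        self.assertEqual(result, 2)

    def test_interpolation_old_numpy(self):
        # Skip at the top when there is no error to check for.
        if not NP_UNDER_1P9:
            self.skipTest("only relevant for numpy < 1.9")
        with self.assertRaises(ValueError):
            toy_quantile([1, 2, 3, 4], 0.4, interpolation="lower")
```

Each test states its own precondition in its first lines, so neither body needs an `if`/`else` on the version flag.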
comments updated
697ccae to 3cb69dd
# interpolation method other than default linear
if _np_version_under1p9:
    expErrMsg = "Interpolation methods other than linear not supported in numpy < 1.9"
same here, make the numpy < 1.9 tests in a separate test
can you run: |
217535f to f8e4b5a
…rame.quantile and Series.quantile
f8e4b5a to 55836e6
Created separate tests for different numpy versions.
merged via e05f66a thanks!
Thanks @jreback @shoyer @jorisvandenbossche
Closes #10174.
I have added the new argument to the doc too.