ERR: qcut uniqueness checking #14455
Conversation
Current coverage is 85.25% (diff: 75.00%)

```
@@            master    #14455    diff @@
==========================================
  Files          140       140
  Lines        50631     50633      +2
  Methods          0         0
  Messages         0         0
  Branches         0         0
==========================================
+ Hits         43166     43168      +2
  Misses        7465      7465
  Partials         0         0
```
```diff
@@ -172,11 +176,13 @@ def qcut(x, q, labels=None, retbins=False, precision=3):
         quantiles = q
     bins = algos.quantile(x, quantiles)
     return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
-                         precision=precision, include_lowest=True)
+                         precision=precision, include_lowest=True,
+                         duplicate_edges='raise')
```
should pass `duplicate_edges`.
Personally I prefer an `errors` keyword, as it is consistent with other pandas APIs.
agree, commented above
Thanks for the PR. Can you add tests and a whatsnew entry?
```diff
@@ -141,6 +142,9 @@ def qcut(x, q, labels=None, retbins=False, precision=3):
         as a scalar.
     precision : int
         The precision at which to store and display the bins labels
+    duplicate_edges : {'raise', 'drop'}, optional
```
`duplicate_edges` -> `errors={'raise', 'drop'}`
actually, I like `duplicates={'raise', 'drop'}`
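For reference, the spelling suggested here is the one that ultimately shipped: later pandas releases expose a `duplicates` keyword on `qcut` (and on `cut`). A small illustration with made-up data:

```python
import pandas as pd

# Heavy ties collapse the quantile edges, which raises by default.
s = pd.Series([1, 1, 1, 1, 2, 5])
binned = pd.qcut(s, 3, duplicates='drop')  # dedupe the edges instead of raising
print(binned.cat.categories)               # two intervals remain after the drop
```

With the default `duplicates='raise'`, the same call fails with `ValueError: Bin edges must be unique`.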
```diff
@@ -191,7 +197,11 @@ def _bins_to_cuts(x, bins, right=True, labels=None, retbins=False,
     ids = bins.searchsorted(x, side=side)
```
Check all the valid possibilities for `errors` and raise otherwise (IOW, if you pass a bad value it should raise an informative message):

```python
if errors not in ['raise', 'drop']:
    raise ValueError("invalid value for the errors parameter, "
                     "valid are: 'raise', 'drop'")
```
```diff
-        raise ValueError('Bin edges must be unique: %s' % repr(bins))
+        if duplicate_edges == 'raise':
+            raise ValueError('Bin edges must be unique: %s'
+                             % repr(bins))
```
Expand this message to say that you can force the edges to be unique by passing `errors='drop'`.
Looks pretty good. Some more error checking / formatting is needed. Please add a whatsnew note in 0.19.1 (make an Enhancements changes section). Be sure to say that the default is the existing behavior.
Please add some tests; use the example from the original issue, exercising both options.
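A test along the lines the reviewer asks for might look like the sketch below. It uses the eventual `duplicates` keyword name, and the data here only mimics the tied-values situation; it is not the literal example from the original issue:

```python
import pandas as pd

def test_qcut_duplicate_edges():
    values = [0, 0, 0, 0, 1, 2, 3]  # ties make the 25%/50% quantile edges collide
    # default: non-unique bin edges raise
    try:
        pd.qcut(values, 4)
        raise AssertionError('expected ValueError for duplicate edges')
    except ValueError:
        pass
    # 'drop': duplicate edges are removed and binning succeeds
    result = pd.qcut(values, 4, duplicates='drop')
    assert len(result.categories) == 2

test_qcut_duplicate_edges()
```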
Thanks for the feedback; this is my first PR on an open source project. I'll make the changes and resubmit tomorrow. I had some trouble building my branch on Windows.
The contributing docs are here; there is a section on creating a Windows environment.
```
# Conflicts:
#	pandas/tools/tile.py
```
@ashishsingal1 something went wrong with your rebase. Can you do:

That should normally solve it.
Trouble rebasing; going to start over with a new PR.
Add option to drop non-unique bins.
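To illustrate why the edges can be non-unique in the first place: with heavily tied data, several quantile probabilities map to the same value, so the edge array itself repeats. A minimal reproduction of the edge computation (illustrative only, not the `tile.py` internals):

```python
import numpy as np
import pandas as pd

x = pd.Series([0, 0, 0, 0, 1, 2, 3])
probs = np.linspace(0, 1, 5)             # quartile probabilities
edges = np.asarray(x.quantile(probs))    # -> array([0. , 0. , 0. , 1.5, 3. ])
print(np.unique(edges))                  # the 'drop' option keeps [0. , 1.5, 3. ]
```

Three of the five quartile edges coincide at 0, so the default behavior raises, while dropping duplicates leaves two usable bins.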