ERR: qcut uniqueness checking #14455
Conversation
Current coverage is 85.25% (diff: 75.00%)

```
@@            master    #14455    diff @@
==========================================
  Files          140       140
  Lines        50631     50633      +2
  Methods          0         0
  Messages         0         0
  Branches         0         0
==========================================
+ Hits         43166     43168      +2
  Misses        7465      7465
  Partials         0         0
```
```diff
@@ -172,11 +176,13 @@ def qcut(x, q, labels=None, retbins=False, precision=3):
         quantiles = q
     bins = algos.quantile(x, quantiles)
     return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
-                         precision=precision, include_lowest=True)
+                         precision=precision, include_lowest=True,
+                         duplicate_edges='raise')
```
should pass `duplicate_edges`.
Personally I prefer an `errors` keyword, as it is consistent with other pandas APIs.
agree, commented above
Thanks for the PR. Can you add tests and a whatsnew entry?
```diff
@@ -141,6 +142,9 @@ def qcut(x, q, labels=None, retbins=False, precision=3):
         as a scalar.
     precision : int
         The precision at which to store and display the bins labels
+    duplicate_edges : {'raise', 'drop'}, optional
```
`duplicate_edges` -> `errors={'raise', 'drop'}`
actually, I like `duplicates={'raise', 'drop'}`
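For reference, the spelling suggested here is the one that ultimately shipped: later pandas releases expose a `duplicates` keyword on `qcut` (and on `cut`). A small illustration with made-up data:

```python
import pandas as pd

# Heavy ties collapse the quantile edges, which raises by default.
s = pd.Series([1, 1, 1, 1, 2, 5])
binned = pd.qcut(s, 3, duplicates='drop')  # dedupe the edges instead of raising
print(binned.cat.categories)               # two intervals remain after the drop
```

With the default `duplicates='raise'`, the same call fails with `ValueError: Bin edges must be unique`.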
```diff
@@ -191,7 +197,11 @@ def _bins_to_cuts(x, bins, right=True, labels=None, retbins=False,
     ids = bins.searchsorted(x, side=side)
```
Check all the valid possibilities for `errors` and raise otherwise (IOW, if you pass a bad value it should raise an informative message):

```python
if errors not in ['raise', 'drop']:
    raise ValueError("invalid value for the errors parameter, "
                     "valid are: 'raise', 'drop'")
```
```diff
-        raise ValueError('Bin edges must be unique: %s' % repr(bins))
+        if duplicate_edges == 'raise':
+            raise ValueError('Bin edges must be unique: %s'
+                             % repr(bins))
```
Expand this message to say that you can force the edges to be unique by passing `errors='drop'`.
Looks pretty good. Some more error checking / formatting is needed. Please add a whatsnew note in 0.19.1 (make an Enhancements changes section). Be sure to say that the default is the existing behavior.
Please add some tests; use the example from the original issue, exercising both options.
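A test along the lines the reviewer asks for might look like the sketch below. It uses the eventual `duplicates` keyword name, and the data here only mimics the tied-values situation; it is not the literal example from the original issue:

```python
import pandas as pd

def test_qcut_duplicate_edges():
    values = [0, 0, 0, 0, 1, 2, 3]  # ties make the 25%/50% quantile edges collide
    # default: non-unique bin edges raise
    try:
        pd.qcut(values, 4)
        raise AssertionError('expected ValueError for duplicate edges')
    except ValueError:
        pass
    # 'drop': duplicate edges are removed and binning succeeds
    result = pd.qcut(values, 4, duplicates='drop')
    assert len(result.categories) == 2

test_qcut_duplicate_edges()
```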
Thanks for the feedback; this is my first PR on an open source project. I'll make the changes and resubmit tomorrow. I had some trouble building my branch on Windows.
The contributing docs are here; there is a section on creating a Windows environment.
```
# Conflicts:
#	pandas/tools/tile.py
```
@ashishsingal1 something went wrong with your rebase. Can you do:

That should normally solve it.
Trouble rebasing; going to start over with a new PR.
Add option to drop non-unique bins.
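To illustrate why the edges can be non-unique in the first place: with heavily tied data, several quantile probabilities map to the same value, so the edge array itself repeats. A minimal reproduction of the edge computation (illustrative only, not the `tile.py` internals):

```python
import numpy as np
import pandas as pd

x = pd.Series([0, 0, 0, 0, 1, 2, 3])
probs = np.linspace(0, 1, 5)             # quartile probabilities
edges = np.asarray(x.quantile(probs))    # -> array([0. , 0. , 0. , 1.5, 3. ])
print(np.unique(edges))                  # the 'drop' option keeps [0. , 1.5, 3. ]
```

Three of the five quartile edges coincide at 0, so the default behavior raises, while dropping duplicates leaves two usable bins.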