DOC: Added note about groupby excluding Decimal columns by default #18953

pdpark · 2017-12-27T08:26:56Z

Also included example of how to explicitly aggregate by Decimal columns.
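Below is a minimal sketch of explicitly aggregating a Decimal column alongside a numeric one. It assumes a reasonably recent pandas. The column names `id` and `int_column` are modeled on the example under review, and `dec_column` is an illustrative name. Listing each column with its aggregation in `.agg` makes the object-dtype Decimal column part of the result, so nothing depends on whether the default reduction would have kept it.

```python
from decimal import Decimal

import pandas as pd

# Illustrative frame: one integer column and one column of decimal.Decimal
# objects, which pandas stores with object dtype.
df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [Decimal("0.50"), Decimal("0.15"),
                       Decimal("0.25"), Decimal("0.40")],
    }
)

# Naming every column and its aggregation explicitly means the object-dtype
# Decimal column is aggregated instead of being dropped as a nuisance column.
result = df_dec.groupby("id").agg({"int_column": "sum", "dec_column": "sum"})
print(result)
```
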

closes 'groupby' multiple columns and 'sum' multiple columns with different types #13821

gfyoung · 2017-12-27T09:19:30Z

doc/source/groupby.rst

@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

   ``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

+   Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.
+
+   If you do wish to aggregate them you must do so explicitly:


Add a comma between "them" and "you"

gfyoung · 2017-12-27T09:20:03Z

doc/source/groupby.rst

@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

   ``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

+   Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.


The word "nuisance" comes off a little too strong IMO. Just state that they're excluded and (very briefly) why that is the case.

no, we call excluded columns nuisance so this is ok.

codecov · 2017-12-27T09:29:43Z

Codecov Report

Merging #18953 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #18953   +/-   ##
=======================================
  Coverage    92.2%    92.2%           
=======================================
  Files         169      169           
  Lines       50924    50924           
=======================================
  Hits        46952    46952           
  Misses       3972     3972

Flag        Coverage Δ
#multiple   90.62% <ø> (ø)   ⬆️
#single     42.3% <ø> (ø)    ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c8ce3d0...c755f2c.

topper-123 · 2017-12-27T11:02:55Z

I'm not so sure pandas should mention Decimal on the groupby page.

Operations on Decimals are by nature slow, and people could mistake that slowness for slowness in pandas itself. IMO pandas should promote the various NumPy and pandas types and Python str, and only rarely other types.

Maybe it could go in the FAQ, or just on StackOverflow? If it goes in the pandas docs, there should IMO be a visible disclaimer about the slowness and other downsides of doing math on Python objects rather than NumPy types.
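The storage trade-off behind this concern can be made concrete. A column of `decimal.Decimal` values is held with object dtype, one Python object per cell, so arithmetic runs through the interpreter rather than NumPy's vectorized loops. Converting to `float64` gives a NumPy-native column but gives up exact decimal arithmetic. The sketch below uses a hypothetical `price` series and claims no timings.

```python
from decimal import Decimal

import pandas as pd

# Decimal values are Python objects, so pandas keeps them in an object column.
prices = pd.Series([Decimal("0.10"), Decimal("0.20")], name="price")
print(prices.dtype)  # object

# Summing Decimals stays exact: 0.10 + 0.20 == 0.30.
exact_total = prices.sum()

# Converting to float64 gives a NumPy-native column, but binary floating
# point cannot represent 0.1 or 0.2 exactly, so the total is approximate.
as_float = prices.astype("float64")
approx_total = as_float.sum()
print(exact_total, approx_total)
```
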

jreback · 2017-12-27T20:11:39Z

doc/source/groupby.rst

@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

   ``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

+   Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.


no, we call excluded columns nuisance so this is ok.

jreback · 2017-12-27T20:12:03Z

doc/source/groupby.rst

@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

   ``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

+   Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.


add this entire section in a .. note:: block

jreback · 2017-12-27T20:12:19Z

doc/source/groupby.rst

+                'dec_column2': [Decimal('0.20'), Decimal('0.30'), Decimal('0.55'), Decimal('0.60')]
+            },
+        columns=['name','title','id','int_column','dec_column1','dec_column2']
+        )


show the dec here

jreback · 2017-12-27T20:13:28Z

doc/source/groupby.rst

+        columns=['name','title','id','int_column','dec_column1','dec_column2']
+        )
+
+    dec.groupby(['name', 'title', 'id'], as_index=False).sum()


add a comment before each of the groupbys (e.g. that .sum() excludes nuisance columns by default) and show how to include them
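One way to follow this suggestion is to put a comment before each groupby call. The sketch assumes a reasonably recent pandas. The default for `numeric_only` has changed between releases, so the sketch passes it explicitly instead of relying on the default. The column names are illustrative.

```python
from decimal import Decimal

import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [Decimal("0.50"), Decimal("0.15"),
                       Decimal("0.25"), Decimal("0.40")],
    }
)

# With numeric_only=True, .sum() skips the object-dtype Decimal column
# (a "nuisance" column) and aggregates only int_column.
numeric_sums = df.groupby("id").sum(numeric_only=True)

# Selecting the Decimal column on its own aggregates it explicitly.
dec_sums = df.groupby("id")["dec_column"].sum()
```
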

jreback · 2017-12-27T20:14:31Z

doc/source/groupby.rst

@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

   ``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

+   Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.


move this entire section down to 'Automatic exclusion of nuisance columns'

pdpark · 2017-12-28T03:39:13Z

Made requested changes.

pdpark · 2018-01-05T08:04:56Z

Fix for issue #17027.

jreback · 2018-02-11T15:26:19Z

can you rebase this

jreback · 2018-06-26T10:37:13Z

closing as stale

gfyoung · 2018-06-27T05:04:12Z

@jreback Can certainly rebase. 🙂

jreback · 2018-06-27T09:14:53Z

@gfyoung I am not sure this is in the right direction; it needs significant edits.

gfyoung · 2018-06-27T09:24:18Z

i am not sure this is in the right direction

Hmm...I see. I was under the impression, given the conversation, that this just needed rebasing. What do you mean by the "right direction" ?

jreback · 2018-06-27T09:44:07Z

needs to be way way simpler

jreback · 2018-06-27T09:44:32Z

my point is i would start over

pdpark · 2018-06-27T14:41:40Z

I can revisit this.

jorisvandenbossche

Some tips:

I would leave the change to gotchas.rst for a separate PR, as it is really a different topic.
In the decimal example, trim the code down to the minimum needed, for a much simpler example. You don't need multiple decimal columns, and a single key to group by is sufficient.
This is generally true for object-dtype columns, not only for decimals. I would also mention that somewhere.
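The point that the exclusion applies to object-dtype columns in general, and not only to Decimal, can be shown with another exact-arithmetic type. The sketch below uses `fractions.Fraction` as a stand-in object type, which is an illustrative choice and not taken from the PR. It passes `numeric_only` explicitly, and the column names are hypothetical.

```python
from fractions import Fraction

import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "b", "a"],
        "units": [1, 2, 3],
        "share": [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)],
    }
)

# Fractions, like Decimals, are stored as generic Python objects.
print(df.dtypes)

# numeric_only=True drops every object-dtype column, whatever type it holds.
numeric_result = df.groupby("key").sum(numeric_only=True)

# Aggregating the object column explicitly keeps it: 1/3 + 1/6 == 1/2 for "a".
shares = df.groupby("key")["share"].sum()
```
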

alphaKuz · 2018-06-28T07:34:43Z

Hello, not interested to learn. Thank you very much for replying. Jean K


pdpark · 2018-07-12T16:11:17Z

@jorisvandenbossche Will do.

pdpark · 2018-07-16T15:36:29Z

@jorisvandenbossche & @jreback Requested changes made.

jreback · 2018-10-07T22:54:37Z

you need to merge master and force-push; you are picking up all the commits since July.

pdpark · 2018-10-11T16:23:09Z

Fixed

datapythonista · 2018-11-03T05:25:40Z

@jreback is this ready to be merged after the last changes?

jreback · 2018-11-03T13:54:39Z

so I am not averse to expanding this section; however, decimals DO now work on master if they are typed as EAs (extension arrays), so maybe add a reference to that possibility here.

datapythonista · 2018-11-08T10:59:40Z

@pdpark do you have time to update based on the last comment?

jorisvandenbossche · 2018-11-08T14:10:07Z

however decimals DO now work on master if they are typed as EAs

The decimals EA is not publicly exposed, only for testing purposes for now.

jorisvandenbossche · 2018-11-08T14:11:22Z

I updated this PR (slight rewording + simplified and fixed the example a little bit), should be ready to merge now.


jorisvandenbossche · 2018-11-08T15:07:27Z

Some lines are >80, but I think in rst files we don't care

We care a little bit, to the extent that it fits in the code block box without generating scroll bars, but I think the limit is a bit higher than 80 for that.

jorisvandenbossche · 2018-11-08T15:08:34Z

Merging since it is a /pandas/doc/source change only (the CI is still an old version that is failing)

jorisvandenbossche · 2018-11-08T15:08:55Z

@pdpark Thanks a lot for the PR!


gfyoung added the Docs, Dtype Conversions, and Groupby labels Dec 27, 2017

gfyoung reviewed Dec 27, 2017

jreback requested changes Dec 27, 2017

jreback mentioned this pull request Jan 7, 2018

Doc: Added warning to treat group chunks as immutable when using apply #19114 (closed)

jreback closed this Jun 26, 2018

gfyoung reopened this Jun 27, 2018

gfyoung force-pushed the doc_updt1 branch from 6cf1c2c to fa94960 June 27, 2018 05:04

gfyoung added this to the 0.24.0 milestone Jun 27, 2018

jorisvandenbossche reviewed Jun 27, 2018

pdpark force-pushed the doc_updt1 branch from 9b6dfae to 6b22d12 October 11, 2018 15:22

Added note about groupby excluding Decimal columns by default

df32828

pdpark force-pushed the doc_updt1 branch from 6431b13 to df32828 October 11, 2018 16:22

Merge branch 'master' into doc_updt1

ff04442

jreback removed this from the 0.24.0 milestone Nov 6, 2018

reword + fix typo in example

9e39cf2

jorisvandenbossche added this to the 0.24.0 milestone Nov 8, 2018

jorisvandenbossche changed the title from "Added note about groupby excluding Decimal columns by default" to "DOC: Added note about groupby excluding Decimal columns by default" Nov 8, 2018

Fixing couple of pep8 issues. Some lines are >80, but I think in rst files we don't care

c755f2c

jorisvandenbossche merged commit b73d602 into pandas-dev:master Nov 8, 2018

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953)

db58d3d

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953)

825dc54

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953)

4f5d64e

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953)

cf12120
