DOC: Added note about groupby excluding Decimal columns by default #18953
doc/source/groupby.rst
Outdated
@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.

If you do wish to aggregate them you must do so explicitly:
Add a comma between "them" and "you"
doc/source/groupby.rst
Outdated
@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.
The word "nuisance" comes off a little too strong IMO. Just state that they're excluded and (very briefly) why that is the case.
no, we call excluded columns nuisance
so this is ok.
Codecov Report
@@           Coverage Diff           @@
##           master   #18953   +/-   ##
=======================================
  Coverage    92.2%    92.2%
=======================================
  Files         169      169
  Lines       50924    50924
=======================================
  Hits        46952    46952
  Misses       3972     3972
I'm not so sure Pandas should mention Operations on Maybe it could go to the FAQ or maybe just to StackOverflow? If it goes in the pandas docs, there should IMO be a visible disclaimer regarding slowness and the downsides of doing math operations on Python objects rather than NumPy objects.
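To make the object-versus-NumPy point above concrete, here is a small sketch. The `prices` series and its values are made up for illustration; the only claim is about dtypes, not about how large the slowdown is.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical column of prices stored as Decimal objects.
prices = pd.Series([Decimal("0.50"), Decimal("0.15"), Decimal("0.25")])

# pandas keeps Decimal values as generic Python objects (object dtype),
# so arithmetic runs element by element in Python.
object_dtype = prices.dtype

# Converting to float gives a NumPy-backed float64 column, trading exact
# decimal arithmetic for vectorised operations.
as_float = prices.astype(float)

exact_total = prices.sum()    # Decimal arithmetic, exact
float_total = as_float.sum()  # binary floating point, approximate
print(object_dtype, as_float.dtype, exact_total, float_total)
```

Whether the exactness is worth the object-dtype overhead depends on the data size and on how much rounding matters for the use case.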
doc/source/groupby.rst
Outdated
@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.
add this entire section in a .. note:: block
doc/source/groupby.rst
Outdated
'dec_column2': [Decimal('0.20'), Decimal('0.30'), Decimal('0.55'), Decimal('0.60')]
},
columns=['name','title','id','int_column','dec_column1','dec_column2']
)
show the dec here
doc/source/groupby.rst
Outdated
columns=['name','title','id','int_column','dec_column1','dec_column2']
)

dec.groupby(['name', 'title', 'id'], as_index=False).sum()
add a comment before each of the groupbys (e.g. .sum() by default excludes nuisance), and how to include
doc/source/groupby.rst
Outdated
@@ -497,6 +497,28 @@ index are the group names and whose values are the sizes of each group.

``nth`` can act as a reducer *or* a filter, see :ref:`here <groupby.nth>`

Decimal columns are "nuisance" columns that .agg automatically excludes in groupby.
move this entire section down to 'Automatic exclusion of nuisance columns'
Made requested changes.
Fix for issue #17027.
can you rebase this
closing as stale
@jreback Can certainly rebase. 🙂
@gfyoung i am not sure this is in the right direction
Hmm...I see. I was under the impression, given the conversation, that this just needed rebasing. What do you mean by the "right direction"?
needs to be way way simpler
my point is i would start over
I can revisit this.
Some tips:
- I would leave the change to gotchas.rst as a separate PR, as this is really a different topic.
- In the decimal example, try to trim down the code to what is minimally needed, to have a much simpler example. You don't need multiple decimal columns; a single key to group by is sufficient.
- This is in general true for object dtype columns, not only for decimals. I would also mention that somewhere.
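A trimmed-down example along the lines of these tips might look like the sketch below: one grouping key and one Decimal column. The names `df_dec`, `int_column` and `dec_column` are illustrative, not necessarily what was merged. Whether a plain `.sum()` silently drops the object column depends on the pandas version, so the sketch only shows the explicit form.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical toy data: one grouping key, one integer column, one Decimal column.
df_dec = pd.DataFrame(
    {
        "id": [1, 2, 1, 2],
        "int_column": [1, 2, 3, 4],
        "dec_column": [
            Decimal("0.50"),
            Decimal("0.15"),
            Decimal("0.25"),
            Decimal("0.40"),
        ],
    }
)

# Decimal values are stored as Python objects, so the column has object dtype.
dec_dtype = df_dec["dec_column"].dtype

# Naming each column in .agg aggregates it explicitly, whatever its dtype.
result = df_dec.groupby("id").agg({"int_column": "sum", "dec_column": "sum"})
print(result)
```

Because the column is plain object dtype, the same pattern should apply to any object column, not just one holding Decimal values.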
Hello not interested to learn
Thank you very much for replying
Jean K
From: Jeff Reback <notifications@github.com>
Sent: Tuesday, June 26, 2018 1:37 PM
To: pandas-dev/pandas
Cc: Subscribed
Subject: Re: [pandas-dev/pandas] Added note about groupby excluding Decimal columns by default (#18953)
closing as stale
@jorisvandenbossche Will do.
@jorisvandenbossche & @jreback Requested changes made.
you need to merge master and force push, you are picking up all commits since july.
Fixed
@jreback is this ready to be merged after the last changes?
so I am not averse to expanding this section, however decimals DO now work on master if they are typed as EAs. So maybe a reference to that possibility here.
@pdpark do you have time to update based on the last comment?
The decimals EA is not publicly exposed, only for testing purposes for now.
I updated this PR (slight rewording + simplified and fixed the example a little bit), should be ready to merge now.
…files we don't care
We care a little bit, to the extent that it fits in the code block box without generating scroll bars, but I think the limit is a bit higher than 80 for that.
Merging since it is a
@pdpark Thanks a lot for the PR!
…fixed
* upstream/master: (47 commits)
CLN: remove values attribute from datetimelike EAs (pandas-dev#23603)
DOC/CI: Add linting to rst files, and fix issues (pandas-dev#23381)
PERF: Speeds up creation of Period, PeriodArray, with Offset freq (pandas-dev#23589)
PERF: define is_all_dates to shortcut inadvertent copy when slicing an IntervalIndex (pandas-dev#23591)
TST: Tests and Helpers for Datetime/Period Arrays (pandas-dev#23502)
Update description of Index._values/values/ndarray_values (pandas-dev#23507)
Fixes to make validate_docstrings.py not generate warnings or unwanted output (pandas-dev#23552)
DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953)
ENH: Support writing timestamps with timezones with to_sql (pandas-dev#22654)
CI: Auto-cancel redundant builds (pandas-dev#23523)
Preserve EA dtype in DataFrame.stack (pandas-dev#23285)
TST: Fix dtype mismatch on 32bit in IntervalTree get_indexer test (pandas-dev#23468)
BUG: raise if invalid freq is passed (pandas-dev#23546)
remove uses of (ts)?lib.(NaT|iNaT|Timestamp) (pandas-dev#23562)
BUG: Fix error message for invalid HTML flavor (pandas-dev#23550)
ENH: Support EAs in Series.unstack (pandas-dev#23284)
DOC: Updating DataFrame.join docstring (pandas-dev#23471)
TST: coverage for skipped tests in io/formats/test_to_html.py (pandas-dev#22888)
BUG: Return KeyError for invalid string key (pandas-dev#23540)
BUG: DatetimeIndex slicing with boolean Index raises TypeError (pandas-dev#22852)
...
Also included example of how to explicitly aggregate by Decimal columns.