Doc: Added warning to treat group chunks as immutable when using apply #19114
pdpark commented on Jan 7, 2018
- closes issue DOC: clarify dangers of fast apply in GroupBy apply docs #14180
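For context, here is a minimal sketch of what "treat group chunks as immutable" could look like in practice. It is not taken from the PR; the frame, the column names `key` and `value`, and the helper `demean` are made up for illustration. The function passed to `apply` works on a copy instead of modifying the chunk it receives.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"key": ["a", "a", "b", "b"],
                   "value": [1.0, 3.0, 10.0, 20.0]})


def demean(chunk):
    # Copy first so the chunk handed in by pandas is never modified in place.
    out = chunk.copy()
    out["value"] = out["value"] - out["value"].mean()
    return out


# Selecting [["value"]] keeps the grouping column out of the chunks.
result = df.groupby("key", group_keys=False)[["value"]].apply(demean)
print(result)
```

The original `df` is left untouched; the de-meaned values exist only in `result`.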
…utomatic-exclusion-of-nuisance-columns section
…ly" section of groupby.rst Resolves: pandas-dev#14180
@@ -332,3 +332,97 @@ using something similar to the following:
See `the NumPy documentation on byte order
<https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html>`__ for more
details.

Alternative to storing lists in Pandas DataFrame Cells
Comment: just DataFrame
Alternative to storing lists in Pandas DataFrame Cells
------------------------------------------------------
Storing nested lists/arrays inside a pandas object should be avoided for performance and memory use reasons. Instead they should be "exploded" into a flat DataFrame structure.
Comment: use double backticks around DataFrame
.. ipython:: python

from collections import OrderedDict
df = (pd.DataFrame(OrderedDict([('name', ['A.J. Price']*3),
Comment: use dict construction directly; if you want column ordering, then pass columns
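A minimal sketch of this suggestion, assuming a second column `opponent` whose name and values are invented here for illustration (the PR's full column list is not shown above):

```python
import pandas as pd

# "opponent" and its values are made-up illustration data.
data = {"name": ["A.J. Price"] * 3,
        "opponent": ["76ers", "blazers", "bobcats"]}

# Passing columns= fixes the column order explicitly,
# so an OrderedDict is not needed.
df = pd.DataFrame(data, columns=["opponent", "name"])
print(df)
```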
))
df

nn = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3
Comment: call this something more apparent
nn = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3
nn

# Step 1: Create an index with the "parent" columns to be included in the final Dataframe
Comment: you can use sphinx to number these
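The "exploding" idea in the proposed section can be sketched as follows. This is not the step-by-step method from the PR: it relies on `DataFrame.explode`, which was added to pandas later (version 0.25). The frame and the column name `neighbors` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: one row per player, with a list stored in a single cell.
nested = pd.DataFrame({
    "name": ["A.J. Price", "Zach LaVine"],
    "neighbors": [["Jeremy Lin", "Nate Robinson"], ["A.J. Price"]],
})

# DataFrame.explode (pandas >= 0.25) emits one row per list element and
# repeats the other ("parent") columns alongside each element.
flat = nested.explode("neighbors").reset_index(drop=True)
print(flat)
```

Each list element becomes its own row, so the result is a flat frame with scalar cells only.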
@@ -955,6 +959,42 @@ will be (silently) dropped. Thus, this does not pose any problems:

df.groupby('A').std()

.. note::
Comment: this is for another issue?
Comment: PR #18953?
Comment: Yes, I should have made this a separate branch on my fork and a separate pull request.
I will make the updates per your notes above.
Comment: I created a clean pull request for this fix: #19215
superseded by #19175