Doc: Adds example of categorical data for efficient storage and consistency across DataFrames #19245
pdpark commented Jan 15, 2018
- closes #12509 (append a categorical with different categories to the existing)
Codecov Report
@@            Coverage Diff             @@
##           master   #19245      +/-   ##
==========================================
- Coverage   91.84%    91.6%   -0.25%
==========================================
  Files         153      150       -3
  Lines       49295    48864     -431
==========================================
- Hits        45276    44761     -515
- Misses       4019     4103      +84
This can be a less drawn-out example while still including all of the info. Can you post a screenshot of the resulting page?
doc/source/cookbook.rst
Outdated
.. ipython:: python

   import pandas as pd
I don't believe we include the standard imports (pandas and numpy) anywhere else; they are assumed.
doc/source/cookbook.rst
Outdated
np.random.seed(1234)
pd.set_option('max_rows',10)
uniques = np.array(list(string.ascii_letters))
uniques
I would make the creation here less verbose.
How's this?
UniqCats.pdf
I'll try to simplify this
doc/source/cookbook.rst
Outdated
df1 = pd.DataFrame({'A': domain.take(np.random.randint(0,12,size=100000))})
df2 = pd.DataFrame({'A': domain.take(np.random.randint(8,20,size=100000))})
print('df1.A Data Type:', df1.A.dtype)
You don't need to use print; the output is displayed automatically.
You can use comments to indicate what's important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
doc/source/cookbook.rst
Outdated
@@ -1318,3 +1318,96 @@ of the data values:
'weight': [100, 140, 180],
'sex': ['Male', 'Female']})
df
Can you add a ref to this section?
Also link to the categorical.rst section.
doc/source/cookbook.rst
Outdated
df3 = df1.append(df2)
df3.memory_usage()

.. ipython:: python
If you are going to put text in between the different code samples, it makes sense to have them in separate ipython blocks; otherwise it doesn't.
doc/source/cookbook.rst
Outdated
df3.A.dtype

The data type of all the columns is Object. Using Categorical columns should improve memory usage.
should -> will
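The memory claim under review can be checked directly. The following is a minimal sketch, not the PR's code: the names `letters`, `obj_col`, and `cat_col` are hypothetical, and the column is forced to object dtype explicitly so the comparison does not depend on how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1234)

# Hypothetical stand-in for the PR's data: 12 distinct one-letter strings.
letters = np.array(list("abcdefghijkl"), dtype=object)

# The same 100000 values stored two ways.
obj_col = pd.Series(letters.take(rng.randint(0, 12, size=100000)), dtype=object)
cat_col = obj_col.astype("category")

# deep=True also counts the Python string objects behind an object column.
obj_bytes = obj_col.memory_usage(deep=True)
cat_bytes = cat_col.memory_usage(deep=True)

print("object bytes:  ", obj_bytes)
print("category bytes:", cat_bytes)
```

With only a handful of distinct values, the categorical column stores small integer codes plus one copy of each category, which is why it comes out smaller than the object column.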
doc/source/cookbook.rst
Outdated
.. ipython:: python

   dfc1 = df1
You need to copy here: plain assignment only creates a second name for the same DataFrame.
When assigning, use dfc1['A'] = ....
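A small sketch of the difference this comment points at, assuming a toy frame rather than the PR's data (the name `alias` is hypothetical; `df1` and `dfc1` mirror the names in the diff):

```python
import pandas as pd

# Toy object-dtype column standing in for the PR's df1.
df1 = pd.DataFrame({"A": pd.Series(["a", "b", "a", "c"], dtype=object)})

# Plain assignment binds a second name to the same DataFrame.
alias = df1

# .copy() gives an independent frame that can be converted safely.
dfc1 = df1.copy()

# Bracket assignment replaces the column on the copy only.
dfc1["A"] = dfc1["A"].astype("category")

print(alias is df1)      # the alias is the very same object
print(df1["A"].dtype)    # original column is unchanged
print(dfc1["A"].dtype)   # copy now holds a categorical column
```

Bracket assignment is the safer habit because attribute-style assignment (`dfc1.A = ...`) only updates a column that already exists; for a new name it sets a Python attribute instead of creating a column.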
doc/source/cookbook.rst
Outdated
dfc3.memory_usage()

.. ipython:: python
I think you can greatly simplify this entire example and just show the middle section: show a column as object, then the same one as categorical, in one shot.
Just to make sure I understand... I was trying to show how appending two categorical columns can result in an object column if they don't have the same categories, and to show the big difference in storage between an object column and a category column. Is that still what we're aiming for?
I understand. This example can be pretty short, though. You can have one column whose categories exactly match what you are appending, which stays a category, and one whose categories do not match, which turns to object.
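One way the short two-column example described here might look, as a sketch only. The column names `same` and `differs` and the frames `left` and `right` are hypothetical. `DataFrame.append`, used in the diff, was removed in pandas 2.0, so this version uses `pd.concat` instead.

```python
import pandas as pd

# Both frames share this exact dtype for the "same" column.
shared = pd.CategoricalDtype(["a", "b", "c"])

left = pd.DataFrame({
    "same": pd.Series(["a", "b"], dtype=shared),
    "differs": pd.Series(["a", "b"], dtype=pd.CategoricalDtype(["a", "b"])),
})
right = pd.DataFrame({
    "same": pd.Series(["c", "a"], dtype=shared),
    "differs": pd.Series(["c", "d"], dtype=pd.CategoricalDtype(["c", "d"])),
})

combined = pd.concat([left, right], ignore_index=True)

# Identical categories on both sides: the result stays categorical.
print(combined["same"].dtype)

# Mismatched categories: the result falls back to a non-categorical dtype
# (object in the pandas versions this thread discusses).
print(combined["differs"].dtype)
```

Declaring one `CategoricalDtype` up front and reusing it across frames is what keeps the column categorical after concatenation.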
I see. I'll give it another shot.
Thanks for the feedback.
Closing as stale. If you'd like to continue, please ping.