Doc: Adds example of categorical data for efficient storage and consistency across DataFrames #19245

Closed
wants to merge 1 commit into from

@pdpark pdpark commented Jan 15, 2018

codecov bot commented Jan 15, 2018

Codecov Report

Merging #19245 into master will decrease coverage by 0.24%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #19245      +/-   ##
==========================================
- Coverage   91.84%    91.6%   -0.25%     
==========================================
  Files         153      150       -3     
  Lines       49295    48864     -431     
==========================================
- Hits        45276    44761     -515     
- Misses       4019     4103      +84
Flag        Coverage Δ
#multiple   89.97% <ø> (-0.27%) ⬇️
#single     41.75% <ø> (-0.15%) ⬇️

Impacted Files                    Coverage Δ
pandas/tseries/plotting.py        0% <0%> (-100%) ⬇️
pandas/core/dtypes/base.py        47.61% <0%> (-44.28%) ⬇️
pandas/plotting/_compat.py        62% <0%> (-28.91%) ⬇️
pandas/core/arrays/base.py        60% <0%> (-24.15%) ⬇️
pandas/io/s3.py                   72.72% <0%> (-13.64%) ⬇️
pandas/core/missing.py            85.9% <0%> (-5.75%) ⬇️
pandas/io/formats/terminal.py     16.43% <0%> (-4.55%) ⬇️
pandas/plotting/_timeseries.py    60.82% <0%> (-4.49%) ⬇️
pandas/core/reshape/tile.py       90.25% <0%> (-3.12%) ⬇️
pandas/io/html.py                 85.98% <0%> (-2.81%) ⬇️
... and 67 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ed1f53...d579ab6. Read the comment docs.

Contributor

@jreback jreback left a comment

This example can be less drawn out while still including all of the info. Can you post a screenshot of the resulting page?


.. ipython:: python

import pandas as pd
Contributor

I don't believe we include the standard imports (pandas and numpy) anywhere else; they are assumed.

np.random.seed(1234)
pd.set_option('max_rows',10)
uniques = np.array(list(string.ascii_letters))
uniques
Contributor

I would make the creation here less verbose.

Author

How's this?
UniqCats.pdf

Author

I'll try to simplify this

@jreback jreback added the Docs label Jan 15, 2018

df1 = pd.DataFrame({'A': domain.take(np.random.randint(0,12,size=100000))})
df2 = pd.DataFrame({'A': domain.take(np.random.randint(8,20,size=100000))})
print('df1.A Data Type:', df1.A.dtype)
Contributor

You don't need to use print; the output is printed automatically.

You can use comments to indicate what's important.
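
A minimal sketch of what this suggestion could look like, assuming a small stand-in for the PR's `df1` (the data here is hypothetical, and the column is forced to object dtype so the result does not depend on the pandas version's string inference). In an IPython session or an `.. ipython:: python` block, the value of a bare expression is echoed, so a comment can carry the emphasis that `print` was providing:

```python
import pandas as pd

# Hypothetical stand-in for df1; dtype=object is set explicitly.
df1 = pd.DataFrame({"A": pd.Series(["a", "b", "a", "c"], dtype=object)})

# The column starts out as object dtype (no print needed in IPython).
df1["A"].dtype
```

Run as a plain script, the last line shows nothing; the echo is a feature of the interactive shell and of the docs directive.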

Author

will do.

@@ -1318,3 +1318,96 @@ of the data values:
'weight': [100, 140, 180],
'sex': ['Male', 'Female']})
df

Contributor

Can you add a ref to this section?

Contributor

Also link to the categorical.rst section.

df3 = df1.append(df2)
df3.memory_usage()

.. ipython:: python
Contributor

If you are going to put text in between the different code samples, it makes sense to have them in separate ipython blocks; otherwise it doesn't.


df3.A.dtype

The data type of all the columns is Object. Using Categorical columns should improve memory usage.
Contributor

should -> will


.. ipython:: python

dfc1 = df1
Contributor

You need to copy here.
When assigning, use dfc1['A'] = ....
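
A short sketch of the two points above, using hypothetical data in place of the PR's `df1`. Plain assignment (`dfc1 = df1`) only binds a second name to the same DataFrame, so converting a column through `dfc1` would also change `df1`; `.copy()` avoids that. Bracket assignment is the unambiguous way to replace a column:

```python
import pandas as pd

# Hypothetical stand-in for df1, forced to object dtype.
df1 = pd.DataFrame({"A": pd.Series(["a", "b", "a", "c"], dtype=object)})

# Take an independent copy so the original stays untouched.
dfc1 = df1.copy()

# Replace the column via bracket assignment.
dfc1["A"] = dfc1["A"].astype("category")

# df1 still holds object data; only the copy is categorical.
original_dtype = df1["A"].dtype
converted_dtype = dfc1["A"].dtype
```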

dfc3.memory_usage()

.. ipython:: python

Contributor

I think you can greatly simplify this entire example and just show the middle section: show a column as object, then the same one as categorical, in one shot.

Author

Just to make sure I understand... I was trying to show how appending two categorical columns can result in an object column if they don't have the same categories and showing the big difference in storage of an object vs category column. Is that still what we're aiming for?
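
To illustrate the storage difference mentioned here, a minimal sketch under assumed data: a hypothetical 12-letter domain sampled 100,000 times, roughly mirroring the sizes in the PR's diff. The names `letters`, `as_object`, and `as_category` are illustrative and not from the PR:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1234)

# Hypothetical domain of 12 single-letter values.
letters = np.array(list("abcdefghijkl"))
values = letters.take(rng.randint(0, 12, size=100000))

# The same data stored as object and as categorical.
as_object = pd.Series(values, dtype=object)
as_category = as_object.astype("category")

# deep=True counts the Python string objects behind the object column.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
```

The categorical column stores one small integer code per row plus a single copy of each category, which is why it comes out much smaller than the object column.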

Contributor

I understand. This example can be pretty short, though: you can have one column whose categories match exactly what you are appending, which stays a category, and one whose categories don't match, which turns into object.
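
One possible shape for such a short example, offered as a sketch rather than the PR's final text. It uses `pd.concat` instead of the `DataFrame.append` call in the diff, since `append` was removed in pandas 2.0; the column names `same` and `differs` are hypothetical:

```python
import pandas as pd

# A dtype shared by both frames, so the categories match exactly.
shared = pd.CategoricalDtype(categories=["a", "b", "c", "d"])

left = pd.DataFrame({
    "same": pd.Series(["a", "b"], dtype=shared),
    "differs": pd.Series(["a", "b"], dtype="category"),  # categories: a, b
})
right = pd.DataFrame({
    "same": pd.Series(["c", "d"], dtype=shared),
    "differs": pd.Series(["c", "d"], dtype="category"),  # categories: c, d
})

combined = pd.concat([left, right], ignore_index=True)

# "same" keeps the shared categorical dtype; "differs" is no longer
# categorical (object in the pandas versions contemporary with this PR).
same_dtype = combined["same"].dtype
differs_dtype = combined["differs"].dtype
```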

Author

I see. I'll give it another shot.

Author

Thanks for the feedback.

Contributor

jreback commented Nov 1, 2018

Closing as stale. If you'd like to continue, please ping.

@jreback jreback closed this Nov 1, 2018
Successfully merging this pull request may close these issues.

append a categorical with different categories to the existing