optional memory usage optimization and show mem usage #437

Merged 2 commits on Feb 19, 2021

@AnthraX1 AnthraX1 commented Feb 19, 2021

  1. Allow memory usage optimization of large DataFrames by converting object columns to the category dtype (https://www.dataquest.io/blog/pandas-big-data/)

  2. Show instance memory usage in the Instances popup (slightly slower to open because deep memory usage has to be calculated)
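
For context on the second item, here is a minimal sketch of why a deep measurement is both more accurate and slower than a shallow one. It relies on pandas' `DataFrame.memory_usage`; the sample frame and the helper name `deep_memory_bytes` are illustrative assumptions, not code from this PR.

```python
import pandas as pd


def deep_memory_bytes(df: pd.DataFrame) -> int:
    """Total bytes used by df, including the Python strings in object columns."""
    # deep=True inspects every object value, which is accurate but slower.
    return int(df.memory_usage(deep=True).sum())


# Hypothetical sample data: one low-cardinality string column stored as object.
df = pd.DataFrame(
    {"city": pd.Series(["London", "Paris", "London", "Paris"] * 1000, dtype=object)}
)

shallow = int(df.memory_usage(deep=False).sum())  # counts only the references
deep = deep_memory_bytes(df)                      # also counts the string payloads
print(f"shallow={shallow} bytes, deep={deep} bytes")
```

The gap between the two numbers is the memory held by the string objects themselves, which is what the shallow figure misses.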

@aschonfeld aschonfeld left a comment

I think this is a great idea. To keep the Instances popup from taking longer to load, I'm going to adjust this code after I merge it so that the "Memory" column loads separately, and you'll see a spinner in that column until its value is populated.

I also think this presents an interesting option to build out some UI for allowing users to "Optimize" data they have already loaded and do a memory comparison before accepting the optimization. Nice work!
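
A rough sketch of what such a before/after comparison could look like, using the PR's unique-ratio heuristic on a copy so the loaded data stays untouched until the user accepts. The function name `optimize_preview`, its `max_unique_ratio` parameter, and the sample data are hypothetical and not part of this PR.

```python
import pandas as pd


def optimize_preview(df: pd.DataFrame, max_unique_ratio: float = 0.5):
    """Return (optimized_copy, bytes_before, bytes_after) without modifying df."""
    before = int(df.memory_usage(deep=True).sum())
    optimized = df.copy()
    for col in optimized.columns:
        if not pd.api.types.is_object_dtype(optimized[col]):
            continue  # only object columns are candidates
        n_total = len(optimized[col])
        if n_total == 0:
            continue  # avoid dividing by zero on empty frames
        # Same heuristic as the PR: convert when fewer than half the values are unique.
        if optimized[col].nunique(dropna=False) / n_total < max_unique_ratio:
            optimized[col] = optimized[col].astype("category")
    after = int(optimized.memory_usage(deep=True).sum())
    return optimized, before, after


# Hypothetical sample data: one repetitive column, one with all-unique values.
df = pd.DataFrame(
    {
        "status": pd.Series(["open", "closed"] * 5000, dtype=object),
        "ticket": pd.Series([f"T{i}" for i in range(10000)], dtype=object),
    }
)

optimized, before, after = optimize_preview(df)
print(optimized.dtypes)
print(f"before={before} bytes, after={after} bytes")
```

Only the repetitive column is converted; the all-unique column stays as object because a categorical would not save anything there.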

num_unique_values = len(df[col].unique())
num_total_values = len(df[col])
if num_unique_values / num_total_values < 0.5:
    df[col] = df[col].astype("category")

I was always told to use .loc, but I've heard differing things. Do you know which is more correct?

Suggested change
- df[col] = df[col].astype("category")
+ df.loc[:, col] = df[col].astype("category")

I think they are the same?

Thanks @Centropy-io 🙏 I remember pandas throwing warnings at one point if you didn't use .loc, because of setting values on a copy, but that doesn't apply in this instance.
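
To make the comparison concrete, here is a small sketch of both assignment forms on a frame that is not a slice of another frame, which is why the copy warning is not a concern here. The sample column is made up. One caveat, stated as an assumption to check against your own pandas version: newer releases may try to write `.loc[:, col]` values into the existing column, so the dtype after that form is printed rather than assumed.

```python
import pandas as pd

# Hypothetical sample data: a small object column.
df = pd.DataFrame({"status": pd.Series(["open", "closed"] * 3, dtype=object)})

# Plain column assignment on the frame itself replaces the whole column,
# so the category dtype is adopted.
a = df.copy()
a["status"] = a["status"].astype("category")
print("df[col] = ...        ->", a["status"].dtype)

# .loc[:, col] assignment: the resulting dtype depends on the pandas version
# (assumption: newer releases attempt an in-place write into the existing column).
b = df.copy()
b.loc[:, "status"] = b["status"].astype("category")
print("df.loc[:, col] = ... ->", b["status"].dtype)
```

If the second line prints `object`, the `.loc` form did not apply the optimization on that version, and the plain `df[col] = ...` form from the PR is the one that changes the dtype.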

@aschonfeld aschonfeld merged commit 0f2b3a9 into man-group:master Feb 19, 2021