Code for a talk on wrangling large datasets in pandas. The presentation slides are here. A video of the talk will be uploaded soon.
The talk covers:
- Managing pandas DataFrame memory usage by downcasting types (see the downcasting sketch after this list)
- Using pre-commit with nbstripout, black, and isort to maintain code quality in Jupyter notebooks
- Using dask when the data just doesn't fit in memory (see the dask sketch below)
- Moving from CSV to columnar file formats such as parquet (see the parquet sketch below)
- Using SQL when the data is large enough that Python alone is no longer an option (see the SQL sketch below)
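A rough sketch of the downcasting idea (the column names and sizes here are made up for illustration, not taken from the talk):

```python
import numpy as np
import pandas as pd

# A small example frame; the same idea applies to much larger data.
df = pd.DataFrame({
    "user_id": np.arange(1_000_000),     # defaults to int64
    "score": np.random.rand(1_000_000),  # defaults to float64
})
print(df.memory_usage(deep=True).sum())

# Downcast each numeric column to the smallest dtype that can hold its values.
df["user_id"] = pd.to_numeric(df["user_id"], downcast="unsigned")
df["score"] = pd.to_numeric(df["score"], downcast="float")

print(df.dtypes)
print(df.memory_usage(deep=True).sum())
```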
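A minimal sketch of swapping pandas for dask.dataframe when the data won't fit in memory; the `data/*.csv` glob and the column names are placeholders:

```python
import dask.dataframe as dd

# dask reads the CSVs lazily, in partitions, instead of loading everything at once.
ddf = dd.read_csv("data/*.csv")  # placeholder path

# The API mirrors pandas; nothing is computed until .compute() is called.
result = ddf.groupby("country")["score"].mean().compute()
print(result)
```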
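A sketch of the CSV-to-parquet move using pandas; the file names are placeholders, and writing parquet requires pyarrow or fastparquet to be installed:

```python
import pandas as pd

# One-time conversion: read the CSV and write it back out as parquet.
df = pd.read_csv("events.csv")       # placeholder file name
df.to_parquet("events.parquet")      # needs pyarrow or fastparquet

# Later reads are faster and can pull only the columns you need.
subset = pd.read_parquet("events.parquet", columns=["user_id", "score"])
```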
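A sketch of pushing the heavy work into a database and bringing back only a small summary; the connection string, table, and column names below are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for an example Postgres database.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

# Let the database do the aggregation and return only the small result.
query = """
    SELECT country, AVG(score) AS mean_score
    FROM events
    GROUP BY country
"""
summary = pd.read_sql(query, engine)
print(summary)
```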