NYC DSA May 18 Talk

Code for a talk on wrangling large datasets in pandas. The presentation slides are here. A video of the talk will be uploaded soon.

The talk covers:

  • Managing pandas DataFrame memory usage by downcasting column types
  • Using pre-commit with nbstripout, black, and isort to maintain code quality in Jupyter notebooks
  • Using Dask when the data doesn't fit in memory
  • Moving from CSV to columnar data stores, such as Parquet
  • Using SQL when the data is large enough that Python is no longer an option
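
As a rough illustration of the first topic, here is a minimal sketch of downcasting numeric columns with `pd.to_numeric`. The helper name `downcast_numeric` and the sample columns are hypothetical and are not taken from the talk's notebooks; the sketch only assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns downcast to smaller dtypes.

    Hypothetical helper for illustration; not part of the talk's code.
    """
    out = df.copy()
    # Integer columns: pick the smallest signed integer type that holds all values.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Float columns: try float32, the smallest type pandas downcasts floats to.
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Made-up sample data; every column starts as a 64-bit type.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "user_id": np.arange(100_000, dtype="int64"),
        "rating": rng.integers(1, 6, size=100_000).astype("int64"),
        "price": rng.random(100_000) * 100,
    }
)

small = downcast_numeric(df)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()

print(small.dtypes)
print(f"memory before: {before:,} bytes, after: {after:,} bytes")
```

Downcasting floats to `float32` trades precision for memory, so it is worth checking that the reduced precision is acceptable for the columns involved before applying it across a whole DataFrame.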