
Code for a talk on wrangling large datasets in pandas


rishigoutam/nycdemo


NYC DSA May 18 Talk

Code for a talk on wrangling large datasets in pandas. The presentation slides are here. A video of the talk will be uploaded soon.

The talk covers:

  • Managing pandas DataFrame memory usage by downcasting column types
  • Using pre-commit with nbstripout, black, and isort to maintain code quality in Jupyter notebooks
  • Using Dask when the data doesn't fit in memory
  • Moving from CSV to columnar storage formats such as Parquet
  • Using SQL when the data is too large to process in Python
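
As a rough illustration of the first topic, downcasting can be sketched as below. This is a minimal sketch and not necessarily the code used in the talk: the `downcast` helper, the column names, and the sample data are all hypothetical. It relies on `pd.to_numeric` with its `downcast` argument, which picks the smallest numeric dtype that can hold a column's values.

```python
import numpy as np
import pandas as pd


def downcast(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns stored in smaller dtypes.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    # Integer columns: shrink to the smallest integer dtype that fits the values.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Float columns: shrink to float32 where the values survive the conversion.
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Hypothetical sample data; by default both columns use 8 bytes per value.
df = pd.DataFrame(
    {
        "passengers": np.arange(1000, dtype="int64") % 6,
        "fare": np.linspace(2.5, 80.0, 1000),
    }
)

small = downcast(df)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()

print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Note that converting floats to `float32` reduces precision, so it is worth checking that downstream calculations tolerate the loss before applying this to every column.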
