Data Wrangling and Integration

This set of modules covers the critical topics of data wrangling and data integration. Many data scientists, data engineers, and data consultants report that over 80% of their time is spent on these tasks rather than on actual data analysis or machine learning. This module introduces the basic concepts and the major Python (especially Pandas) capabilities for data wrangling and cleaning. It also makes use of Magellan, the Python record-linking package by Doan and his students (http://pages.cs.wisc.edu/~anhai/papers/magellan-tr.pdf). We consider this module to be of "mixed" difficulty, with some basic and some intermediate components.

Additional instructor notes are available.

Directory Contents

  • Data wrangling and import lecture materials:

    • DATA-WRANGLING-import-link-mixed slides. Section headings indicate difficulty level (basic vs intermediate).
    • DATA-WRANGLING-import-link-mixed companion Jupyter notebook
  • Homework notebook

Release History

  • Initial release, Susan Davidson and Zachary Ives, University of Pennsylvania, February 2020.