Note: this is a draft and is subject to change.
This course is an introduction to data-driven AI and Machine Learning using Python. It starts with a hands-on introduction to the essential Python tools and libraries for manipulating, visualizing and transforming data, such as NumPy, Pandas, Matplotlib, Seaborn and IPython. It then builds on those tools to introduce some of the most established and frequently used Machine Learning algorithms using the Scikit-Learn library.
Python Data Science Handbook: Essential Tools for Working with Data
(by Jake VanderPlas)
Free full text is available online under the CC-BY-NC-ND license.
GitHub repository: https://github.com/jakevdp/PythonDataScienceHandbook
NOTE: This is a handbook-style book, with each chapter providing a fairly deep exploration of its subject. The course is based on the first few sections of each chapter. This provides a gentle, practical introduction to the field while leaving the student with a clear path towards a deeper exploration of each subject.
-
Week 1, book chapter 1
- Introduction to the field of data-based AI and the relationship between AI, Machine Learning and Data Science
- Historical overview
- Overview of the book, tools and libraries used in the course
- Introduction to IPython and Jupyter Notebooks and a quick recap of Python
- Administrative
-
Week 2, quiz, book chapters 1 and 2
- Introduction to plotting data with Matplotlib
- Introduction to NumPy
- Vectorized computation vs. Python loops
- Two-dimensional arrays and NumPy broadcasting
- Slicing NumPy arrays
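
As a taste of this week's NumPy topics, here is a minimal sketch (not taken from the book) that contrasts a Python loop with a vectorized expression, then shows broadcasting and slicing on a small two-dimensional array. The array contents and variable names are made up for illustration.

```python
import numpy as np

values = np.arange(1, 6)              # array([1, 2, 3, 4, 5])

# Python loop: square each element one at a time
squares_loop = [v ** 2 for v in values]

# Vectorized: the same operation applied to the whole array at once
squares_vec = values ** 2

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) grid
column = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = column + row                   # grid[i, j] == i + j

# Slicing a two-dimensional array: first two rows, every other column
corner = grid[:2, ::2]
```

Both approaches produce the same squares; the vectorized form avoids the per-element Python overhead, which matters on large arrays.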
-
Week 3, quiz, book chapters 3 and 4
- Introduction to Pandas
- Working with public datasets, introduction to Kaggle
- Handling Missing Data
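
To give a flavour of the missing-data topic, the following sketch shows three common Pandas operations: counting missing values, dropping incomplete rows, and filling gaps with the column mean. The toy table, its column names and its values are hypothetical, and mean imputation is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; names and values are made up for illustration
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "temperature": [21.0, np.nan, 19.0, 23.0],
})

# Count missing entries in each column
missing_per_column = df.isna().sum()

# Option 1: discard every row that contains a missing value
dropped = df.dropna()

# Option 2: replace missing temperatures with the mean of the known ones
filled = df.fillna({"temperature": df["temperature"].mean()})
```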
-
Week 4, quiz, lab test, book chapters 3 and 4
- More on Pandas and visualization
- Exploring data with descriptive statistics
- Correlation and linear fitting
- Advanced DataFrame manipulations
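
The statistics topics above can be sketched in a few lines. This example assumes a small hand-made dataset in which `y` is roughly `2*x + 1`; it computes summary statistics, the Pearson correlation, and a least-squares straight-line fit. Using `np.polyfit` for the fit is one choice among several and is not necessarily the method used in class.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y is roughly 2*x + 1 with a small fixed perturbation
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1 + np.array([0.1, -0.1, 0.0, 0.1, -0.1])
df = pd.DataFrame({"x": x, "y": y})

# Descriptive statistics: count, mean, std, min, quartiles, max per column
summary = df.describe()

# Pearson correlation coefficient between the two columns
r = df["x"].corr(df["y"])

# Least-squares fit of a degree-1 polynomial (a straight line)
slope, intercept = np.polyfit(x, y, 1)
```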
-
Week 5, quiz, Assignment 1 due
- Walkthrough of a real-world data analysis project
-
Week 6, quiz
- Patterns and structure in data
- Clustering
- Dealing with more than 3 dimensions
- Recap of the concept of distance metric
- Dimensionality reduction for visualization
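
As an illustration of these topics, the sketch below clusters synthetic five-dimensional points and projects them to two dimensions so they could be plotted. K-means and PCA are assumed here as representative techniques; the syllabus does not specify which algorithms are covered. The data are randomly generated and carry no real meaning.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two well-separated synthetic groups of points in 5 dimensions
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 5))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 5))
X = np.vstack([blob_a, blob_b])

# Euclidean distance between two points, the metric k-means relies on
distance = np.linalg.norm(X[0] - X[-1])

# Clustering: assign each point to one of two groups
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project 5 dimensions down to 2 for plotting
X_2d = PCA(n_components=2).fit_transform(X)
```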
-
Week 7
- MID-TERM EXAM
-
Week 8
- INTERSESSION WEEK (week 8)
-
Week 9, quiz, Assignment 2 due, book chapter 5 from here through the rest of the course
- Introduction to Statistical Machine Learning
- Terminology: Supervised vs Unsupervised ML, Classification vs Regression
- The concept of “model”
- Linear regression as a simple example of machine learning
- K-nearest neighbours (KNN) algorithm as both a regressor and a classifier
- Introduction to scikit-learn API
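
The scikit-learn API follows a consistent pattern: construct an estimator, call `fit` on training data, then call `predict`. The sketch below applies that pattern to linear regression and to KNN used as both a regressor and a classifier. The tiny dataset and the neighbour counts are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical one-feature dataset
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_value = 2 * X.ravel() + 1             # continuous target for regression
y_class = np.array([0, 0, 0, 1, 1, 1])  # discrete labels for classification

# Every estimator is used the same way: construct, fit, predict
linear = LinearRegression().fit(X, y_value)
knn_reg = KNeighborsRegressor(n_neighbors=2).fit(X, y_value)
knn_clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)

query = np.array([[1.2]])
linear_pred = linear.predict(query)     # value on the fitted line
knn_reg_pred = knn_reg.predict(query)   # mean target of the 2 nearest points
knn_clf_pred = knn_clf.predict(query)   # majority label of the 3 nearest points
```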
-
Week 10, quiz
- Separation of test and training data
- Overfitting and underfitting
- Evaluation
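
A minimal sketch of why held-out test data matters, assuming synthetic noisy data and a 1-nearest-neighbour regressor as a deliberately overfitting model. The model memorizes the training set, so its training score is perfect while its test score is lower.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic data: a sine curve plus random noise
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)

# Hold out 25% of the samples for evaluation only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# With one neighbour the model reproduces every training point exactly
model = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)

train_score = model.score(X_train, y_train)  # R^2 on data the model has seen
test_score = model.score(X_test, y_test)     # R^2 on unseen data
```

Comparing `train_score` with `test_score` is the simplest overfitting check: a large gap suggests the model has fitted noise rather than structure.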
-
Week 11, quiz, lab test
- Multidimensional regression
- Regularization (Ridge and Lasso regression)
- Feature engineering
- More on evaluation
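
To illustrate regularization, this sketch fits ordinary least squares, Ridge and Lasso to synthetic data in which only two of five features influence the target. The `alpha` values are arbitrary and chosen only to make the effect visible: Ridge shrinks the coefficients, while Lasso sets the irrelevant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)

# Five features, but only the first two actually drive the target
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)   # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty shrinks all coefficients
lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty can zero coefficients out

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
lasso_nonzero = np.count_nonzero(lasso.coef_)
```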
-
Week 12, quiz, Assignment 3 due
- Multi-class classifiers
- SVM
- Decision trees
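
As a sketch of multi-class classification, the example below trains a support vector machine and a decision tree on the three-class Iris dataset bundled with scikit-learn. The choice of dataset, the RBF kernel and the tree depth are assumptions made for illustration, not part of the syllabus.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Iris: 150 flowers, 4 measurements each, 3 species (classes 0, 1, 2)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Both estimators handle more than two classes without extra configuration
svm = SVC(kernel="rbf").fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Fraction of test flowers classified correctly
svm_accuracy = svm.score(X_test, y_test)
tree_accuracy = tree.score(X_test, y_test)
```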
-
Week 13, quiz
- Recommender systems
- Walkthrough of a real-world ML project
-
Week 14, Assignment 4 due
- Overview of other related tools and resources with examples.
-
Week 15
- FINAL EXAM