
COMP 3122 Outline Draft

Note: this is a draft and is subject to change.

Course description

This course is an introduction to the field of data-driven AI and Machine Learning using Python. It will start with a hands-on introduction to essential Python tools and libraries for manipulating, visualizing and transforming data, such as NumPy, Pandas, Matplotlib, Seaborn and IPython. It will then build on those tools to introduce some of the most established and frequently used Machine Learning algorithms, using the Scikit-Learn library.
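As a rough illustration of how the later weeks build on the earlier tools, the sketch below chains NumPy, Pandas and Scikit-Learn in one small script. It is not taken from the course materials or the book: the synthetic two-cluster data, the column names `x1`/`x2` and the choice of a k-nearest-neighbours classifier are assumptions made for illustration, and plotting with Matplotlib or Seaborn is left out so that it runs without a display.

```python
# Illustrative sketch only (not from the course): synthetic data, hypothetical
# column names x1/x2, and KNN chosen as an example algorithm.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# NumPy: generate two synthetic 2-D clusters with a fixed random seed
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
cluster_b = rng.normal(loc=4.0, scale=1.0, size=(50, 2))

# Pandas: collect the points and their class labels in one table
df = pd.DataFrame(np.vstack([cluster_a, cluster_b]), columns=["x1", "x2"])
df["label"] = np.repeat([0, 1], 50)

# Descriptive statistics: per-class mean of each feature
summary = df.groupby("label")[["x1", "x2"]].mean()

# Scikit-Learn: hold out a test set, fit a KNN classifier, measure accuracy
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["label"], test_size=0.25, random_state=0
)
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of test points classified correctly

print(summary)
print(f"test accuracy: {accuracy:.2f}")
```

The same fit/score pattern applies to the other Scikit-Learn estimators covered later in the course; only the estimator class and its parameters change.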

Book

Python Data Science Handbook: Essential Tools for Working with Data (by Jake VanderPlas)
Free full text: here, under the CC-BY-NC-ND license.
GitHub repository: https://github.com/jakevdp/PythonDataScienceHandbook

NOTE: This is a handbook-style book, with each chapter providing a fairly deep exploration of its subject. The course will be based on the first few sections of each chapter. This will provide a gentle, practical introduction to the field while leaving the student with a clear path towards a deeper exploration of each subject.

Outline

  1. Week, book chapter 1

    • Introduction to the field of data-driven AI and the relationship between AI, Machine Learning and Data Science
    • Historic overview
    • Overview of the book, tools and libraries used in the course
    • Introduction to IPython and Jupyter Notebooks and a quick recap of Python
    • Administrative matters
  2. Week, quiz, book chapters 1 and 2

    • Introduction to plotting data with Matplotlib
    • Introduction to NumPy
    • Vectorized computation vs. Python loops
    • Two-dimensional arrays and NumPy broadcasting
    • Slicing NumPy arrays
  3. Week, quiz, book chapters 3 and 4

    • Introduction to Pandas
    • Working with public datasets, introduction to Kaggle
    • Handling missing data
  4. Week, quiz, lab test, book chapters 3 and 4

    • More on Pandas and visualization
    • Exploring data with descriptive statistics
    • Correlation and linear fitting
    • Advanced DataFrame manipulations
  5. Week, quiz, Assignment 1 due

    • Walkthrough of a real-world data analysis project
  6. Week, quiz

    • Patterns and structure in data
    • Clustering
    • Dealing with more than 3 dimensions
    • Recap of the concept of a distance metric
    • Dimensionality reduction for visualization
  7. Week

    • MID-TERM EXAM
  8. Week

    • INTERSESSION WEEK
  9. Week, quiz, Assignment 2 due, book chapter 5 from here for the rest of the course

    • Introduction to Statistical Machine Learning
    • Terminology: Supervised vs Unsupervised ML, Classification vs Regression
    • The concept of “model”
    • Linear regression as a simple example of machine learning
    • K-nearest neighbours (KNN) algorithm as both a regressor and a classifier
    • Introduction to the scikit-learn API
  10. Week, quiz

    • Separation of test and training data
    • Overfitting and underfitting
    • Model evaluation
  11. Week, quiz, lab test

    • Multidimensional regression
    • Regularization (Ridge and Lasso regression)
    • Feature engineering
    • More on evaluation
  12. Week, quiz, Assignment 3 due

    • Multi-class classifiers
    • SVM
    • Decision trees
  13. Week, quiz

    • Recommender systems
    • Walkthrough of a real-world ML project
  14. Week, Assignment 4 due

    • Overview of other related tools and resources with examples.
  15. Week

    • FINAL EXAM