Tutorial 1

What’s expected?

  • Work through various machine learning algorithms using simple toy data-sets.
  • Leverage Python packages for machine learning and data science.
  • Perform practical experiments using what's already been taught in the lectures.
  • Understand the development and combination of machine learning methods to tackle a given problem.
  • Explore visualization techniques used in data science.
  • Discover what deep learning is.
  • Explore GPU and parallel computing for ML algorithms in Python.
  • Explore LaTeX for writing scientific papers.

Tutorial Assignments

  • Four tutorial assignments (T1, T2, T3, T4) in the syllabus
  • Worth 3% each
  • Consisting of coding exercises
  • Follow the instructions very carefully: your code must run on Python 3.6 (the environment we set up during the tutorials)

About me

I’m Shivam Kalra, a MASc. candidate under the supervision of Prof. Hamid R. Tizhoosh. My research interests are in deep learning for image analysis. I mostly work with Python for my research. My favorite distro is Arch Linux and my favorite editor is Emacs.

Office Hours/Contacts

Email: shivam.kalra \at uwaterloo.ca
Office Hours: Thursday, 2:30-5:00 PM
Office Location: EC4, 2007 A

Quick survey

https://goo.gl/forms/CG42LP9sMsth73Ej1

Today’s agenda

  • Explore some open data-sets (UCI and Kaggle)
  • Explore some Kaggle competitions
  • Explore sample projects from past years
  • Set up a Python environment and some packages

Why open data-sets

  • Every machine learning algorithm needs data-sets to work with
  • Compare your approach with others
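
As a small illustration of working with an open data-set, the sketch below loads the Iris data-set, which originates from the UCI repository and ships with scikit-learn. It assumes scikit-learn and pandas are installed; the variable names are only illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled copy of the Iris data-set (no download required).
iris = load_iris()

# Put the features in a DataFrame and add a readable class label column.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = [iris.target_names[i] for i in iris.target]

print(df.shape)                      # rows, columns
print(df["species"].value_counts())  # samples per class
print(df.head())
```

Because everyone can load exactly the same data, results reported on such a data-set can be compared directly with those of other people.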

Resources for open data-sets

General outline of final project

  1. Find a problem that is not yet solved properly, correctly, or efficiently
  2. Analyze the problem (is a data-set available?)
  3. Select an approach, e.g. decision tree, fuzzy logic, reinforcement learning, or deep learning.
  4. Justify (empirically) the choice of approach over other possibilities.
  5. Design/customize (fine-tuning, parameter selection) the approach for the problem.
  6. Train your ML models (use bagging, K-fold cross-validation, etc.)
  7. Fine-tune your models for the best AUC.
  8. Try it on the testing data (you cannot use test data for training, NO NO NO)
  9. Compare your results with others, and ensemble models for better accuracy/results.
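
Steps 4 through 8 above can be sketched with scikit-learn as follows. This is only a minimal illustration under stated assumptions, not a prescribed project template: the data-set (the bundled breast cancer data), the two candidate models, and the small parameter grid are all chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary classification data-set bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Hold out the test set first; it is never used for training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Step 4: compare candidate approaches with K-fold CV on the training data.
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "bagged trees": BaggingClassifier(
        DecisionTreeClassifier(random_state=0),
        n_estimators=50, random_state=0),
}
cv_auc = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    cv_auc[name] = scores.mean()
    print("{}: mean CV AUC = {:.3f}".format(name, cv_auc[name]))

# Steps 5-7: tune parameters for the best AUC, again on training data only.
search = GridSearchCV(
    BaggingClassifier(DecisionTreeClassifier(random_state=0), random_state=0),
    param_grid={"n_estimators": [10, 50], "max_samples": [0.5, 1.0]},
    scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)

# Step 8: evaluate once on the untouched test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("test AUC = {:.3f}".format(test_auc))
```

The important design choice is the order of operations: the test split is made before any model sees the data, and both the model comparison and the parameter search use only the training portion.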

Finding the problem for final project

Environment

We will be using Python for the tutorials, but you’re free to use any language or OS for the final project. However, the tutorial assignments must be done in Python 3.6.

  • Python 3.6
  • Packages for now: matplotlib, scikit-learn, numpy, scipy, pandas, jupyter
  • I suggest using the Anaconda Python 3.6 bundle
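
Once the environment is installed, a short script like the one below can confirm that the packages listed above are importable. It is a sketch: the helper name `check_environment` is made up for this example, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib
import sys

# Import names of the packages listed above (scikit-learn imports as "sklearn").
PACKAGES = ["matplotlib", "sklearn", "numpy", "scipy", "pandas", "jupyter"]


def check_environment(packages=PACKAGES):
    """Return {import name: version string, or None if the import fails}."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Some modules do not expose __version__; report that explicitly.
            report[name] = getattr(module, "__version__", "version not exposed")
    return report


print("Python", sys.version.split()[0])
for name, version in check_environment().items():
    status = version if version is not None else "NOT INSTALLED"
    print("{:<12} {}".format(name, status))
```

Any package reported as not installed can then be added through Anaconda before the first assignment.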

Setup instruction for Python Environment

I encourage you to use a Linux environment for an easier development workflow.

Use https://www.anaconda.com/download/#linux to download Anaconda for your OS.