Tutorial 1

What’s expected?

Work through various machine learning algorithms using simple toy data-sets.
Leverage Python packages for machine learning and data science.
Perform practical experiments using what’s already been taught in the lectures.
Understand the development and combination of machine learning methods to tackle a given problem.
Explore visualization techniques used in data science.
Discover what deep learning is.
Explore GPU and parallel computing for ML algorithms in Python.
Explore LaTeX for writing scientific papers.

Tutorial Assignments

Four tutorial assignments (T1, T2, T3, T4) in the syllabus
3% each
Consisting of coding exercises
Follow the instructions very carefully; your code must run on Python 3.6 (the environment we set up during the tutorials)

About me

I’m Shivam Kalra, a MASc candidate under the supervision of Prof. Hamid R. Tizhoosh. My research interests are in deep learning for image analysis. I mostly work with Python for my research. My favorite distro is Arch Linux and my favorite editor is Emacs.

Office Hours/Contacts

Email	shivam.kalra \at uwaterloo.ca
Office Hours	Thursday, 2:30-5:00 PM
Office Location	EC4, 2007 A

Quick survey

https://goo.gl/forms/CG42LP9sMsth73Ej1

Today’s agenda

Explore some open data-sets (UCI and Kaggle)
Explore some Kaggle competitions
Explore sample projects from past years
Set up a Python environment and some packages

Why open data-sets

Need data-sets to work with any machine learning algorithm
Compare your approach with others

Resources for open data-sets

https://archive.ics.uci.edu/ml/datasets.html
https://www.data.gov/
https://www.kaggle.com/datasets
Twitter streaming API
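
As a small illustration of what working with an open data-set looks like, the sketch below loads Iris, a classic data-set from the UCI repository, into a pandas DataFrame. It assumes scikit-learn (0.23 or newer, for `as_frame`) and pandas are installed, and it uses the copy bundled with scikit-learn so that no download is needed. The choice of Iris is only an example, not a data-set prescribed by the tutorial.

```python
from sklearn.datasets import load_iris

# Iris is a classic UCI data-set; scikit-learn bundles a copy,
# so this runs without network access.
iris = load_iris(as_frame=True)

# pandas DataFrame: four feature columns plus an integer "target" column.
df = iris.frame

print(df.shape)                     # (rows, columns)
print(df.head())                    # first five samples
print(df["target"].value_counts())  # number of samples per class
```

A CSV file downloaded from UCI or Kaggle can be loaded in a similar way with `pandas.read_csv`.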

General outline of final project

Find a problem that is not yet solved properly, correctly, or efficiently
Analyze the problem (is data-set available?)
Select an approach: decision tree, fuzzy logic, reinforcement learning, deep learning.
Justify (empirically) the choice of approach over other possibilities.
Design/customize (fine-tuning, parameter selection) the approach for the problem.
Train your ML models (use bagging, K-fold cross-validation, etc.)
Fine-tune your models for the best AUC.
Try it on testing data (you cannot use test data for training: NO NO NO)
Compare results with others; ensemble models for better accuracy/results.
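
The training and testing steps above can be sketched roughly as follows. This is a minimal sketch, not a required template: it assumes scikit-learn is installed, and the bundled breast-cancer data-set, the bagged decision trees, and the 80/20 split are illustrative choices only. The point it shows is the order of operations: the test set is held out first, K-fold cross-validation runs on the training data only, and the test set is used once at the end.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.tree import DecisionTreeClassifier

# Toy binary-classification data bundled with scikit-learn (no download).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set first; it is not touched again until the final step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Bagging: an ensemble of decision trees, each fit on a bootstrap sample.
model = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=25, random_state=0
)

# K-fold cross-validation on the training data only, scored by AUC.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(
    model, X_train, y_train, cv=folds, scoring="roc_auc"
)
print("cross-validated AUC: %.3f +/- %.3f" % (cv_auc.mean(), cv_auc.std()))

# Final step: refit on all training data, then evaluate once on test data.
model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("test AUC: %.3f" % test_auc)
```

Any fine-tuning (for example, changing `n_estimators`) should be judged by the cross-validated score, so that the test AUC remains an honest estimate.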

Finding the problem for final project

Project could be in any field:
- Sports
- Audio/Music/Multimedia
- Computer Vision
- Finance/Commerce
- Natural Language Processing/Sentiment Analysis
- Image Retrieval (talk to Prof. Tizhoosh or any of the TAs for the data-set)
- Visualization
Look for interesting Kaggle data-sets.
Active Kaggle competitions?
- https://www.kaggle.com/competitions
Improve/compare/survey existing ML algorithms (you can use any open data-sets).
Some projects from Stanford courses:

Environment

We will be using Python for the tutorials, but you’re free to use any language or OS for the final project. However, the tutorial assignments must be done in Python 3.6.

Python 3.6
Packages for now: matplotlib, scikit-learn, numpy, scipy, pandas, jupyter
I suggest using the Anaconda Python 3.6 bundle

Setup instruction for Python Environment

I encourage you to use a Linux environment for an easier development workflow.

Use https://www.anaconda.com/download/#linux to download Anaconda for your OS.
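
Once the installation is done, a quick way to confirm that the packages listed under Environment are available is a short check script like the one below. It is only a sketch using the standard library. The helper name `check_environment` is made up for this example, and the import names are assumptions about how each package is exposed (scikit-learn is imported as `sklearn`; `jupyter` is assumed to be importable under that name).

```python
import importlib.util
import sys

# Import names for the tutorial packages (assumed; note that the
# scikit-learn package is imported under the name "sklearn").
REQUIRED = ["matplotlib", "sklearn", "numpy", "scipy", "pandas", "jupyter"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if it can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


if __name__ == "__main__":
    # Show the interpreter version, then one status line per package.
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print("%-12s %s" % (name, "ok" if found else "MISSING"))
```

Run it with the Python interpreter from the Anaconda environment; any package reported as MISSING still needs to be installed.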
