- Work through various machine learning algorithms using simple toy data-sets.
- Leverage Python packages for machine learning and data science.
- Perform practical experiments using what's already taught in the lectures.
- Understand the development and combination of machine learning methods to tackle a given problem.
- Explore visualization techniques used in data science.
- Discover what deep learning is.
- Explore GPU and parallel computing for ML algorithms in Python.
- Explore LaTeX for writing scientific papers.
- Four tutorial assignments (T1, T2, T3, T4) in the syllabus
- 3% each
- Consisting of coding exercises
- Follow the instructions very carefully; your code must run on Python 3.6 (the environment we set up during the tutorials)
I’m Shivam Kalra, a MASc candidate under the supervision of Prof. Hamid R. Tizhoosh. My research interests are in deep learning for image analysis. I mostly work with Python for my research. My favorite distro is Arch Linux and my editor is Emacs.
Email | shivam.kalra \at uwaterloo.ca |
Office Hours | Thursday, 2:30-5:00 PM |
Office Location | EC4, 2007 A |
https://goo.gl/forms/CG42LP9sMsth73Ej1
- Explore some open data-sets (UCI and Kaggle)
- Explore some Kaggle competitions
- Explore sample projects from past years
- Set up a Python environment and some packages
- You need data-sets to work with any machine learning algorithm
- Compare your approach with others
- https://archive.ics.uci.edu/ml/datasets.html
- https://www.data.gov/
- https://www.kaggle.com/datasets
- Twitter streaming API
- Find a problem that is not yet solved properly, correctly, or efficiently
- Analyze the problem (is data-set available?)
- Select an approach: decision tree, fuzzy logic, reinforcement learning, deep learning.
- Justify (empirically) the choice of approach over other possibilities.
- Design/customize (fine-tuning, parameter selection) the approach for the problem.
- Train your ML models (use bagging, K-fold cross-validation, etc.)
- Fine-tune your models for the best AUC.
- Try it on testing data (you must never use test data for training, NO NO NO)
- Compare results with others, ensemble models for better accuracy/results.
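The training and evaluation steps above can be sketched roughly as follows. This is only an illustrative outline, not a prescribed project template: the breast cancer data-set bundled with scikit-learn, the bagged decision trees, the small parameter grid, and the function name `run_workflow` are all assumptions chosen to keep the example short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier


def run_workflow(seed=0):
    """Tune a bagged decision tree with K-fold CV, then score it once on held-out data."""
    # Example data-set (an assumption for this sketch; use your own project data).
    X, y = load_breast_cancer(return_X_y=True)

    # Split off the test set first. It is not used for training or tuning.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Bagging: an ensemble of decision trees, each fit on a bootstrap sample.
    model = BaggingClassifier(DecisionTreeClassifier(random_state=seed), random_state=seed)

    # Fine-tune for the best AUC using 5-fold cross-validation on the training data only.
    param_grid = {"n_estimators": [10, 50], "max_samples": [0.5, 1.0]}
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(model, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)

    # Final, one-time evaluation on the untouched test data.
    test_scores = search.predict_proba(X_test)[:, 1]
    test_auc = roc_auc_score(y_test, test_scores)
    return search.best_params_, search.best_score_, test_auc


if __name__ == "__main__":
    best_params, cv_auc, test_auc = run_workflow()
    print("Best parameters:", best_params)
    print("Cross-validated AUC:", round(cv_auc, 3))
    print("Test AUC:", round(test_auc, 3))
```

The important design choice is the order of operations: the test split is made before any tuning, so the reported test AUC is not influenced by parameter selection.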
- Project could be in any field:
- Sports
- Audio/Music/Multimedia
- Computer Vision
- Finance/Commerce
- Natural Language Processing/Sentiment Analysis
- Image Retrieval (talk to Prof. Tizhoosh or any of the TAs for the data-set)
- Visualization
- Look for interesting Kaggle data-sets.
- Active Kaggle competitions?
- Improve/compare/survey existing ML algorithms (you can use any open data-sets).
- Some projects from Stanford courses:
We will be using Python for the tutorials, but you’re free to use any language or OS for the final project. However, the tutorial assignments must be done in Python 3.6.
- Python 3.6
- Packages for now: matplotlib, scikit-learn, numpy, scipy, pandas, jupyter
- I suggest using the Anaconda Python 3.6 bundle
I encourage you to use a Linux environment for an easier development workflow.
Use https://www.anaconda.com/download/#linux to download Anaconda for your OS.
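Once the environment is installed, a quick check like the one below can confirm that the packages listed above are importable. It is a minimal sketch: the helper name `check_environment` is hypothetical, and the import names `sklearn` (for scikit-learn) and `jupyter_core` (for jupyter) are assumptions about how those packages are exposed in a typical Anaconda install.

```python
import importlib
import sys

# Maps the package name used in this tutorial to the module name to import.
# The "sklearn" and "jupyter_core" import names are assumptions for this sketch.
PACKAGES = {
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "jupyter": "jupyter_core",
}


def check_environment(packages=PACKAGES):
    """Return {package name: version string, or None if the import fails}."""
    report = {}
    for name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
            report[name] = str(getattr(module, "__version__", "unknown"))
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    for name, version in check_environment().items():
        print(name, "->", version if version is not None else "NOT INSTALLED")
```

Any package reported as not installed can then be added through Anaconda before the first tutorial.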