This repo contains a series of notebooks on various aspects of applied machine learning. The focus early on is on classical machine learning, both supervised and unsupervised. Deep learning is covered later, once those foundations are established. The use of visualization as a tool for understanding ML algorithms is emphasized throughout.
These tutorials originated from a series of talks I gave in 2018 for the IT division at Virginia Tech. Since those were applied talks, I didn't initially plan to go into detail on how to actually implement the algorithms. I am now starting to put together notebooks that address the fundamentals more, mainly to help those preparing for data science interviews.
Regarding theory, I assume no background other than the ability to read algebraic equations and some basic linear algebra (e.g. matrix multiplication, vectors). Knowledge of probability theory shouldn't be essential for most of the tutorials, but it may be helpful in some areas. Similarly, if calculus shows up somewhere, you can probably ignore it and be fine. On the practical side, I assume readers are familiar with Python and with basic programming concepts in general.
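For a rough sense of the practical level assumed, a snippet like the one below should read comfortably. It's purely illustrative (it uses NumPy, but is not taken from any of the notebooks): just a matrix-vector multiplication written in Python.

```python
# Illustrative only: roughly the level of Python and linear algebra assumed.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # a 2x2 matrix
x = np.array([1.0, -1.0])    # a vector with 2 components

y = A @ x                    # matrix-vector multiplication
print(y)                     # prints [-1. -1.]
```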
- Basics (Part 1): Python, Jupyter, and Math (WIP)
- Basics (Part 2): Arrays, Dataframes, and Plotting
- Binary Classification
- Regression
- Dimensionality Reduction
- Clustering
- Unbalanced Data and Supervised Anomaly Detection
- End-to-End ML (Part 1): Exploratory Data Analysis
- End-to-End ML (Part 2): Data Cleaning and Feature Engineering
- End-to-End ML (Part 3): Model Selection and Deployment (WIP)
- PyTorch: Tensors, Autodiff, Cuda
- Neural Networks (Part 1): Basics
- Neural Networks (Part 2): Going Deeper
- Neural Networks (Part 3): Optimizers
- Convolutional Neural Networks (Part 1): Basics
- Convolutional Neural Networks (Part 2): Advanced (WIP)
- Recurrent Neural Networks (WIP)
- Transformer Networks (WIP)
- Unsupervised Anomaly Detection
- Time Series Basics
- Machine Learning with Time Series
- ML Deep Dive: Probability and Naive Bayes
- Generating Random Numbers
Tutorials are added periodically. Check back.
If you want to view the notebooks without downloading them, you can click the links here in the README, or use the Jupyter nbviewer website, where you can paste a notebook's URL and it should render correctly.
To set up an environment to run the notebooks on your own machine, you should first have git and anaconda installed and working on your computer. Once you've done this, run the following in sequence to set up your environment. Note: If running on Windows (sorry bro), you may need to modify these commands slightly.
conda create --name ml_tutorials python=3.6 pip  # create a fresh environment with python 3.6 and pip
conda activate ml_tutorials  # activate the new environment
# recommend making sure your shell is using the python and pip executables inside your anaconda path (e.g. ~/anaconda3/.../python)
which python && which pip
# if python or pip path is something different (e.g. /usr/local/bin/python), restart your shell and try again
cd <directory where you want repo to go>
git clone https://github.com/rkingery/ml_tutorials.git
cd ml_tutorials
pip install -r requirements.txt
python -m spacy download en_core_web_sm  # download spaCy's small English model
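Once the install finishes, you can optionally run a quick sanity check like the sketch below to confirm that the main packages import cleanly. The package list here is an assumption based on the notebook topics, not the exact contents of requirements.txt, so adjust it as needed.

```python
# Sanity check: confirm key packages import inside the ml_tutorials environment.
# The package names below are assumed from the notebook topics; edit to taste.
import importlib

for pkg in ["numpy", "pandas", "matplotlib", "sklearn", "torch", "spacy"]:
    try:
        module = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(module, '__version__', 'ok')}")
    except ImportError:
        print(f"{pkg}: NOT INSTALLED")
```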
Once your environment is set up, each time you want to use it I recommend a workflow something like the following.
conda activate ml_tutorials
cd <path to ml_tutorials>
git pull
pip install -r requirements.txt # occasionally new packages may be added
jupyter notebook # should launch a jupyter session in your browser
conda deactivate # when finished with session
TBD.