Tutorial On Machine Learning

Python and R tutorial on Machine Learning

tree classifiers

Photo by Lukasz Szmigiel on Unsplash

 

This section is dedicated to tutorials on linear algebra (the mathematical principles behind machine learning), machine learning algorithms (clustering, linear regression, classification, and so on), data science basics (data frames, data visualization, etc.), and the principles of graph theory.

In this section, you will find the Jupyter notebooks for the tutorials I published on Medium. I suggest reading each tutorial and its companion code in the order given in the table below. For practical reasons, I have split some tutorials into several parts, so that one part can concentrate on the theory and the others on the programming. Tutorials dedicated only to theory have no linked Jupyter notebook containing the Python code used for the models and the graphs. I wrote and tested the code in Google Colab to make it reproducible.

I am also progressively adding some R tutorials, and I have uploaded the R scripts so you can test them. Check the table below, where I list the Colab notebooks, the R scripts, and the companion articles.

Moreover, you may find here some Colab notebooks without a theoretical tutorial (yet). In those cases I uploaded the code before finishing the written part (this is indicated in the table). I am convinced that the code alone is already beneficial. I will later publish the written article on Medium, with details and comments on the code.

You can open a GitHub issue for any request, comment, or problem you encounter.

Index

  • Tutorial List - The list of tutorials and corresponding code
  • Utility - A list of functions and code you can use for your projects.
  • Scripts - A list of scripts you can execute on your PC.

Tutorials

| Tutorial | Notebook | Description |
| --- | --- | --- |
| Data manipulation | notebook | Common data manipulation tasks and data issues - MEDIUM ARTICLE NOT YET PUBLISHED |
| Pandas Cheatsheet | notebook | Introduction to the Pandas library - MEDIUM ARTICLE NOT YET PUBLISHED |
| Python Data Visualization | notebook | Introduction to data visualization with Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Regular expression in Python | notebook | Regular expressions in Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Matrix operations for machine learning | notebook | Matrix operations for machine learning in Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Matrix operations for machine learning - part 2 | notebook | Matrix operations for machine learning in Python, the second part - MEDIUM ARTICLE NOT YET PUBLISHED |
| Tree classifiers | -- | Introduction to tree classifiers, theory and math explained simply - MEDIUM ARTICLE NOT YET PUBLISHED |
| Tree classifiers | notebook | Training of tree classifiers - MEDIUM ARTICLE NOT YET PUBLISHED |
| Visualize decision tree | notebook | Visualization of a decision tree - MEDIUM ARTICLE NOT YET PUBLISHED |
| Train and visualize decision tree in R | R-script | Plot and visualize a decision tree in R - MEDIUM ARTICLE NOT YET PUBLISHED |
| Evaluation metrics for classification - part I | notebook | How to calculate, code, and interpret evaluation metrics for classification - MEDIUM ARTICLE NOT YET PUBLISHED |
| Evaluation metrics for classification - part II | -- | Part II, on imbalanced datasets and multiclass classification - MEDIUM ARTICLE NOT YET PUBLISHED |
| Linear Regression - OLS | notebook | Introduction to linear regression and the least squares method - MEDIUM ARTICLE NOT YET PUBLISHED |
| Evaluation metrics for regression | notebook | Evaluation metrics for regression - MEDIUM ARTICLE NOT YET PUBLISHED |
| Train and visualize regression tree | notebook | Train and visualize a regression decision tree in Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Linear regression in R | R-script | Train and visualize a linear regression model in R - MEDIUM ARTICLE NOT YET PUBLISHED |
| Introduction to Python iGraph | Notebook | A notebook to refresh the use of Python iGraph |
| Introduction to R iGraph | Notebook | A notebook to refresh the use of R iGraph |
| Introduction to point processing | Jupyter Notebook | Whether you are doing medical image analysis or using Photoshop, you are using point processing |
| Introduction to Thresholding | Jupyter Notebook | A simple but powerful system for segmenting images |
| A practical guide to neighborhood image processing | Jupyter Notebook | Love thy neighbors: how the neighbors influence a pixel |
| A practical guide to morphological image processing | Jupyter Notebook | Simple but powerful operations to analyze images |
| Dividi et Impera: A Practical Guide to BLOB Analysis and Extraction with Python | Jupyter Notebook | Simple yet powerful techniques to extract objects |
| Harnessing the power of colors in Python | Jupyter Notebook | Color images have more hidden information than you think |
| Image Segmentation with Simple and Elegant Methods | Jupyter Notebook | Why the need for a deep learning model with hundreds of layers? Sometimes, there are simpler and faster models |
| A Guide to Geometric Transformation with Python | Jupyter Notebook | Why the need for Photoshop when you can have fun with Python |
| Graph ML: A Gentle Introduction to Graphs | -- | A deep introduction to these mysterious creatures |
| Graph ML: fantastic graphs and where to find them | -- | Why use a graph? For which applications? |
| Graph ML: introduction to NetworkX | Jupyter Notebook | How to start handling graphs in Python using the most popular library |
| Graph ML: Graph traversal algorithms in a nutshell | Jupyter Notebook | A quick glance at breadth-first and depth-first search algorithms for graph machine learning |
| Graph ML: Introduction to Python iGraph | Jupyter Notebook | Python iGraph is a widely used library for handling graphs. How do you start using it, and why? |
| Graph ML: How Do you Visualize a Large network? | Jupyter Notebook | Seeing is understanding: how to visualize large networks |


Utility

I provide some ready-to-use functions and classes as Python files that you can import. You can find them in the utility folder.

Check the utility folder for usage examples and explanations. Each function is documented, and you can access its documentation.

For example, if you want to use my regression_report function in Colab, you can import it this way:

import os
import sys

user = "SalvatoreRa"
repo = "tutorial"
src_dir = "machine%20learning/utility"  # no trailing slash: the URL below adds it
pyfile = "regression_report.py"  # name of the .py file to download

# Build the raw-file URL and download the file into the current directory
url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src_dir}/{pyfile}"
!wget --no-cache --backups=1 {url}

# Make sure the download directory is on the import path
# (a GitHub page URL is not a valid local path, so use the working directory)
sys.path.append(os.getcwd())

# Import the function
from regression_report import regression_report

Alternatively, you can download and import a utility in Colab this way:

# Install and import wget before using it
!pip install wget
import wget
import torch
import seaborn as sns

# Download the module into the current directory (the space in the path is URL-encoded)
wget.download('https://raw.githubusercontent.com/SalvatoreRa/tutorial/main/machine%20learning/utility/utils_NA.py')
# The module name matches the downloaded file
from utils_NA import *

# df is your own dataset and must be defined beforehand
# Generate different types of NA
X_miss_mcar = produce_NA(df, p_miss=0.4, mecha="MCAR")
X_miss_mar = produce_NA(df, p_miss=0.4, mecha="MAR", p_obs=0.5)
X_miss_mnar = produce_NA(df, p_miss=0.4, mecha="MNAR", opt="logistic", p_obs=0.5)
X_miss_quant = produce_NA(df, p_miss=0.4, mecha="MNAR", opt="quantile", p_obs=0.5, q=0.3)
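
To give an idea of what the MCAR mechanism means, here is a minimal sketch in plain Python. It is not the implementation of produce_NA, whose internals are not shown here; the function name add_mcar_missing and its parameters are hypothetical. Under MCAR (missing completely at random), every cell has the same probability of being missing, regardless of its value or of the other columns.

```python
import random


def add_mcar_missing(rows, p_miss, seed=0):
    """Return a copy of `rows` where each cell becomes None with probability p_miss.

    Hypothetical helper for illustration only; it is not the repository's produce_NA.
    """
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator so the mask is reproducible
    # Each cell is masked independently of its value: that is what makes it MCAR
    return [[None if rng.random() < p_miss else value for value in row] for row in rows]


# Toy dataset: 200 rows, 5 numeric columns
data = [[float(i + j) for j in range(5)] for i in range(200)]
masked = add_mcar_missing(data, p_miss=0.4, seed=42)

n_cells = sum(len(row) for row in masked)
n_missing = sum(value is None for row in masked for value in row)
print(f"missing fraction: {n_missing / n_cells:.2f}")
```

The printed fraction should be close to p_miss. MAR and MNAR differ in that the probability of masking a cell depends on observed or unobserved values, respectively, which this sketch does not attempt to model.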

| File | Description |
| --- | --- |
| Regression report | Print different regression metrics (similar to the classification report of scikit-learn) |
| Upset plot | Plot an upset plot to visualize missing data and their distribution in the columns |
| Random NA generation | Introduces random missing values into a dataset |
| Utils NA | A set of utils to generate and insert NA in your dataset |
| DR_utils | A set of utils for dimensionality reduction techniques |
| Correlation_utils | A set of utils for correlation dimension |
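
As an illustration of what a regression report in the spirit of scikit-learn's classification report might contain, here is a small sketch in plain Python. The function simple_regression_report is hypothetical and is not the repository's regression_report; the choice of metrics (MAE, MSE, RMSE, R2) is an assumption about what such a report typically includes.

```python
import math


def simple_regression_report(y_true, y_pred):
    """Return common regression metrics as a dict.

    Hypothetical helper for illustration; not the repository's regression_report.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


report = simple_regression_report([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
for name, value in report.items():
    print(f"{name:>5}: {value:.4f}")
```

Returning a dict rather than printing directly keeps the metrics usable in later code, for example to compare several models in a loop.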


Scripts

Here you can find a list of scripts that were used to generate images for the tutorials or that can be used to analyze data and models. You can easily adapt them to your needs.

For example, if you want to use my MAR script on your PC, you can simply run it this way:

python3 MAR.py

Or alternatively:

python3.8 MAR.py

| File | Description |
| --- | --- |
| MAR | Loop to test different algorithms for MAR missing value imputation. The script generates missing values, tests different imputation methods, and generates the plots |
| MNAR | Loop to test different algorithms for MNAR missing value imputation. The script generates missing values, tests different imputation methods, and generates the plots |
| MCAR | Loop to test different algorithms for MCAR missing value imputation. The script generates missing values, tests different imputation methods, and generates the plots |
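
The general shape of such a loop (generate missing values, impute them with several methods, compare the results) can be sketched as follows. This is a simplified stand-in under stated assumptions, not the code of the scripts: the helper names are hypothetical, mean and median imputation are chosen only because they need no extra libraries, the missingness is MCAR, and plotting is left out.

```python
import math
import random
import statistics


def mask_mcar(values, p_miss, rng):
    """Replace each value by None with probability p_miss (MCAR)."""
    return [None if rng.random() < p_miss else v for v in values]


def impute(values, fill):
    """Fill every missing entry with a single constant."""
    return [fill if v is None else v for v in values]


def rmse_on_missing(truth, masked, imputed):
    """RMSE between true and imputed values, computed only where data was masked."""
    sq = [(t - i) ** 2 for t, m, i in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(sq) / len(sq)) if sq else float("nan")


rng = random.Random(0)
truth = [rng.gauss(10.0, 2.0) for _ in range(500)]   # synthetic ground truth
masked = mask_mcar(truth, p_miss=0.3, rng=rng)
observed = [v for v in masked if v is not None]

# Candidate imputation strategies: each maps the observed values to a fill value
strategies = {"mean": statistics.mean, "median": statistics.median}
results = {}
for name, summarise in strategies.items():
    fill = summarise(observed)
    results[name] = rmse_on_missing(truth, masked, impute(masked, fill))
    print(f"{name:>6} imputation RMSE: {results[name]:.3f}")
```

Because the true values are known before masking, the error can be measured exactly on the masked entries, which is what makes this kind of benchmark possible.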


Contributing

License

This project is licensed under the MIT License

Bugs/Issues

Comment or open an issue on GitHub