This project was created for the Machine Learning exam of the Master's Degree programme at the University of Turin.
The project is divided into four sections:
- Decision Tree Models
- Distance Based Models
- Linear Models
- Probabilistic Models
The Decision Tree Models section contains the notebook iris_classification.ipynb.
The purpose of this notebook is to manipulate the scikit-learn Iris dataset by applying transformations to its data and training different Decision Tree classifiers.
We then analyze the performance of the classifiers according to different metrics, including the accuracy score and F1 score, and by plotting ROC curves.
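A minimal sketch of this workflow (not the notebook itself; the split ratio and tree depth here are illustrative choices):

```python
# Train a decision tree on the scikit-learn Iris dataset and report
# accuracy and macro-averaged F1 on a held-out test split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

clf = DecisionTreeClassifier(max_depth=3, random_state=42)  # depth is illustrative
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print("macro F1:", f1_score(y_test, y_pred, average="macro"))
```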
The Distance Based Models section contains two Python notebooks.
In iris_classification_knn.ipynb
we compare the prediction results obtained by decision trees and k-nearest neighbors on the Iris dataset. We use different weighting schemes (uniform or distance) to test the prediction accuracy of k-NN, and we determine the best k for the dataset split at hand.
Finally, we test and tune the gamma hyperparameter of a Radial Basis Function (RBF) kernel used in k-NN to weight data points.
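A sketch of this comparison, assuming the standard scikit-learn Iris split (the k range and gamma value are illustrative, not the notebook's tuned values):

```python
# Compare k-NN weighting schemes on Iris and plug in an RBF kernel
# as a custom weight function over neighbor distances.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Pick the best k via cross-validation for each weighting scheme.
for weights in ("uniform", "distance"):
    scores = {k: cross_val_score(
        KNeighborsClassifier(n_neighbors=k, weights=weights),
        X_train, y_train, cv=5).mean() for k in range(1, 16)}
    best_k = max(scores, key=scores.get)
    print(weights, "best k:", best_k, "cv accuracy:", round(scores[best_k], 3))

# RBF kernel as weight: closer neighbors count exponentially more.
def rbf_weights(distances, gamma=0.5):  # gamma is the tunable hyperparameter
    return np.exp(-gamma * distances ** 2)

knn_rbf = KNeighborsClassifier(n_neighbors=10, weights=rbf_weights)
knn_rbf.fit(X_train, y_train)
print("RBF-weighted k-NN test accuracy:", knn_rbf.score(X_test, y_test))
```

`KNeighborsClassifier` accepts any callable as `weights`, which is what makes the RBF-kernel weighting a drop-in replacement for `"uniform"` or `"distance"`.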
In clustering.ipynb
we apply and compare two clustering algorithms: K-means and DBSCAN.
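A sketch of such a comparison on the Iris features (the DBSCAN `eps` below is an illustrative choice, not the notebook's tuned value); the adjusted Rand index measures agreement with the true labels:

```python
# Run K-means and DBSCAN on Iris and compare the resulting groupings
# against the true species labels with the adjusted Rand index (ARI).
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
dbscan = DBSCAN(eps=0.6, min_samples=5).fit(X)  # eps chosen for illustration

print("K-means ARI:", adjusted_rand_score(y, kmeans.labels_))
print("DBSCAN  ARI:", adjusted_rand_score(y, dbscan.labels_))
```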
In the Linear Models section we apply support vector machines (SVMs) to an artificial dataset built by my professor.
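The professor's dataset is not distributed with this description, so as a stand-in this sketch fits SVMs with linear and RBF kernels to a synthetic non-linearly-separable dataset from scikit-learn:

```python
# Fit SVMs with two kernels to a synthetic two-moons dataset
# (a stand-in for the course's artificial dataset).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}
for kernel in ("linear", "rbf"):
    svm = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    results[kernel] = svm.score(X_test, y_test)
    print(kernel, "test accuracy:", results[kernel])
```

On this kind of data the RBF kernel can bend the decision boundary around the interleaving moons, which a linear kernel cannot.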
In the Probabilistic Models section we compare two probabilistic models for categorical data:
- multivariate Bernoulli
- multinomial
We adopt a dataset of Twitter messages labelled with emotions (Joy vs. Sadness).
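The labelled Twitter dataset is not included here, so this sketch contrasts the two models on a tiny made-up corpus: the Bernoulli model sees binary presence/absence of words, while the multinomial model sees word counts.

```python
# Naive Bayes with two event models on a toy "tweet" corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

tweets = ["so happy today", "what a joyful day", "feeling sad and lonely",
          "sad news today", "happy happy joy", "lonely sad evening"]
labels = ["joy", "joy", "sadness", "sadness", "joy", "sadness"]

vec = CountVectorizer()
counts = vec.fit_transform(tweets)        # word counts -> multinomial
binary = (counts > 0).astype(int)         # word presence -> Bernoulli

bern = BernoulliNB().fit(binary, labels)
multi = MultinomialNB().fit(counts, labels)

new = vec.transform(["such a sad day"])
print("Bernoulli:  ", bern.predict((new > 0).astype(int))[0])
print("Multinomial:", multi.predict(new)[0])
```

The key difference: the Bernoulli model also penalizes a class for the words that are *absent* from a message, while the multinomial model only scores the words that occur.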
- Clone the repository
- Create a virtual environment and activate it
- Install the required libraries with `pip install -r requirements.txt`
- Open and launch any notebook
Libraries used: