- Using the scikit-learn Python library, we analyse two datasets (crimes and startup), study the impact of PCA dimensionality reduction on the data, and show how the data can be represented in fewer dimensions.
- In a second part, using the villes.csv dataset, we compare KMeans, AgglomerativeClustering with Ward linkage and AgglomerativeClustering with average linkage for finding appropriate clusters of cities after applying PCA dimensionality reduction to the data.
- Two university projects on supervised learning using the scikit-learn library.
- The first project compares decision trees and KNN on classification tasks without any data pre-processing. In a second part, we compare the performance of gradient boosting, random forests and logistic regression on a classification task.
- The second project compares decision trees (CART and ID3), KNN, Naive Bayes, Random Forest, Bagging, Multilayer Perceptron and AdaBoost on a classification task; we fine-tune the hyperparameters of each algorithm and set up pipelines. In a second part, we learn how to handle a heterogeneous dataset and how to deal with missing data for both numerical and categorical features. In a last part, we learn how to use text as input for a SPAM / NOT SPAM classification task.
- Using the scikit-learn Python library, we deploy an isolation forest model for anomaly detection on a Mickey Mouse figure. We learn how to fine-tune the hyperparameters of the algorithm.
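
For the PCA analysis of the crimes and startup datasets, a minimal sketch of the usual scikit-learn workflow might look like the following. It assumes purely numeric features and uses a synthetic table generated from two hidden factors as a stand-in, since the actual columns of those datasets are not given here. Standardising before PCA and choosing `n_components` as a variance fraction (0.95) are illustrative choices, not necessarily the ones used in the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the crimes/startup tables: 50 rows, 6 correlated
# numeric features generated from 2 hidden factors plus a little noise.
factors = rng.normal(size=(50, 2))
X = factors @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(50, 6))

# Standardise first so that no feature dominates just because of its units.
X_std = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance exceeds 95 %.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Map the reduced data back to the standardised space to measure
# how much information the lower-dimensional representation lost.
X_back = pca.inverse_transform(X_reduced)
print("reconstruction MSE:", np.mean((X_std - X_back) ** 2))
```

The reconstruction error gives a concrete sense of what "rewriting the data in a lesser dimension" costs.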
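
For the villes.csv clustering comparison, the three models can be fitted on the same PCA-reduced data and compared side by side. This sketch is a stand-in: `make_blobs` replaces the real city table, the number of clusters (3) is assumed, and the silhouette score is just one possible comparison criterion; the project may have used a different one.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for villes.csv: 60 "cities" described by 8 numeric features.
X, _ = make_blobs(n_samples=60, n_features=8, centers=3, random_state=0)

# Standardise, then project onto the first 2 principal components.
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "ward": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "average": AgglomerativeClustering(n_clusters=3, linkage="average"),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X_pca)  # cluster index for each city
    scores[name] = silhouette_score(X_pca, labels)
    print(f"{name}: silhouette = {scores[name]:.3f}")
```

Ward linkage tends to produce compact, similarly sized clusters (close in spirit to KMeans), while average linkage is more sensitive to elongated or unevenly spread groups, which is why comparing them on the same projection is informative.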
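
For the first supervised project, one simple way to compare decision trees, KNN, gradient boosting, random forests and logistic regression is 5-fold cross-validated accuracy on raw features, with no pre-processing, matching the first part's setup. The dataset here is synthetic (`make_classification`) and the hyperparameters are scikit-learn defaults, both assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical binary classification data; features are used as-is (no scaling).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Mean accuracy over 5 stratified folds for each model.
mean_acc = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in mean_acc.items():
    print(f"{name}: {acc:.3f}")
```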
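
For the second supervised project, the parts on pipelines, hyperparameter tuning and missing data in a heterogeneous dataset can be combined in a single scikit-learn `Pipeline`. This is a minimal sketch assuming a toy table with one numeric and one categorical column (`age` and `city` are hypothetical names). Note that scikit-learn's `DecisionTreeClassifier` is CART-based; setting `criterion="entropy"` only borrows ID3's information-gain criterion and is not a true ID3 implementation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical heterogeneous table with gaps in both column types.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 35, 52, np.nan, 23, 41, 30, 60, 38, 45],
    "city": pd.Series(["Paris", "Lyon", np.nan, "Paris", "Lyon", "Nice",
                       "Nice", np.nan, "Paris", "Lyon", "Nice", "Paris"], dtype=object),
})
y = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1])

preprocess = ColumnTransformer([
    # Numerical: fill gaps with the median, then standardise.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age"]),
    # Categorical: fill gaps with the most frequent value, then one-hot encode.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), ["city"]),
])
clf = Pipeline([("prep", preprocess), ("tree", DecisionTreeClassifier(random_state=0))])

# Tune the tree's hyperparameters; preprocessing is refitted inside each fold,
# so imputation statistics never leak from the validation split.
grid = GridSearchCV(
    clf,
    {"tree__criterion": ["gini", "entropy"], "tree__max_depth": [1, 2, 3]},
    cv=3,
)
grid.fit(df, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same pattern extends to the other listed algorithms by swapping the final step and its parameter grid.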
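
For the SPAM / NOT SPAM part, text is typically turned into numeric features with a vectorizer before any classifier sees it. The sketch below assumes TF-IDF features and multinomial Naive Bayes, a common pairing for this task though not necessarily the project's exact choice, and uses a handful of made-up messages rather than a real corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy messages; a real project would load a labelled corpus.
texts = [
    "win a free prize now", "meeting moved to monday",
    "free cash click now", "see you at lunch",
    "claim your free reward", "project report attached",
]
labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

# TF-IDF turns each message into a weighted bag-of-words vector;
# Naive Bayes then scores each class from those word weights.
spam_clf = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
spam_clf.fit(texts, labels)

print(spam_clf.predict(["free prize click", "lunch meeting on monday"]))
```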
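
For the isolation forest project, a Mickey Mouse figure can be approximated as three discs (a head and two ears) with scattered points added as anomalies. This is a hedged sketch: the shape, the helper `sample_disc`, and the use of injected labels to score a small grid over `contamination` and `n_estimators` are all assumptions for illustration, since a real anomaly detection setting usually has no such labels.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)


def sample_disc(center, radius, n):
    """Draw n points uniformly inside a disc (hypothetical helper)."""
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))
    return np.column_stack([center[0] + r * np.cos(angle), center[1] + r * np.sin(angle)])


# Synthetic "Mickey Mouse" figure: a large head and two smaller ears.
inliers = np.vstack([
    sample_disc((0.0, 0.0), 1.0, 300),
    sample_disc((-1.2, 1.2), 0.5, 75),
    sample_disc((1.2, 1.2), 0.5, 75),
])
# Scattered points injected as anomalies (some may land on the figure by chance).
outliers = rng.uniform(-3.0, 3.0, size=(20, 2))
X = np.vstack([inliers, outliers])
is_outlier = np.r_[np.zeros(len(inliers), dtype=bool), np.ones(len(outliers), dtype=bool)]

# Small grid over two hyperparameters, scored against the injected labels.
results = {}
for contamination in (0.02, 0.04, 0.08):
    for n_estimators in (50, 200):
        model = IsolationForest(
            n_estimators=n_estimators, contamination=contamination, random_state=0
        )
        pred = model.fit_predict(X)  # -1 = anomaly, 1 = normal
        flagged = pred == -1
        precision = is_outlier[flagged].mean()  # share of flagged points that were injected
        recall = flagged[is_outlier].mean()     # share of injected points that were flagged
        results[(contamination, n_estimators)] = (precision, recall)
        print(f"contamination={contamination}, n_estimators={n_estimators}: "
              f"precision={precision:.2f}, recall={recall:.2f}")
```

`contamination` sets the score threshold (the expected share of anomalies), so raising it usually trades precision for recall; `n_estimators` mostly stabilises the anomaly scores.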