Several datasets are manipulated, visualized, and analyzed with well-known ML algorithms for prediction, clustering, and classification.

volkansonmez/Exploratory_Data_Analysis_and_ML_Projects

Volkan Sonmez's Machine Learning Projects

© 2018 - current, Volkan Sonmez, www.pythonicfool.com

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects.

Each repository will (usually) correspond to one of the posts on my website.

You are free to:

  • Share—copy and redistribute the material in any medium or format
  • Adapt—remix, transform, and build upon the material

Under the following terms:

  • Attribution—You must give appropriate credit (mentioning that your work is derived from work that is © Volkan Sonmez and, where practical, linking to http://www.pythonicfool.com/), and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

List of Exploratory Data Analysis and Machine Learning Projects:

This is a time series dataset of hourly temperature values covering one year. K-means++ is written from scratch to cluster the data, and ADTK is used for anomaly detection. The dataset can be obtained at: https://www.kaggle.com/boltzmannbrain/nab
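
As a rough illustration of what "K-means++ from scratch" involves, here is a minimal sketch for one-dimensional values such as temperatures. It is not the notebook's code: the function names, the fixed iteration count, and the sample temperatures are all assumptions made for this example.

```python
import random

def kmeans_pp_init(points, k, rng):
    """Pick k initial centers with k-means++ seeding (1-D points)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        # Sample the next center with probability proportional to d2,
        # so far-away points are more likely to be picked.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, iters=20, seed=0):
    """Run Lloyd's iterations starting from k-means++ centers."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Made-up hourly temperatures, not taken from the NAB dataset.
temps = [60.1, 61.3, 59.8, 75.2, 76.0, 74.5, 90.3, 91.1, 89.7]
centers = kmeans(temps, k=3)
print(centers)
```

The seeding step is what distinguishes K-means++ from plain K-means: spreading the initial centers apart usually reduces the chance of a poor local optimum.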


Several laughter recordings in .wav format are analyzed with the Librosa and Matplotlib libraries. A convolutional neural network is trained and tested on grayscale mel spectrogram images of the audio files to make predictions. The dataset can be found in the 'laugh' and 'laugh_test' folders. There are 22 laughter files in total; some sound sincere and some sound fake.


The Breast Cancer dataset is analyzed with the Pandas, Seaborn, and Matplotlib libraries. Decision Tree and XGBoost models are trained and reach 95% and 97% prediction accuracy, respectively. The dataset can be obtained at: http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29


DGL tutorials are simplified with examples. Deep Graph Library (DGL) is a great tool for node classification, edge classification, and graph classification, and it ships with its own tutorial datasets. This notebook contains a detailed analysis of CoraDataSet and MiniGCDatasets with the dgl.nn module. https://www.dgl.ai/


The Framingham dataset is analyzed with the Pandas, Seaborn, and Matplotlib libraries. KNN, a logistic regression classifier, and a one-layer neural network are applied to the dataset. The raw framingham.csv dataset is downloaded from Kaggle.
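
To make the KNN step concrete, here is a minimal plain-Python sketch of a k-nearest-neighbors classifier. The two features and the six training rows are made-up toy values, not rows from framingham.csv, and `knn_predict` is a hypothetical helper written for this example rather than the notebook's implementation.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    # Rank training rows by Euclidean distance to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy rows: (age, systolic blood pressure) with a binary risk label.
train_X = [(39, 106), (46, 121), (48, 127), (61, 150), (63, 180), (67, 168)]
train_y = [0, 0, 0, 1, 1, 1]

high = knn_predict(train_X, train_y, (65, 170))
low = knn_predict(train_X, train_y, (42, 115))
print(high, low)
```

On real data the features would normally be scaled first, since Euclidean distance is otherwise dominated by whichever column has the largest range.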


The famous MNIST dataset is analyzed with the Pandas, Seaborn, and Matplotlib libraries. PyTorch and TF-Keras are used to build models with fully connected layers (FCL) and CNNs. The dataset can be downloaded from https://www.kaggle.com/oddrationale/mnist-in-csv or loaded with tf.keras.datasets.mnist or torchvision.datasets.MNIST.


Stock prices are analyzed with the Pandas, Seaborn, and Matplotlib libraries. FBProphet, ARIMA, and LSTM (Keras with TensorFlow) models are used to make predictions. The dataset can be obtained at: https://finance.yahoo.com/chart/AAPL/
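
LSTM forecasters are commonly fed fixed-length windows of past prices, each paired with the next value as the target. The sketch below shows that windowing step in plain Python. It assumes this kind of preprocessing rather than reproducing the notebook; `make_windows`, the lookback of 3, and the price list are illustrative only.

```python
def make_windows(series, lookback):
    """Split a series into (input window, next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])   # the last `lookback` observations
        y.append(series[i + lookback])     # the value to predict
    return X, y

# Made-up closing prices, not real AAPL data.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = make_windows(prices, lookback=3)
print(X)
print(y)
```

For time series, the resulting pairs should be split into training and test sets chronologically rather than shuffled, so that the model is never evaluated on data older than what it was trained on.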


A Transformer encoder is coded from scratch with PyTorch and then trained for sentiment analysis on the torchtext.datasets.IMDB dataset.
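
The core operation inside a standard Transformer encoder layer is scaled dot-product attention. The sketch below writes it out in plain Python on tiny lists so the arithmetic is visible. It is not the notebook's PyTorch code, and the function names and toy matrices are assumptions for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Two toy token vectors used as queries, keys, and values (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0]]
result = attention(tokens, tokens, tokens)
print(result)
```

A full encoder layer would add learned projections for Q, K, and V, multiple heads, a feed-forward block, residual connections, and layer normalization around this operation.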


A truck learns to park in reverse, creating its own training data with an emulator and steering with a controller. This notebook is an enhanced version of the one from the NYU 2020 Deep Learning class. The trained weights are stored in the emulator.txt and controller.txt files.


A Variational Autoencoder (VAE) is created and trained on the Yale Face Database to extract the average facial features of the dataset. The dataset can be found here: https://www.kaggle.com/kerneler/starter-yale-face-database-c5f3978b-5
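
Two pieces distinguish a VAE from a plain autoencoder: the reparameterization trick used to sample the latent vector, and the KL-divergence term that pulls the encoder's distribution toward a standard normal. The plain-Python sketch below shows both for a diagonal Gaussian. It assumes the usual formulation, and the function names and example numbers are illustrative rather than taken from the notebook.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]   # made-up encoder outputs
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z, kl)
```

The training loss is typically the reconstruction error plus this KL term, so the encoder is rewarded for producing latent codes that decode well and stay close to the prior.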


Bitcoin price is analyzed with the Pandas and Matplotlib libraries. ARIMA (statistical) and LSTM (machine learning) models are used to make predictions. The dataset can be obtained with the yfinance module.