The goal of the project is to deliver a deep learning model that classifies an open-source dataset of shark species available on Kaggle.
The project consists of training and evaluation scripts wrapped in a Kedro project.
We use well-established convolutional neural networks adapted to the needs of the dataset. We are aware that the project is not revolutionary at all, but its goal is to learn how to deliver an end-to-end ML model rather than to make an innovative step in research.
The models we are going to use are described in the following papers:
- A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
Due to complicated dependencies, we use a conda environment for this project. To prepare it, run:
conda env create -f environment.yml
There are 3 pipelines:
- data_processing for preparing the data
- train_model for training and evaluating a model with given hyperparameters
- optimize_hyperparams for hyperparameter optimization using Ray Tune
To run any of them, execute the following command, replacing pipeline_name with the name of the pipeline:
kedro run --pipeline pipeline_name
The data_processing pipeline is started automatically before train_model or optimize_hyperparams runs.
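If you prefer to launch the pipelines from a Python script (for example in a scheduled job), the command above can be wrapped as in the minimal sketch below. The helper names `build_command` and `run_pipeline` are hypothetical and not part of this repository. The sketch assumes that the conda environment is activated, so that `kedro` is on the PATH, and that it is executed from the project root.

```python
import subprocess
from typing import List

# Pipeline names as listed in this README, with a short description of each.
PIPELINES = {
    "data_processing": "prepare the data",
    "train_model": "train and evaluate a model with given hyperparameters",
    "optimize_hyperparams": "optimize hyperparameters with Ray Tune",
}


def build_command(pipeline_name: str) -> List[str]:
    """Return the `kedro run` argument list for one of the known pipelines."""
    if pipeline_name not in PIPELINES:
        known = ", ".join(sorted(PIPELINES))
        raise ValueError(
            f"Unknown pipeline {pipeline_name!r}; expected one of: {known}"
        )
    return ["kedro", "run", "--pipeline", pipeline_name]


def run_pipeline(pipeline_name: str) -> int:
    """Run the pipeline and return the exit code of the kedro process."""
    # Assumption: `kedro` is on PATH and the current directory is the project root.
    return subprocess.run(build_command(pipeline_name), check=False).returncode


if __name__ == "__main__":
    # Print the command for every pipeline without executing anything.
    for name, purpose in PIPELINES.items():
        print(f"{' '.join(build_command(name))}  # {purpose}")
```

Rejecting unknown names before calling kedro gives an immediate, readable error for a typo in the pipeline name. Since data_processing is started automatically, calling `run_pipeline("train_model")` alone should be enough for a full training run.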
Training results (from both train_model and optimize_hyperparams) are logged to Weights & Biases.
Results of our experiments can be found here.