Repo contains the following:
- `Extra` folder with the project images
- `Extra_models` folder with the best models in `.h5` and `.tflite` format (it is advised that you train the model from scratch and not copy these)
- `Dockerfile` for building the Docker image
- Documentation with code description
- `README.md` with:
  - Description of the problem
  - Instructions on how to run the project
- `create_directories.py` script that splits the data into train, val and test folders
- Dependencies
- Script `lambda-function.py` for predictions, formatted for deployment on Amazon Web Services' Lambda
- `notebook.ipynb`, a Jupyter Notebook with the data analysis and models
- Script `test.py` for testing
- `test.json`, where you can copy any JSON event to test
- Script `train.py` for training the final model
- Instructions for production deployment
- Video or image of how you interact with the deployed service
The dataset is from Kaggle; instructions on how to download it are given below.
Written with help of #ChatGPT
Have you ever been to the beach and found yourself wanting to collect either shells or pebbles, but not sure which was which? Or maybe you're in the oil and gas industry and need a quick and accurate way to classify different geological materials? Well, I have the solution for you!
Introducing the Shells or Pebbles dataset – a collection of images specifically designed for binary classification tasks. With this dataset, you'll be able to easily determine whether a certain image is a shell or a pebble.
But the usefulness of this dataset doesn't stop there. In the oil and gas industry, accurately identifying and classifying different materials, including rocks and shells, is crucial for exploration and production activities. By understanding the composition and structure of the earth's layers, geologists can make informed decisions about where to drill for oil and gas.
And for those concerned about the environment, this dataset can also be used to study the impacts of climate change on coastal ecosystems. By analyzing the distribution and abundance of shells and pebbles on beaches, scientists can gain valuable insights into the health of marine life and the effects of human activities.
So whether you're an artist looking to create a beach-themed project or a scientist studying the earth's geological makeup, the Shells or Pebbles dataset has something to offer. With its reliable and accurate classification capabilities, this dataset can help you make better informed decisions and better understand the world around you.
Potential objectives for this project include:
- Develop a model that performs well on a binary classification problem.
- Tune the model's hyperparameters to get the best possible accuracy.
- I used the learning rate and dropout rate as the main hyperparameters. I also added data augmentation but, due to limited time and compute resources, did not tune it further. The size of the inner layers, the image size and other parameters can also be changed by the user.
- Use callbacks to save the best model weights and to stop training if the validation accuracy does not improve after a certain number of epochs.
- Utilize TensorBoard to visualize the training process and find trends or patterns in the data (I didn't make use of this in the end).
- Use the trained model to accurately categorize new photos as Shells or Pebbles.
- Deploy the trained model in a production environment.
- Create comprehensive Documentation for the project, including a detailed description of the model architecture, training procedure and deployment.
- Display the project's outcomes in a more professional way.
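As an illustration of the callbacks objective above, this is roughly how checkpointing and early stopping can be wired up in Keras. It is a sketch, not the repo's exact code: the checkpoint path and the patience value are assumptions.

```python
import tensorflow as tf

# Save the weights with the best validation accuracy seen so far.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    "checkpoints/best_model.h5",  # path is an assumption
    monitor="val_accuracy",
    save_best_only=True,
)

# Stop training if validation accuracy does not improve for `patience` epochs.
early_stop_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    patience=10,  # illustrative value
    restore_best_weights=True,
)

# model.fit(train_ds, validation_data=val_ds, epochs=50,
#           callbacks=[checkpoint_cb, early_stop_cb])
```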
I selected the parameters and architecture that gave the best accuracy in my experiments. It is possible that this architecture is not the most suitable one, or that other parameters fit this problem better; answering that would require further investigation of the dataset and of the model design.
All development was done on Windows with conda.
You can create an environment:

```
conda env create -f env_project.yml
conda activate capstone
```

Clone the repo:

```
git clone https://github.com/dimzachar/capstone_mlzoomcamp.git
```
Notes:
- You can git clone the repo in Saturn Cloud instead of running it on your own PC.
- Just make sure you have set it up, see here. Create secrets for Kaggle in order to download the data.
- You don't need pipenv if you use Saturn Cloud.
- See instructions below for more.
- You can access the environment here
For the virtual environment, I utilized pipenv.
If you want to use the same virtual environment as me, navigate to the folder with the given files, then install pipenv and the dependencies:

```
cd capstone_mlzoomcamp
pip install pipenv
pipenv shell
pipenv install numpy pandas seaborn jupyter plotly scipy tensorflow==2.9.1 scikit-learn==1.1.3 tensorflow-gpu
```
Before you begin, you need to download the data. You can either download it manually from Kaggle or use the Kaggle CLI with your API keys (you need to download `kaggle.json` from your profile and paste it in `PATH/.kaggle`) and extract the files:

```
kaggle config set -n api.username -v YOUR_USERNAME
kaggle config set -n api.key -v YOUR_API_KEY
kaggle datasets download -d vencerlanz09/shells-or-pebbles-an-image-classification-dataset -p Images
```
If you run it on Saturn Cloud, make sure you are inside `/tensorflow/capstone_mlzoomcamp`.
This will download the zip file into a folder named `Images`. Then unzip it inside this folder (manually or using Git Bash) and delete the zip file. Since you are inside the `capstone_mlzoomcamp` folder, run:

```
unzip -q Images/shells-or-pebbles-an-image-classification-dataset.zip -d Images
rm Images/shells-or-pebbles-an-image-classification-dataset.zip
```
The folder structure should now look like this:

```
Images
├───Pebbles
└───Shells
```
Now run the `create_directories.py` script, which will split the images into train, val and test folders (60%/20%/20%) with labels:

```
pipenv run python create_directories.py
```
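For reference, the split that `create_directories.py` performs can be sketched with the standard library alone. This is an illustration, not the script's actual code; the function name and the fixed seed are assumptions.

```python
import os
import random
import shutil

def split_class(src_dir, dest_root, class_name, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle the images of one class and copy them into train/val/test folders."""
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * ratios[0])
    n_val = int(len(files) * ratios[1])
    parts = {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],  # remainder goes to test
    }
    for split, names in parts.items():
        out = os.path.join(dest_root, split, class_name)
        os.makedirs(out, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, name), os.path.join(out, name))
    return {split: len(names) for split, names in parts.items()}
```

Running this for both `Shells` and `Pebbles` produces the train/val/test tree shown below.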
The final structure before you train the model should look like this:

```
Images
├───test
│   ├───Pebbles
│   └───Shells
├───train
│   ├───Pebbles
│   └───Shells
└───val
    ├───Pebbles
    └───Shells
```
To open `notebook.ipynb` and see what is inside (optional: running the whole thing would probably take at least 2 hours), run Jupyter:

```
pipenv run jupyter notebook
```
For the evaluation you need to run `train.py`. This will run the train function and construct an ML model with the best parameters, which will be saved in the `checkpoints` folder (created automatically). The model with the highest validation accuracy will then be loaded, evaluated (returning some metrics) and converted to a TensorFlow Lite model in order to deploy it to the cloud later.

Note: if you run it on a CPU it will take some time (at least 20 minutes). It is a good idea to use a GPU to speed up the training process.

```
pipenv run python train.py
```
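The `.h5` to TFLite conversion at the end of `train.py` works roughly like this. This is a sketch using a tiny stand-in model; in the real script the best checkpoint is loaded (e.g. with `tf.keras.models.load_model`) instead of being built inline.

```python
import tensorflow as tf

# Tiny stand-in model; train.py uses the best checkpoint from the `checkpoints`
# folder instead. The input size here is an illustrative assumption.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(150, 150, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Convert the Keras model to a TFLite flatbuffer and write it to disk.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```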
Notes:
- Ignore any warnings and wait till you see the message `Finished`. In the end you will have a `model.tflite` file in the directory. You can also find the best model in `.h5` format inside the `checkpoints` folder.
- If you don't want to run `train.py` (even though you should), there are models in the `Extra_models` folder in `.h5` and `.tflite` format. I take no responsibility for whether they work (I assume they do).
To deploy the model locally, follow these steps:
- Install Docker on your system. Instructions can be found here.
- Build the Docker image for the model and run the container using the following commands:

```
docker build -t model .
docker run -it --rm -p 8080:8080 model:latest
```

Then run

```
pipenv run python test.py
```

to test it locally using a URL.
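`test.py` essentially does something like the following (a minimal sketch; the endpoint path is the standard one exposed by the AWS Lambda runtime interface emulator inside the container):

```python
import requests

# Default invocation URL of the Lambda runtime interface emulator in the container.
LOCAL_URL = "http://localhost:8080/2015-03-31/functions/function/invocations"

def predict(event: dict, url: str = LOCAL_URL) -> dict:
    """Send the JSON event to the deployed function and return its JSON response."""
    response = requests.post(url, json=event, timeout=30)
    response.raise_for_status()
    return response.json()

# With the container running:
# predict({"url": "https://.../some_image.jpg"})
```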
The function returns a dictionary with a single key-value pair, where the key is the class label and the value is the prediction confidence. The class label is "Shells" if the model's output is greater than or equal to 0.5 and "Pebbles" if it is less than 0.5. The reported value is the confidence of the predicted class, so it is always greater than or equal to 0.5.
For example, if the model's output is 0.7, the class label will be "Shells" with value 0.7. If the output is 0.3, the class label will be "Pebbles" with value 0.7 (since 1 - 0.3 = 0.7).
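In code, the response shaping described above can be sketched as follows (the key names are an assumption; see `lambda-function.py` for the actual ones):

```python
def format_prediction(pred: float) -> dict:
    """Map the model's sigmoid output to a {label: confidence} dictionary."""
    if pred >= 0.5:
        return {"Shells": pred}
    # For outputs below 0.5, report the confidence of the predicted class.
    return {"Pebbles": 1 - pred}
```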
In order to deploy it to AWS, we push the Docker image. Make sure you have an account and have installed the AWS CLI. Instructions can be found here.
First, create a repository on Amazon Elastic Container Registry (ECR) with an appropriate name.
You will find the push commands there to tag and push the latest Docker image, which you can find on your system with:

```
pipenv run docker images
```
Next, we publish to AWS Lambda.
Go to AWS Lambda, create a function, select Container image and add a name. Then browse for your image and finally hit Create function.
Go to Configuration, change the timeout to 30 seconds and increase the memory (e.g. to 1024 MB).
Test the function by changing the event JSON.
Expose the Lambda function using API Gateway. Go to API Gateway, select REST API and build a new API.
Create a new resource and name it `predict`.
Create a new method, select POST and confirm. Choose Lambda Function as the integration type, enter the name of the function you created and hit Save.
Hit Test and add a JSON document to the request body, e.g.

```
{"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/7/77/Pebbleswithquarzite.jpg/1280px-Pebbleswithquarzite.jpg"}
```

or another image.
Hit Deploy under Actions, select New Stage and give it a name.
Copy the invoke URL, put it in your `test.py` file and run it.
Make sure you remove/delete everything after testing if necessary.
Video of cloud deployment
shells.mp4
That's a wrap!
- Send a pull request.
- If you liked this project, give a ⭐.
Connect with me: