End-to-End MLOps Pipeline using SageMaker Pipelines

The goal of this project is to run an end-to-end MLOps pipeline on SageMaker that covers:

hyper-parameter tuning
model evaluation
model explanation
model testing
serverless deployment
data labelling with both human annotation and model prediction
front-end deployment

Video Demonstration

You can see a demo of the project here

Part 1 - Hyper-parameter Tuning (HPT)

For HPT, Optuna was used. The augmentation and dataset are the same as those used for training later.

The hyperparameters explored were:

Model: resnet18, resnet34, resnet26
Optimizers: Adam, SGD
Learning Rate: log scale in the range (1e-4, 1e-2)
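
The search space above could be expressed in Optuna roughly as follows. This is a minimal sketch, not the project's actual tuning script: `train_and_validate` is a hypothetical callable that trains one configuration and returns its validation accuracy, and the number of trials is an assumption.

```python
MODELS = ["resnet18", "resnet34", "resnet26"]
OPTIMIZERS = ["Adam", "SGD"]


def suggest_hparams(trial):
    """Sample one configuration from the search space listed above."""
    return {
        "model": trial.suggest_categorical("model", MODELS),
        "optimizer": trial.suggest_categorical("optimizer", OPTIMIZERS),
        # log=True samples the learning rate uniformly in log space
        "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
    }


def run_study(train_and_validate, n_trials=20):
    """Maximize validation accuracy over the search space.

    `train_and_validate` is a hypothetical callable: it takes the sampled
    hyperparameters and returns the validation accuracy.
    """
    import optuna  # imported lazily; requires the optuna package

    study = optuna.create_study(direction="maximize")
    study.optimize(lambda t: train_and_validate(suggest_hparams(t)), n_trials=n_trials)
    return study.best_params, study.best_value
```

Sampling the learning rate with `log=True` spreads trials evenly across orders of magnitude, which suits a range spanning 1e-4 to 1e-2.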

The tuning video can be seen here.

The best model yields a validation accuracy of 73.5% and has the following hyperparameters:

Model: resnet34
Optimizer: SGD
Learning Rate: 0.0018308341419607547

The Tensorboard logs can be seen online here.

These values are chosen as the default values for model training in the pipeline.

Part 2 - SageMaker Pipeline

A pipeline is created (by modifying the skeleton project 'abalone').

Parameters

This pipeline has the following parameters.

Model Name
Batch Size
Optimizer
Learning Rate

Data Preprocessing

Data (in a zip file) is read from S3. The preprocessing step also pulls additional data from another 'source', which is described in Part 4.

Training

For model training, the Albumentations library is used for augmentation.

Training itself is done on SageMaker using a spot instance of type 'ml.g4dn.12xlarge', which is a multi-GPU instance (with 4 Nvidia T4s). To enable distributed data parallel (DDP) training in SageMaker, the following argument was passed to the PyTorch estimator.

distribution = {
    "pytorchddp": {
        "enabled": True,
        "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION",
    }
}
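
For context, here is a hedged sketch of how this argument might fit into a PyTorch estimator on a spot instance. The entry point name, framework version, Python version, and time limits are assumptions for illustration and are not taken from the project.

```python
DISTRIBUTION = {
    "pytorchddp": {
        "enabled": True,
        "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION",
    }
}


def build_estimator_kwargs(role, hyperparameters, max_run=3600):
    """Keyword arguments for sagemaker.pytorch.PyTorch (placeholder values)."""
    return {
        "entry_point": "train.py",            # hypothetical script name
        "role": role,
        "instance_type": "ml.g4dn.12xlarge",  # 4 x Nvidia T4
        "instance_count": 1,
        "framework_version": "1.12",          # assumed version
        "py_version": "py38",                 # assumed version
        "use_spot_instances": True,
        "max_run": max_run,
        "max_wait": 2 * max_run,              # must be >= max_run for spot training
        "distribution": DISTRIBUTION,
        "hyperparameters": hyperparameters,
    }


def build_estimator(role, hyperparameters):
    """Create the estimator; requires the SageMaker Python SDK."""
    from sagemaker.pytorch import PyTorch

    return PyTorch(**build_estimator_kwargs(role, hyperparameters))
```

With spot training, `max_wait` bounds the total time including waiting for capacity, so it has to be at least as large as `max_run`.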

The logs can be seen here.

Confusion matrix images are not logged in TensorBoard.dev, but have been saved on each epoch.

Evaluation

In the pipeline, the model is evaluated using the test dataset. If the accuracy is better than the one obtained via HPT, the model is registered and is subject to manual approval before deployment.
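
The registration gate can be sketched in plain Python as below. The report layout (`metrics` / `accuracy` / `value`) is an assumption, and the baseline is the 73.5% validation accuracy reported in Part 1. In a SageMaker pipeline, a comparison like this would typically be expressed as a condition step that reads the value from the evaluation report.

```python
import json

HPT_BEST_ACCURACY = 0.735  # best validation accuracy reported in Part 1


def should_register(report_json, baseline=HPT_BEST_ACCURACY):
    """Return True when the test accuracy beats the HPT baseline.

    The layout {"metrics": {"accuracy": {"value": ...}}} is an assumed
    report format, not necessarily the one the project writes.
    """
    report = json.loads(report_json)
    accuracy = report["metrics"]["accuracy"]["value"]
    return accuracy > baseline
```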

The evaluation report can be seen here.

Data Drift

The data drift for the model is calculated using the Alibi Detect library. In the pipeline, this step is performed as part of the evaluate stage.

It writes two JSON files to S3: one for the unperturbed dataset and the other for the perturbed dataset.
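
As an illustration, a drift report of this kind might be produced as follows. The choice of the Kolmogorov-Smirnov detector (`KSDrift`) and the p-value threshold are assumptions; the project may use a different Alibi Detect detector.

```python
import json


def to_jsonable(value):
    """Recursively convert NumPy arrays/scalars (anything with .tolist())
    into plain Python objects that json.dumps can serialize."""
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if hasattr(value, "tolist"):
        return value.tolist()
    return value


def drift_report(x_ref, x, p_val=0.05):
    """Test `x` for drift against reference data `x_ref` (NumPy arrays)
    and return the detector output as a JSON string."""
    from alibi_detect.cd import KSDrift  # requires alibi-detect

    detector = KSDrift(x_ref, p_val=p_val)
    preds = detector.predict(x)
    return json.dumps(to_jsonable(preds["data"]))
```

Calling `drift_report` once with the unperturbed test set and once with a perturbed copy would give the two JSON documents.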

Explanation

The model explanation is computed using the Captum library. In the pipeline, this step is performed as part of the evaluate stage.

It outputs a set of images (one for each of the different methods) and a markdown file in S3. The markdown file can be seen here.
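
A hedged sketch of this step: one attribution call with Captum and a helper that assembles the markdown file. Integrated Gradients is used only as an example method, and the report title and file names are hypothetical.

```python
def integrated_gradients(model, inputs, target, n_steps=50):
    """Attribute the prediction for class `target` to the input pixels.

    `model` is a torch.nn.Module and `inputs` a batched image tensor.
    """
    from captum.attr import IntegratedGradients  # requires captum and torch

    return IntegratedGradients(model).attribute(inputs, target=target, n_steps=n_steps)


def markdown_report(images):
    """Build a markdown page with one section per attribution method.

    `images` maps a method name to the path of its saved image.
    """
    lines = ["# Model Explanation", ""]
    for method, path in images.items():
        lines += [f"## {method}", "", f"![{method}]({path})", ""]
    return "\n".join(lines)
```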

Robustness

Model robustness is also assessed using the Captum library. In the pipeline, this step is performed as part of the evaluate stage.

It outputs a set of images (one for each of the different methods) and a markdown file in S3. The markdown file can be seen here.

Deployment

For deployment, we use serverless inference in the staging stage and a managed (EC2) endpoint in the production stage. The config files for the two stages can be seen here:

staging
prod
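
To show the difference between the two stages, the sketch below builds the production-variant portion of an endpoint configuration for each. The memory size, concurrency, instance type, and variant name are placeholder values, not the ones in the project's config files.

```python
def production_variant(model_name, stage):
    """Return a ProductionVariants entry for create_endpoint_config."""
    variant = {"VariantName": "AllTraffic", "ModelName": model_name}
    if stage == "staging":
        # Serverless: no instances to manage, capacity scales with requests
        variant["ServerlessConfig"] = {"MemorySizeInMB": 2048, "MaxConcurrency": 5}
    elif stage == "prod":
        # Managed endpoint backed by a provisioned instance
        variant.update(InstanceType="ml.m5.large", InitialInstanceCount=1)
    else:
        raise ValueError(f"unknown stage: {stage}")
    return variant
```

Such a dictionary could be passed as an element of `ProductionVariants` to the boto3 SageMaker client's `create_endpoint_config` call.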

Part 3 - Testing

For testing, pytest is used. The code reads images from a directory and makes sure that the predictions match the class labels (which are the same as the filenames). It directly invokes the endpoint.

The file is located here. To execute the file, you must be in the 03_testing folder and run pytest test_intel
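
A test of this kind could look roughly like the sketch below. The content type, the `{"label": ...}` response shape, and the `.jpg` extension are assumptions about the endpoint and the test images.

```python
import json
from pathlib import Path


def expected_label(image_path):
    """The class label is the file name without its extension."""
    return Path(image_path).stem


def predict_label(endpoint_name, image_path):
    """Send raw image bytes to the endpoint and return the predicted label.

    The content type and response shape are assumptions.
    """
    import boto3  # requires AWS credentials at call time

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        Body=Path(image_path).read_bytes(),
    )
    return json.loads(response["Body"].read())["label"]


def check_directory(endpoint_name, image_dir, predict=predict_label):
    """Assert that every image in `image_dir` is classified as its file name."""
    for path in sorted(Path(image_dir).glob("*.jpg")):
        assert predict(endpoint_name, path) == expected_label(path)
```

A pytest test function would then call `check_directory` with the deployed endpoint's name and the image directory.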

Part 4 - Human-in-the-Loop

What if we obtain more (unlabelled) data than we initially started with?

In that case, we need to first annotate it and then upload data to a data lake (S3 in our case).

There are two ways to perform annotations:

Inference from our deployed model
Human annotations

Inference from deployed model

For inference, we have to use label-studio-ml. The setup instructions can be found here. The ML backend calls the deployed predictor by sending the image data, and populates the predicted label along with the score. We can manually retrieve a prediction by selecting the task and going to Task -> Retrieve Predictions.

Label Studio also retrieves predictions automatically when a task is opened for annotation, and marks the predicted choice with a check mark.
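
For reference, a prediction returned to Label Studio for a single-choice image classification task has roughly the shape built below. The tag names `choice` and `image` are hypothetical and must match the `from_name` and `to_name` used in the project's labeling configuration.

```python
def format_prediction(label, score, from_name="choice", to_name="image"):
    """Wrap a predicted label and score in Label Studio's prediction format.

    `from_name` / `to_name` are placeholder tag names (assumptions).
    """
    return {
        "result": [
            {
                "from_name": from_name,
                "to_name": to_name,
                "type": "choices",
                "value": {"choices": [label]},
            }
        ],
        "score": score,
    }
```

In a label-studio-ml backend, a dictionary like this would be produced for each task inside the model's `predict` method.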

Annotation and webhooks

Once a user labels a task, Label Studio fires a webhook. The webhook target is the function URL of a Lambda function, which does the following:

It reads the image URL and the label annotated by the user.
It copies that file to an S3 bucket, under the "folder" (key prefix) whose name is the label. This bucket is the additional data 'source' mentioned in the Data Preprocessing section. In CloudWatch, the logs show the pipeline printing the class counts both without annotations (the original zip file) and with annotations. This implements our human-in-the-loop part.
It checks whether the number of annotations exceeds a threshold. If not, nothing happens. If it does, the modelbuild pipeline is triggered.
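
The steps above can be sketched as a Lambda handler. The webhook payload layout, the `s3://` form of the image URL, and the environment variable names (`ANNOTATION_BUCKET`, `ANNOTATION_THRESHOLD`, `PIPELINE_NAME`) are all assumptions made for illustration.

```python
import base64
import json
import os
from urllib.parse import urlparse


def parse_webhook(event):
    """Extract (image_url, label) from a webhook delivered via a function URL.

    The payload layout is an assumed Label Studio annotation event.
    """
    body = event["body"]
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    payload = json.loads(body)
    image_url = payload["task"]["data"]["image"]
    label = payload["annotation"]["result"][0]["value"]["choices"][0]
    return image_url, label


def split_s3_url(url):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")


def handler(event, context):
    import boto3  # available by default in the Lambda Python runtime

    image_url, label = parse_webhook(event)
    src_bucket, src_key = split_s3_url(image_url)
    dest_bucket = os.environ["ANNOTATION_BUCKET"]
    # Store the file under a "folder" named after the label
    dest_key = f"{label}/{src_key.rsplit('/', 1)[-1]}"

    s3 = boto3.client("s3")
    s3.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )

    # Count annotated files and trigger the pipeline above the threshold
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=dest_bucket)
    count = sum(page["KeyCount"] for page in pages)
    triggered = count > int(os.environ["ANNOTATION_THRESHOLD"])
    if triggered:
        boto3.client("sagemaker").start_pipeline_execution(
            PipelineName=os.environ["PIPELINE_NAME"]
        )
    return {
        "statusCode": 200,
        "body": json.dumps({"count": count, "triggered": triggered}),
    }
```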

Note that this triggers the SageMaker pipeline and not the CodePipeline pipeline. This is intended: only the training data changes, not the pipeline infrastructure.

Part 5 - Front-End and Web Deployment

For the front-end, streamlit is used. This lets you upload a file and run inference.

For global access, Streamlit Cloud is used, which generates a public URL for the app. However, the Streamlit Cloud app does not have libraries such as PyTorch available. To address this, a Docker image containing all the required libraries was built, along with a Lambda function (with a function URL enabled) through which the front-end accesses it.
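
A minimal sketch of such a front-end is shown below: the Streamlit app only uploads the file and forwards it to the Lambda function URL, so it needs no ML libraries itself. The JSON request body with a base64-encoded `image` field and the JSON response are assumptions about the Lambda function's interface.

```python
import base64
import json
import urllib.request


def build_request(function_url, image_bytes):
    """Build a POST request carrying the image as base64 in a JSON body."""
    body = json.dumps(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(function_url, image_bytes):
    """Call the Lambda function URL and return its decoded JSON response."""
    request = build_request(function_url, image_bytes)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


def main(function_url):
    import streamlit as st  # requires the streamlit package

    st.title("Image classification")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is not None:
        st.image(uploaded)
        st.write(classify(function_url, uploaded.getvalue()))
```

In the app script, `main` would be called at module level with the function URL, since Streamlit runs the script from top to bottom on each interaction.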

The front-end can be seen in the video here.
