This repo introduces how to prepare your Machine Learning (ML) model for production in AWS SageMaker with the help of MLflow and the AWS Command Line Interface (AWS CLI). Before we start, let's define the tools:
- MLflow - an open source platform for the ML lifecycle.
- AWS CLI - a unified tool to manage your Amazon Web Services (AWS) services.
For this tutorial you will need:
- An AWS account.
- Docker installed on your local machine.
- Python 3.6 (or higher) with `mlflow>=1.0.0` installed.
- Anaconda software to create a conda virtual environment.
- Create a new conda virtual environment in your working directory with the following command in your terminal:

```bash
conda create --name deploy_ml python=3.6
```

This command creates a new conda-based virtual environment on your local machine.
- Once your virtual environment is successfully created, you can activate it with the following command:

```bash
conda activate deploy_ml
```

At this moment we have an almost empty virtual environment which is ready to be filled with new dependencies.
- Install the mlflow package into our virtual environment with the following command:

```bash
pip install -q mlflow==1.18.0
```

At the moment of preparing this repo, the latest version of mlflow is `mlflow==1.18.0`.
- To run the ML model itself, we have to install the following packages into our virtual environment as well:
  - Pandas:
    ```bash
    pip install pandas
    ```
  - Scikit-learn:
    ```bash
    pip install -U scikit-learn
    ```
  - AWS Command Line Interface (AWS CLI):
    ```bash
    pip install awscli --upgrade --user
    ```
  - Boto3:
    ```bash
    pip install boto3
    ```
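The installs above can also be collected into a single requirements.txt file. Only the mlflow version is pinned by this tutorial; the other pins are left open as an assumption:

```text
mlflow==1.18.0
pandas
scikit-learn
awscli
boto3
```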
- Create a new AWS IAM User
- Open the Identity and Access Management (IAM) dashboard.
- Click on Users.
- Click Add users on the right side of the screen.
- Set the User name and tick Programmatic access below.
- Click on Create group as the part of Add user to group option.
- Type a group name you want to assign to your IAM User.
- From the list below, select the following policies:
- AmazonSageMakerFullAccess.
- AmazonEC2ContainerRegistryFullAccess.
- ....
- Click on Create group, then the current Policies window will be closed.
- Click on Next: Tags.
- Click on Next: Review.
- Click on Create user.
- You will get a notification about the successfully created new User in AWS IAM, see the screenshot below.
Important. Keep your credentials safe in your own notes. This step is the only occasion on which you will see the AWS Secret Access Key. Write it down carefully.
- Set up the AWS CLI configuration
- Be sure you have installed the AWS CLI, then type this command in your terminal:

```bash
aws configure
```

- Then you will have to enter your own credentials as follows:
  - AWS Access Key ID: go to IAM, then Users, and click on the user you just created. Select the Security credentials tab and copy the value of the Access key ID.
  - AWS Secret Access Key: paste this code from your own notes. You originally saw this code in the Security credentials of your user.
  - Default region name: go to the main AWS interface, click on your region, and check which region is activated for you (us-east-1, eu-west-1, and so on).
  - Default output format: set it as json.
- The filled AWS CLI configuration should look like the screenshot below.
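After `aws configure` completes, the CLI stores these values in two plain-text files in your home directory (shown here with placeholder values):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = XXXXXXXXX
aws_secret_access_key = XXXXXXXXX

# ~/.aws/config
[default]
region = us-east-2
output = json
```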
- Before doing all the following steps, we must be sure that our freshly installed mlflow service is working properly on our local machine. To check it, type the following command in the terminal:

```bash
mlflow ui
```

- Open the mlflow dashboard in your browser by entering the following URL:

http://127.0.0.1:5000

Please keep in mind that this service uses port `5000` on your machine (open a second terminal window in the same working directory before running further commands).
You should see the mlflow dashboard interface as shown in the screenshot below:
If you see the same in your browser, mlflow works fine and we are ready to go to the next steps.
- Copy and paste the full code from train.py (available in this repo).
- Adapt the Python code to track some model metrics in mlflow with the following changes in the code:
```python
import mlflow
import mlflow.sklearn

mlflow.set_experiment("my_classification_model")

with mlflow.start_run(run_name="My model experiment") as run:
    mlflow.log_param("num_estimators", num_estimators)
    mlflow.sklearn.log_model(rf, "random-forest-model")
    # Log model metrics (mse - mean squared error)
    mlflow.log_metric("mse", mse)

run_id = run.info.run_uuid
experiment_id = run.info.experiment_id
# close mlflow connection
mlflow.end_run()
print(mlflow.get_artifact_uri())
print("runID: %s" % run_id)
```
With the code changes above, we are ready to track the `mse` (Mean Squared Error) metric for the model and generate some run artifacts in the mlflow dashboard. To do it properly, do the following:
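For reference, the `mse` metric logged above is simply the mean of the squared differences between the true and predicted values. A standalone sketch (the sample values here are made up for illustration only):

```python
# Hypothetical true targets and model predictions, for illustration only
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Mean squared error: average of (true - predicted)^2
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)  # 0.375
```

In train.py the same quantity is computed over the model's test-set predictions before being passed to `mlflow.log_metric`.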
- Open another terminal window (be sure the first one with the activated mlflow service on port `5000` is still running).
- Activate the same virtual environment in the second terminal window as we created in Step no. 1 with the following command:

```bash
conda activate deploy_ml
```

- Navigate your terminal to the project directory with the `cd` (change directory) terminal command.
- From the second terminal window, simply run the model training code train.py with this command in your terminal:

```bash
python train.py
```

- Once the train script executes successfully, you will be notified about the creation of a new experiment in mlflow, the calculated MSE, the absolute directory path to the model artifacts, and the `runID`, see the screenshot below.
- Open the browser window where the mlflow service is running and refresh it. In the Experiments menu on the left you will see the updated list of experiments run so far, including the one you just ran in this step (`my_classification_model`), see the screenshot below.
- Click on the new experiment, and freely navigate through available options to check metric values and other parameters from the experiment.
- Once you have entered your new experiment space, click on the most recent run tagged with the Random Forest Experiment Run time tag. Be sure that in the Artifacts section you see this kind of structured experiment files with filled content for each file. See the screenshot below.
Among the artifact files there is a conda.yaml file. This is the environment your model needs to run, and it can be heavily customized based on your needs.
- Also, check the file structure in your Finder (macOS) or Windows Explorer (Windows). You should see a new folder mlruns created in your project directory with run number folders (0, 1, and so on), a `runID` folder, and the full set of artifacts inside. See the screenshot below.
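The local mlruns layout described above looks roughly like this (the experiment number and run ID will differ on your machine; the run ID shown is the one used later in this tutorial):

```text
mlruns/
└── 1/                                       # experiment number
    └── ccc73fe5e7784df69b8518e3b9daa0c6/    # runID
        ├── meta.yaml
        ├── metrics/
        ├── params/
        └── artifacts/
            └── random-forest-model/
                ├── MLmodel
                ├── conda.yaml
                └── model.pkl
```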
- In the second terminal window, go to the artifact directory of the selected model run in the mlruns folder (in my example this directory is /mlruns/1/ccc73fe5e7784df69b8518e3b9daa0c6/artifacts/random-forest-model), then run this command:

```bash
mlflow sagemaker build-and-push-container
```

You will see all the processes running in real time in your terminal, as shown in the screenshot below.
The finished processes of building and pushing the Docker container image look as shown in the screenshot below:
If you have set up AWS correctly with the proper permissions, this will build an image locally and push it to your image registry (named mlflow-pyfunc) on AWS.
- Once you have finished the previous action without any issues, check your local Docker Desktop to be sure that everything is fine at this point. Open the Docker Desktop application and go to the Images section. You should see two images created. The first one indicates your AWS ECR container image, and the other one is `mlflow-pyfunc`. See the screenshot below.
- Check the AWS ECR repository list. You should see the same image name as listed first in the Docker Desktop application. See the screenshot below.
- Check the image parameters in AWS ECR by clicking on the `mlflow-pyfunc` image repository. In the image properties window you will see the main parameters of your image, such as Image URI, Size, and others, see the screenshot below.
If you see the same set of image parameters with meaningful values, move forward to the next steps.
Important: Before doing the following steps, you should make your AWS credentials available to your terminal session so that the `boto3` package can authenticate against AWS. You can do this in a very simple way, just by setting environment variables in your terminal as follows:
- Type `export AWS_ACCESS_KEY_ID=XXXXXXXXX`, where `XXXXXXXXX` is the AWS Access Key ID of your User in AWS.
- Type `export AWS_SECRET_ACCESS_KEY=XXXXXXXXX`, where `XXXXXXXXX` is the AWS Secret Access Key of your User in AWS.
- Create a new Python script in your root project directory with the terminal command:

```bash
touch deploy.py
```

This command creates a new file deploy.py.
- Write the following script logic in the deploy.py you have just created:
```python
import mlflow.sagemaker as mfs

experiment_id = '1'
run_id = 'XXXXXXXXXXXXXX'
region = 'us-east-2'
aws_id = 'XXXXXXXXXXXXXX'
arn = 'arn:aws:iam::XXXXXXXXXXXXXX:role/sagemaker-full-access-role'
app_name = 'model-application'
model_uri = 'mlruns/%s/%s/artifacts/random-forest-model' % (experiment_id, run_id)
tag_id = '1.18.0'
image_url = aws_id + '.dkr.ecr.' + region + '.amazonaws.com/mlflow-pyfunc:' + tag_id

mfs.deploy(app_name=app_name,
           model_uri=model_uri,
           region_name=region,
           mode="create",
           execution_role_arn=arn,
           image_url=image_url)
```
As you can see from the deploy.py skeleton code, we will need to get `experiment_id`, `run_id`, `region`, `aws_id`, the ARN for AmazonSageMakerFullAccess (`arn`), `model_uri`, and `image_url`. Let's extract these values one by one.

`experiment_id`:
- Open the Mlflow user interface at http://127.0.0.1:5000.
- Click on your experiment (for example my_classification_model).
- You will see an Experiment ID at the top of the screen next to the Experiment ID text. Copy the value (in my case it is `1`).
`run_id`:
- Open the Mlflow user interface at http://127.0.0.1:5000.
- Click on your experiment (for example my_classification_model).
- Click on the run whose image was pushed to AWS ECR.
- In the Artifacts section, expand the list of artifacts by clicking on the arrow and select MLmodel.
- In the data window on the right you can see the main data about the model. One of these is the `run_id`. Copy it.
`aws_id`:
Get your AWS ID from the terminal by running this command:

```bash
aws sts get-caller-identity --query Account --output text
```

`arn`:
Create the Role for SageMakerFullAccess and grab its ARN:
- Open the IAM Dashboard.
- Click on Create role.
- Select the SageMaker service from the given list and click Next: Permissions.
- Click Next: Tags.
- Once you have completed creating this role, copy the Role ARN for further usage (Role ARN).
`region`:
Go to the main AWS interface, click on your region, and check which region is activated for you (us-east-1, eu-west-1, and so on).

`app_name`:
Set any name you want to recognize your application by.

`tag_id`:
This is the version of mlflow. Open AWS ECR, then open your repository, and copy the value of the Image tag.

- Open the terminal with the activated virtual environment deploy_ml again and run the deploy.py Python script with the command:

```bash
python deploy.py
```

The output should notify you about successfully deploying the image to SageMaker, see the screen below.
You should see the InService message in your terminal, indicating that the model was successfully deployed to SageMaker.
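The `image_url` assembled in deploy.py is plain string concatenation of the account ID, region, repository name, and tag. A quick standalone check with placeholder values (substitute your own account ID and region):

```python
# Placeholder values, for illustration only
aws_id = '123456789012'
region = 'us-east-2'
tag_id = '1.18.0'

# ECR image URI format: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
image_url = aws_id + '.dkr.ecr.' + region + '.amazonaws.com/mlflow-pyfunc:' + tag_id
print(image_url)  # 123456789012.dkr.ecr.us-east-2.amazonaws.com/mlflow-pyfunc:1.18.0
```

Comparing this string against the Image URI shown in the AWS ECR console is a quick way to catch a wrong region or account ID before running the deployment.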
Once you are sure that your model is running in AWS, you can make a new prediction on newly given data with the help of the `boto3` library. Follow these steps to do it.
- Open the terminal with the activated virtual environment deploy_ml.
- Create a new file predict.py with the command:

```bash
touch predict.py
```

- For predict.py use this Python code:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import datasets
import json
import boto3

app_name = 'model-application'
region = 'us-east-2'

def check_status(app_name):
    sage_client = boto3.client('sagemaker', region_name=region)
    endpoint_description = sage_client.describe_endpoint(EndpointName=app_name)
    endpoint_status = endpoint_description["EndpointStatus"]
    return endpoint_status

def query_endpoint(app_name, input_json):
    client = boto3.session.Session().client("sagemaker-runtime", region)
    response = client.invoke_endpoint(
        EndpointName=app_name,
        Body=input_json,
        ContentType='application/json; format=pandas-split',
    )
    preds = response['Body'].read().decode("ascii")
    preds = json.loads(preds)
    print("Received response: {}".format(preds))
    return preds

## check endpoint status
print("Application status is: {}".format(check_status(app_name)))

# Prepare data to give for predictions
iris = datasets.load_iris()
x = iris.data[:, 2:]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=7)

## create test data and make inference from endpoint
query_input = pd.DataFrame(X_train).iloc[[3]].to_json(orient="split")
prediction = query_endpoint(app_name=app_name, input_json=query_input)
```
- Set the variables `app_name` and `region` based on your preferences.
- Save the predict.py file.
- Open the terminal with the activated virtual environment deploy_ml again and run the file with the command:

```bash
python predict.py
```

- As a result, you should see an InService message in the terminal, followed by the new prediction made in the cloud, see the screenshot below.
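The endpoint expects the request body in the pandas-split JSON orientation, matching the `ContentType` header used in predict.py. A small standalone example of what `to_json(orient="split")` produces (the feature values here are made up to mimic the iris petal data shape):

```python
import pandas as pd

# One sample row with two features, mimicking the iris petal data shape
df = pd.DataFrame([[1.4, 0.2]])
payload = df.to_json(orient="split")
print(payload)  # {"columns":[0,1],"index":[0],"data":[[1.4,0.2]]}
```

This "split" layout (separate `columns`, `index`, and `data` fields) is what the mlflow scoring server parses on the other side of the `invoke_endpoint` call.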
Once you have successfully completed all the steps above and achieved the same results, do not forget to remove the SageMaker endpoint and the AWS ECR repository with the Docker image in order to avoid extra billing (the endpoint can be deleted with `mlflow.sagemaker.delete` or from the SageMaker console, and the repository from the ECR console).
Video Tutorial: Deploy ML model to AWS Sagemaker with mlflow and Docker