This repository contains an Azure ML template project built using the Python SDK v2. The project covers the following use cases:

- Setting up a training pipeline with hyperparameter search in order to train a logistic regression model,
- Deploying a model for batch inference, and
- Deploying a model for online inference.
We also show you how to provision the required infrastructure using Terraform, so that the template project is easy to set up and run even without much experience with Azure.
Please be aware that this is not an "Azure ML tutorial". The purpose here is to provide a template project that can serve as a starting point for new Azure ML projects. The template project is fully functional, though, so it can also serve as an example of how the various pieces of functionality fit together.

Most of this is already covered in Microsoft's own documentation, but there are very few (if any) end-to-end projects like this. Microsoft's documentation and examples mostly cover individual pieces of functionality, which makes it hard to get a sense of how things look when combined. This README covers how to set up and run the template project, but if you're interested in a more in-depth exploration of what is going on, you should check out this blog post.
Install and set up Terraform for Azure by following the instructions on the Terraform website. Proceed by navigating into the `provision/` directory and executing the following commands:

```
az login
terraform init
terraform apply
```

This will create all the required resources in Azure. It might take a few minutes to complete.
Create a file in the root directory called `.env`. It should look like this:

```
BATCH_ENDPOINT_NAME="xxx"
ONLINE_ENDPOINT_NAME="xxx"
SA_NAME="xxx"
SA_CONN_STRING="xxx"
SA_CONTAINER_NAME="xxx"
```
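The template's scripts read these values at runtime via `python-dotenv`. As a rough sketch of what that loading amounts to (using only the standard library, so you can see the mechanics; in the project itself you would simply call `load_dotenv()` from `python-dotenv`):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv():
    parse KEY="value" lines and export them into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything that isn't KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# After loading, a script can read e.g. os.environ["BATCH_ENDPOINT_NAME"]
# to find the endpoint name you chose above.
```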
- `BATCH_ENDPOINT_NAME` and `ONLINE_ENDPOINT_NAME` can be anything, but they have to be unique across the entirety of Azure. These will be the names of the REST endpoints your model will be accessible through.
- `SA_NAME`: The name of the storage account. Can be found by executing `az storage account list -g azureml-template-rg --query "[].{Name:name}"`.
- `SA_CONN_STRING`: The connection string for the storage account. Can be found by executing `az storage account show-connection-string -n ${storage_account_name}`.
- `SA_CONTAINER_NAME`: Can be retrieved by executing `az storage container list --account-name ${storage_account_name} --query "[].{Name:name}"`. The name will be `azureml-blobstore` followed by a string of letters and numbers.
You should also download `config.json` and place it in your root directory. It can be downloaded from the top-right corner at ml.azure.com or from the resource page in the Azure portal.
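`config.json` identifies your workspace to the SDK; `MLClient.from_config()` reads it when the scripts connect to Azure ML. For reference, a small stdlib-only sketch of its expected contents (the three field names below are the standard ones the downloaded file contains):

```python
import json

def read_workspace_config(path="config.json"):
    """Read the workspace identifiers that MLClient.from_config() consumes."""
    with open(path) as f:
        cfg = json.load(f)
    return cfg["subscription_id"], cfg["resource_group"], cfg["workspace_name"]

# With azure-ai-ml and azure-identity installed, a script can then connect with:
#   from azure.ai.ml import MLClient
#   from azure.identity import DefaultAzureCredential
#   ml_client = MLClient.from_config(credential=DefaultAzureCredential())
```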
Finally, install all the packages required for running the template project in Azure ML:

- `azure-ai-ml`
- `azure-identity`
- `azure-storage-blob`
- `mlflow`
- `azureml-mlflow`

You should also install `python-dotenv` and `seaborn`, but those packages are specific to the template project and wouldn't be required in general. Alternatively, you can just run `pip install -r requirements.txt`.
Navigate to the `setup/` directory. This directory contains two Python scripts that should be run to create the data and upload it to the cloud:

- `create_data.py`
- `upload_data.py`
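The repository's `create_data.py` contains the real data generation; purely as an illustration of the kind of thing such a script does, here is a hypothetical stdlib-only sketch that writes a small two-class dataset suitable for logistic regression (the file layout, column names, and feature distributions are made up here, not taken from the project):

```python
import csv
import random

def create_data(path, n_rows=100, seed=0):
    """Hypothetical create_data.py-style script: write a small,
    linearly separable two-class dataset as CSV."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["feature_1", "feature_2", "label"])
        for _ in range(n_rows):
            label = rng.randint(0, 1)
            # Shift the feature means by class so the classes are separable.
            writer.writerow([rng.gauss(label, 1.0), rng.gauss(-label, 1.0), label])
    return path
```

Uploading would then hand the file to the storage container named in `.env` (in the project this is what `upload_data.py` and `azure-storage-blob` are for).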
Navigate into the `pipeline/` directory. Create the environment that should be used when training by running `create_environment.py`. Register the dataset you uploaded earlier by running `register_training_data.py`. Finally, run `pipeline.py` to run the training pipeline. This will take up to 30 minutes. Later runs will be quicker as long as you don't make any changes to the environment.
Navigate to the `deployment/batch/` directory. Run the scripts in the following order, verifying that each step has completed before moving on to the next one:

- `create_batch_endpoint.py`
- `create_batch_deployment.py`

When you're done, make sure to run `delete_endpoint.py`. Endpoints cannot be managed with Terraform, so you have to make sure that you've deleted them if you want `terraform destroy` to work.
Find your deployment on ml.azure.com and retrieve the value in the REST endpoint field. Run `az account get-access-token --resource https://ml.azure.com --query accessToken -o tsv` to get the required token. To invoke the endpoint, run `python invoke_with_rest.py --uri ${rest_endpoint} --token ${token}`. The output will be saved to the folder specified in the request body.
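`invoke_with_rest.py` lives in the repository; at its core, invoking an endpoint over REST means sending an authenticated JSON POST to the endpoint URI. A minimal stdlib sketch of assembling that request (the exact JSON body, including where the output folder goes, is dictated by the batch REST API and the project's script, so it is left to the caller here):

```python
import json
import urllib.request

def build_invoke_request(uri, token, body):
    """Assemble the authenticated POST request that a script like
    invoke_with_rest.py sends to the endpoint's REST URI."""
    return urllib.request.Request(
        uri,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it requires a live endpoint and a valid token:
#   with urllib.request.urlopen(build_invoke_request(uri, token, body)) as resp:
#       print(resp.read().decode())
```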
Navigate to the `deployment/online/` directory. Run the scripts in the following order, verifying that each step has completed before moving on to the next one:

- `create_online_endpoint.py`
- `create_online_deployment.py`
- `set_traffic_rules.py`

When you're done, make sure to run `delete_endpoint.py`. Endpoints cannot be managed with Terraform, so you have to make sure that you've deleted them if you want `terraform destroy` to work.
Find your deployment on ml.azure.com and retrieve the value in the REST endpoint field. Run `az ml online-endpoint get-credentials -g azureml-template-rg -w azureml-template-ws -n ${endpoint_name} -o tsv` to get the required token. To invoke the endpoint, run `python invoke_with_rest.py --uri ${rest_endpoint} --token ${token}`. The predictions will be returned by the API.
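Unlike the batch case, the online endpoint returns its predictions directly in the HTTP response body. A small sketch of the payload/response handling around that call (the `{"data": ...}` payload shape is an assumption for illustration; the real shape is dictated by the deployment's scoring script):

```python
import json

def build_scoring_payload(rows):
    """Serialize feature rows into a JSON body for the online endpoint.
    The {"data": ...} wrapper is assumed here, not taken from the project."""
    return json.dumps({"data": rows})

def parse_predictions(response_body):
    """Decode the JSON predictions returned in the online endpoint's response."""
    return json.loads(response_body)
```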
The following websites were very helpful when piecing all this stuff together: