aml-compute-target-deploy.md

---
title: include file
description: include file
services: machine-learning
author: sdgilley
ms.service: machine-learning
ms.author: sgilley
manager: cgronlund
ms.custom: include file
ms.topic: include
ms.date: 05/20/2021
---

The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use this table to choose an appropriate compute target.

| Compute target | Used for | GPU support | FPGA support | Description |
| ----- | ----- | ----- | ----- | ----- |
| Local web service | Testing/debugging | | | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
| Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. <br/><br/> Supported in the designer. |
| Azure Container Instances | Real-time inference | | | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Doesn't require you to manage a cluster. <br/><br/> Supported in the designer. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | | Run batch scoring on serverless compute. Supports normal and low-priority VMs. No support for real-time inference. |

Note

Although compute targets like local and Azure Machine Learning compute clusters support GPU for training and experimentation, using a GPU for inference when deployed as a web service is supported only on AKS.

Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning compute.

When choosing a cluster SKU, scale up first and then scale out. Start with a machine that has 150% of the RAM your model requires, profile the result, and move to a machine size that gives the performance you need. Once you've settled on a machine size, increase the number of machines to match your need for concurrent inference.
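The scale-up-then-scale-out guidance can be sketched as a small calculation. This is only an illustration: the `VmSize` type, the size names and RAM figures, and the `requests_per_node` value are hypothetical placeholders, not Azure SKUs or measured numbers. In practice, the per-node capacity would come from profiling your own model.

```python
import math
from typing import NamedTuple, Optional, Sequence


class VmSize(NamedTuple):
    """Hypothetical machine size; the names and RAM values used below are placeholders."""
    name: str
    ram_gb: float


def pick_vm_size(model_ram_gb: float,
                 candidates: Sequence[VmSize],
                 headroom: float = 1.5) -> Optional[VmSize]:
    """Scale up: return the smallest candidate with at least `headroom` times the model's RAM."""
    required_gb = model_ram_gb * headroom  # 150% of the model's RAM by default
    fitting = [vm for vm in candidates if vm.ram_gb >= required_gb]
    return min(fitting, key=lambda vm: vm.ram_gb) if fitting else None


def nodes_needed(peak_concurrent_requests: int, requests_per_node: int) -> int:
    """Scale out: number of machines needed to serve the peak concurrent load."""
    if requests_per_node <= 0:
        raise ValueError("requests_per_node must be positive")
    return max(1, math.ceil(peak_concurrent_requests / requests_per_node))


# Placeholder sizes, assumed for illustration only.
SIZES = [VmSize("small", 8), VmSize("medium", 16), VmSize("large", 32)]

starting_size = pick_vm_size(model_ram_gb=6, candidates=SIZES)
node_count = nodes_needed(peak_concurrent_requests=45, requests_per_node=10)
```

Here a model that needs 6 GB of RAM requires at least 9 GB with headroom, so the sketch starts from the `medium` placeholder size. The node count is then derived from the assumed per-node capacity.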

Note

  • Container instances are suitable only for small models less than 1 GB in size.
  • Use single-node AKS clusters for dev/test of larger models.
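The table and notes above can be condensed into a rough decision helper. The sketch below is an assumption-laden illustration, not part of any Azure SDK: the function name, its parameters, and the workload labels (`"test"`, `"batch"`, `"real-time"`) are hypothetical, and the thresholds simply restate the limits quoted in this article.

```python
def suggest_compute_target(workload: str,
                           needs_gpu: bool = False,
                           ram_gb: float = 0.0,
                           model_size_gb: float = 0.0) -> str:
    """Suggest a compute target from the guidance in this article (hypothetical helper)."""
    if workload == "test":
        # Limited testing and troubleshooting.
        return "Local web service"
    if workload == "batch":
        # Batch scoring; GPU inference is supported here through a machine learning pipeline.
        return "Azure Machine Learning compute clusters"
    if workload == "real-time":
        # Container instances: CPU-only, less than 48 GB of RAM, models under 1 GB.
        fits_aci = (not needs_gpu) and ram_gb < 48 and model_size_gb < 1
        if fits_aci:
            return "Azure Container Instances"
        # GPU inference as a web service, or larger models, point to AKS.
        return "Azure Kubernetes Service (AKS)"
    raise ValueError(f"unknown workload: {workload!r}")


small_cpu_service = suggest_compute_target("real-time", ram_gb=4, model_size_gb=0.5)
gpu_service = suggest_compute_target("real-time", needs_gpu=True)
```

A helper like this only narrows the choice. Cost, availability, and whether you want to manage a cluster still need to be weighed separately.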