---
title: include file
description: include file
services: machine-learning
author: sdgilley
ms.service: machine-learning
ms.author: sgilley
manager: cgronlund
ms.custom: include file
ms.topic: include
ms.date: 05/20/2021
---
The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use this table to choose an appropriate compute target.
| Compute target | Used for | GPU support | FPGA support | Description |
| --- | --- | --- | --- | --- |
| Local web service | Testing/debugging | &nbsp; | &nbsp; | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
| Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. Supported in the designer. |
| Azure Container Instances | Real-time inference | &nbsp; | &nbsp; | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Doesn't require you to manage a cluster. Supported in the designer. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | &nbsp; | Run batch scoring on serverless compute. Supports normal and low-priority VMs. No support for real-time inference. |
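
To make the real-time rows in this table concrete, here's a minimal sketch using the Azure Machine Learning Python SDK (v1) that deploys the same registered model to Azure Container Instances and to an existing AKS cluster. The model name, entry script, environment, and cluster name are placeholders, not values from this article:

```python
from azureml.core import Environment, Workspace
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice, AksWebservice

ws = Workspace.from_config()

# Placeholders: substitute your own registered model, entry script, and environment.
model = Model(ws, name="my-model")
env = Environment.get(ws, name="AzureML-Minimal")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Azure Container Instances: low-scale CPU workloads, no cluster to manage.
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)
aci_service = Model.deploy(ws, "my-aci-service", [model], inference_config, aci_config)
aci_service.wait_for_deployment(show_output=True)

# AKS: high-scale production, with autoscaling of the deployed service.
aks_target = AksCompute(ws, "my-aks-cluster")  # assumes the cluster already exists
aks_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=4, autoscale_enabled=True)
aks_service = Model.deploy(ws, "my-aks-service", [model], inference_config, aks_config, aks_target)
aks_service.wait_for_deployment(show_output=True)
```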
> [!NOTE]
> Although compute targets like local web service and Azure Machine Learning compute clusters support GPUs for training and experimentation, using a GPU for inference when deployed as a web service is supported only on AKS.
>
> Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning compute.
When choosing a cluster SKU, first scale up, and then scale out. Start with a machine that has 150% of the RAM your model requires, profile the result, and find a machine that has the performance you need. Once you've found that machine, increase the number of machines to fit your need for concurrent inference.
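
One way to do that profiling step is the SDK's model profiling support. The following is a sketch, assuming SDK v1, a registered model named `my-model`, and a hypothetical dataset of sample requests named `sample-requests` already registered in the workspace:

```python
from azureml.core import Dataset, Environment, Workspace
from azureml.core.model import InferenceConfig, Model

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # placeholder model name
env = Environment.get(ws, name="AzureML-Minimal")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Hypothetical dataset of sample requests, already registered in the workspace.
sample_requests = Dataset.get_by_name(ws, name="sample-requests")

profile = Model.profile(
    ws,
    "my-model-profile",
    [model],
    inference_config,
    input_dataset=sample_requests,
    cpu=1.0,
    memory_in_gb=2.0,  # start near 150% of the model's RAM footprint
)
profile.wait_for_completion(show_output=True)
print(profile.get_details())  # recommended CPU and memory for the deployed service
```

The profile's recommendation gives you the per-node requirement; scaling out for concurrent inference is then a matter of replica or node count on the target.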
> [!NOTE]
> - Container instances are suitable only for small models that are less than 1 GB in size.
> - Use single-node AKS clusters for dev/test of larger models, as sketched below.
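
For the second bullet, a single-node dev/test cluster can be provisioned through the SDK; the `DEV_TEST` cluster purpose relaxes the node and CPU minimums that production AKS clusters require. This is a sketch with placeholder names, assuming SDK v1:

```python
from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget

ws = Workspace.from_config()

# DEV_TEST relaxes the production minimums, allowing a single-node
# cluster for dev/test of larger models.
prov_config = AksCompute.provisioning_configuration(
    agent_count=1,
    vm_size="Standard_DS3_v2",  # placeholder VM size
    cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,
)
aks_target = ComputeTarget.create(ws, "aks-devtest", prov_config)
aks_target.wait_for_completion(show_output=True)
```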