---
title: include file
description: include file
services: machine-learning
author: sdgilley
ms.service: machine-learning
ms.author: sgilley
manager: cgronlund
ms.custom: include file
ms.topic: include
ms.date: 05/20/2021
---
The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use this table to choose an appropriate compute target.
| Compute target | Used for | GPU support | FPGA support | Description |
| --- | --- | --- | --- | --- |
| Local web service | Testing/debugging | &nbsp; | &nbsp; | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
| Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. Supported in the designer. |
| Azure Container Instances | Real-time inference | &nbsp; | &nbsp; | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Doesn't require you to manage a cluster. Supported in the designer. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | &nbsp; | Run batch scoring on serverless compute. Supports normal and low-priority VMs. No support for real-time inference. |
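
To make the real-time rows in this table concrete, here's a minimal sketch using the Azure Machine Learning Python SDK (v1) that deploys the same registered model to Azure Container Instances and to an existing AKS cluster. The model name, entry script, environment, and cluster name are placeholders, not values from this article:

```python
from azureml.core import Environment, Workspace
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice, AksWebservice

ws = Workspace.from_config()

# Placeholders: substitute your own registered model, entry script, and environment.
model = Model(ws, name="my-model")
env = Environment.get(ws, name="AzureML-Minimal")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Azure Container Instances: low-scale CPU workloads, no cluster to manage.
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)
aci_service = Model.deploy(ws, "my-aci-service", [model], inference_config, aci_config)
aci_service.wait_for_deployment(show_output=True)

# AKS: high-scale production, with autoscaling of the deployed service.
aks_target = AksCompute(ws, "my-aks-cluster")  # assumes the cluster already exists
aks_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=4, autoscale_enabled=True)
aks_service = Model.deploy(ws, "my-aks-service", [model], inference_config, aks_config, aks_target)
aks_service.wait_for_deployment(show_output=True)
```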
> [!NOTE]
> Although compute targets like local web service and Azure Machine Learning compute clusters support GPUs for training and experimentation, using a GPU for inference when deployed as a web service is supported only on AKS.
>
> Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning compute.
When choosing a cluster SKU, first scale up, and then scale out. Start with a machine that has 150% of the RAM your model requires, profile the result, and find a machine that has the performance you need. Once you've found that machine, increase the number of machines to fit your need for concurrent inference.
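
One way to do that profiling step is the SDK's model profiling support. The following is a sketch, assuming SDK v1, a registered model named `my-model`, and a hypothetical dataset of sample requests named `sample-requests` already registered in the workspace:

```python
from azureml.core import Dataset, Environment, Workspace
from azureml.core.model import InferenceConfig, Model

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # placeholder model name
env = Environment.get(ws, name="AzureML-Minimal")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Hypothetical dataset of sample requests, already registered in the workspace.
sample_requests = Dataset.get_by_name(ws, name="sample-requests")

profile = Model.profile(
    ws,
    "my-model-profile",
    [model],
    inference_config,
    input_dataset=sample_requests,
    cpu=1.0,
    memory_in_gb=2.0,  # start near 150% of the model's RAM footprint
)
profile.wait_for_completion(show_output=True)
print(profile.get_details())  # recommended CPU and memory for the deployed service
```

The profile's recommendation gives you the per-node requirement; scaling out for concurrent inference is then a matter of replica or node count on the target.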
> [!NOTE]
> - Container instances are suitable only for small models that are less than 1 GB in size.
> - Use single-node AKS clusters for dev/test of larger models, as sketched below.
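
For the second bullet, a single-node dev/test cluster can be provisioned through the SDK; the `DEV_TEST` cluster purpose relaxes the node and CPU minimums that production AKS clusters require. This is a sketch with placeholder names, assuming SDK v1:

```python
from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget

ws = Workspace.from_config()

# DEV_TEST relaxes the production minimums, allowing a single-node
# cluster for dev/test of larger models.
prov_config = AksCompute.provisioning_configuration(
    agent_count=1,
    vm_size="Standard_DS3_v2",  # placeholder VM size
    cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,
)
aks_target = ComputeTarget.create(ws, "aks-devtest", prov_config)
aks_target.wait_for_completion(show_output=True)
```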