© Copyright 2024, Intel Corporation
This module provides functionality to create a SageMaker endpoint based on 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids), the latest generation available in SageMaker endpoints at the time this module was published.
Performance data for Intel Xeon Scalable processors on AWS instances:

- Achieve up to 64% Better BERT-Large Inference Work Performances by Selecting AWS M6i Instances Featuring 3rd Gen Intel Xeon Scalable Processors
- Amazon M6i Instances Featuring 3rd Gen Intel Xeon Scalable Processors Delivered up to 1.75 Times the Wide & Deep Recommender Performance
- Handle Up to 2.94x the Frames per Second for ResNet50 Image Classification with AWS M6i Instances Featuring 3rd Gen Intel Xeon Scalable Processors
- Classify up to 1.21x the Frames per Second for ResNet50 Workloads by Choosing AWS M6i Instances with 3rd Gen Intel Xeon Scalable Processors
- Choose AWS M6i Instances with 3rd Gen Intel Xeon Scalable Processors for Better BERT Deep Learning Performance
- Achieve up to 6.5x the BERT Deep Learning Performance with AWS M6i Instances Enabled by 3rd Gen Intel Xeon Scalable Processors
See the examples folder for the full code: ./examples/provisioned-realtime-endpoint/main.tf

Example of main.tf:
```hcl
#########################################################
# Local variables, modify for your needs                #
#########################################################

# See policies.md for recommended instances
# Intel recommended instance types for SageMaker endpoint configurations

# Compute Optimized
# ml.c7i.large, ml.c7i.xlarge, ml.c7i.2xlarge, ml.c7i.4xlarge, ml.c7i.8xlarge, ml.c7i.12xlarge,
# ml.c7i.16xlarge, ml.c7i.24xlarge, ml.c7i.48xlarge, ml.c6i.large, ml.c6i.xlarge, ml.c6i.2xlarge,
# ml.c6i.4xlarge, ml.c6i.8xlarge, ml.c6i.12xlarge, ml.c6i.16xlarge, ml.c6i.24xlarge, ml.c6i.32xlarge

# General Purpose
# ml.m7i.large, ml.m7i.xlarge, ml.m7i.2xlarge, ml.m7i.4xlarge, ml.m7i.8xlarge, ml.m7i.12xlarge,
# ml.m7i.16xlarge, ml.m7i.24xlarge, ml.m7i.48xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge,
# ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.m5d.large, ml.m5d.xlarge, ml.m5d.2xlarge,
# ml.m5d.4xlarge, ml.m5d.12xlarge, ml.m5d.24xlarge

# Memory Optimized
# ml.r7i.large, ml.r7i.xlarge, ml.r7i.2xlarge, ml.r7i.4xlarge, ml.r7i.8xlarge, ml.r7i.12xlarge,
# ml.r7i.16xlarge, ml.r7i.24xlarge, ml.r7i.48xlarge, ml.r5.large, ml.r5.xlarge, ml.r5.2xlarge,
# ml.r5.4xlarge, ml.r5.12xlarge, ml.r5.24xlarge, ml.r5d.large, ml.r5d.xlarge, ml.r5d.2xlarge,
# ml.r5d.4xlarge, ml.r5d.12xlarge, ml.r5d.24xlarge

# Accelerated Computing
# ml.g4dn.xlarge, ml.g4dn.2xlarge, ml.g4dn.4xlarge, ml.g4dn.8xlarge, ml.g4dn.12xlarge,
# ml.g4dn.16xlarge, ml.inf1.xlarge, ml.inf1.2xlarge, ml.inf1.6xlarge, ml.inf1.24xlarge

locals {
  region                        = "us-east-1"
  sagemaker_container_log_level = "20"
  sagemaker_program             = "inference.py"
  sagemaker_submit_directory    = "/opt/ml/model/code"

  # Provide the S3 path to the model artifact here. This example uses a model
  # artifact created from the SageMaker JumpStart pre-trained model for
  # Scikit-learn linear regression. The S3 path will look like the example below.
  aws-jumpstart-inference-model-uri = "s3://sagemaker-us-east-1-<AWS_Account_Id>/sklearn-regression-linear-20240208-220732/model.tar.gz" # change here

  # ECR registry path for the container image used for inference.
  model_image = "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3"

  enable_network_isolation = true
}

resource "random_id" "rid" {
  byte_length = 5
}

module "sagemaker_scikit_learn_model" {
  source = "../../modules"

  # SageMaker Model primary container parameters corresponding to the production variant
  sagemaker_model_primary_container = [{
    image          = local.model_image
    model_data_url = local.aws-jumpstart-inference-model-uri
    environment = {
      "SAGEMAKER_CONTAINER_LOG_LEVEL" = local.sagemaker_container_log_level
      "SAGEMAKER_PROGRAM"             = local.sagemaker_program
      "SAGEMAKER_REGION"              = local.region
      "SAGEMAKER_SUBMIT_DIRECTORY"    = local.sagemaker_submit_directory
    }
  }]
}

module "sagemaker_endpoint" {
  source = "intel/aws-sagemaker-endpoint/intel"

  # One production variant for the SageMaker endpoint configuration
  endpoint_production_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c7i.xlarge"
    initial_instance_count = 1
    variant_name           = "my-variant-1-${random_id.rid.dec}"
  }]
}
```
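If other configurations or client applications need the endpoint details, you can surface them as root-level outputs. A minimal sketch; the `endpoint-name` and `endpoint-arn` output names come from this module's Outputs table at the end of this document:

```hcl
# Expose the endpoint name and ARN from the module
# (output names taken from the module's Outputs table).
output "endpoint_name" {
  value = module.sagemaker_endpoint.endpoint-name
}

output "endpoint_arn" {
  value = module.sagemaker_endpoint.endpoint-arn
}
```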
Run Terraform

```bash
terraform init
terraform plan
terraform apply
```

Note that this example may create resources. Run `terraform destroy` when you don't need these resources anymore.
- The SageMaker Endpoint resource created is a provisioned endpoint.
- [Using the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#use-sagemaker-jumpstart-algorithms-with-pretrained-models)
- [Deploy a Pre-Trained Model Directly to a SageMaker Endpoint](https://sagemaker.readthedocs.io/en/stable/overview.html#use-built-in-algorithms-with-pre-trained-models-in-sagemaker-python-sdk)
- [Built-in Algorithms with pre-trained Model Table](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html)
Requirements

Name | Version |
---|---|
terraform | >= 1.3.0 |
aws | ~> 5.31 |
random | ~> 3.4.3 |
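As a sketch, these constraints might be pinned in the root module as follows; the `hashicorp/aws` and `hashicorp/random` source addresses are the standard Terraform Registry addresses for these providers, not taken from this document:

```hcl
terraform {
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.31"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.4.3"
    }
  }
}
```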
Providers

Name | Version |
---|---|
aws | ~> 5.31 |
random | ~> 3.4.3 |
Modules

No modules.
Resources

Name | Type |
---|---|
aws_sagemaker_endpoint.endpoint | resource |
aws_sagemaker_endpoint_configuration.ec | resource |
random_id.rid | resource |
Inputs

Name | Description | Type | Default | Required |
---|---|---|---|---|
accelerator_type | The size of the Elastic Inference (EI) instance to use for the production variant. | string | null | no |
capture_mode | Specifies the data to be captured. Should be one of Input or Output. | string | "Input" | no |
create_shadow_variant | A boolean flag to determine whether a shadow production variant will be created or not. | bool | false | no |
destination_s3_uri | The URL of the S3 location where the captured data is stored. | any | null | no |
enable_capture | Flag to enable data capture. | bool | false | no |
enable_intel_tags | If true, adds additional Intel tags to resources. | bool | true | no |
endpoint_configuration_tags | Tags for the SageMaker Endpoint Configuration resource. | map(string) | null | no |
endpoint_production_variants | A list of production variant objects, one for each model that you want to host at this endpoint. | list | [] | no |
endpoint_shadow_variants | Array of ProductionVariant objects, one for each model that you want to host at this endpoint in shadow mode with production traffic replicated from the model specified on ProductionVariants. If you use this field, you can only specify one variant for ProductionVariants and one variant for ShadowProductionVariants. | list | [] | no |
endpoint_tags | Tags for the SageMaker Endpoint resource. | map(string) | null | no |
initial_instance_count | Initial number of instances used for auto-scaling. | number | 1 | no |
initial_sampling_percentage | Portion of data to capture. Should be between 0 and 100. | number | 100 | no |
initial_variant_weight | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. If unspecified, it defaults to 1.0. | string | null | no |
instance_type | The type of instance to start. | string | "ml.c7i.large" | no |
intel_tags | Intel Tags | map(string) | { | no |
json_content_types | The JSON content type headers to capture. | any | null | no |
kms_key_arn | Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance that hosts the endpoint. | string | null | no |
model_name | The name of the model to use. | string | null | no |
shadow_accelerator_type | The size of the Elastic Inference (EI) instance to use for the shadow production variant. | string | null | no |
shadow_initial_instance_count | Initial number of instances used for auto-scaling. | number | 1 | no |
shadow_initial_variant_weight | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. If unspecified, it defaults to 1.0. | string | null | no |
shadow_instance_type | The type of instance to start. | string | "ml.c6i.large" | no |
shadow_model_name | The name of the model to use. | string | null | no |
shadow_variant_name | The name of the variant. If omitted, Terraform will assign a random, unique name. | string | null | no |
variant_name | The name of the variant. If omitted, Terraform will assign a random, unique name. | string | null | no |
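To illustrate how several of these inputs combine, here is a minimal sketch (not from the examples folder) that enables data capture and a shadow variant. It assumes the shadow variant objects take the same keys as the production variants, and the capture bucket is a hypothetical placeholder:

```hcl
module "sagemaker_endpoint" {
  source = "intel/aws-sagemaker-endpoint/intel"

  endpoint_production_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c7i.xlarge"
    initial_instance_count = 1
  }]

  # Data capture settings (variable names from the Inputs table above)
  enable_capture              = true
  capture_mode                = "Input"
  initial_sampling_percentage = 100
  destination_s3_uri          = "s3://<your-capture-bucket>/sagemaker-capture" # hypothetical bucket

  # Mirror production traffic to a shadow variant on a smaller instance
  create_shadow_variant = true
  endpoint_shadow_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c6i.large"
    initial_instance_count = 1
  }]
}
```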
Outputs

Name | Description |
---|---|
endpoint-arn | The Amazon Resource Name (ARN) assigned by AWS to this endpoint. |
endpoint-configuration-arn | The Amazon Resource Name (ARN) assigned by AWS to this endpoint configuration. |
endpoint-configuration-name | The name of the endpoint configuration. |
endpoint-configuration-tags_all | A map of tags assigned to the endpoint configuration, including those inherited from the provider default_tags configuration block. |
endpoint-name | The name of the endpoint. |