© Copyright 2024, Intel Corporation
This module provides functionality to create a SageMaker endpoint based on 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids), the latest generation available in SageMaker endpoints at the time this module was published.
Performance data for Intel Xeon Scalable processors on AWS instances:

- Achieve up to 64% Better BERT-Large Inference Work Performances by Selecting AWS M6i Instances Featuring 3rd Gen Intel Xeon Scalable Processors
- Amazon M6i Instances Featuring 3rd Gen Intel Xeon Scalable Processors Delivered up to 1.75 Times the Wide & Deep Recommender Performance
- Handle Up to 2.94x the Frames per Second for ResNet50 Image Classification with AWS M6i Instances Featuring 3rd Gen Intel Xeon Scalable Processors
- Classify up to 1.21x the Frames per Second for ResNet50 Workloads by Choosing AWS M6i Instances with 3rd Gen Intel Xeon Scalable Processors
- Choose AWS M6i Instances with 3rd Gen Intel Xeon Scalable Processors for Better BERT Deep Learning Performance
- Achieve up to 6.5x the BERT Deep Learning Performance with AWS M6i Instances Enabled by 3rd Gen Intel Xeon Scalable Processors
See the examples folder for the full code: ./examples/provisioned-realtime-endpoint/main.tf

Example of main.tf:
```hcl
#########################################################
# Local variables, modify for your needs                #
#########################################################

# See policies.md for recommended instances
# Intel recommended instance types for SageMaker endpoint configurations

# Compute Optimized
# ml.c7i.large, ml.c7i.xlarge, ml.c7i.2xlarge, ml.c7i.4xlarge, ml.c7i.8xlarge, ml.c7i.12xlarge,
# ml.c7i.16xlarge, ml.c7i.24xlarge, ml.c7i.48xlarge, ml.c6i.large, ml.c6i.xlarge, ml.c6i.2xlarge,
# ml.c6i.4xlarge, ml.c6i.8xlarge, ml.c6i.12xlarge, ml.c6i.16xlarge, ml.c6i.24xlarge, ml.c6i.32xlarge

# General Purpose
# ml.m7i.large, ml.m7i.xlarge, ml.m7i.2xlarge, ml.m7i.4xlarge, ml.m7i.8xlarge, ml.m7i.12xlarge,
# ml.m7i.16xlarge, ml.m7i.24xlarge, ml.m7i.48xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge,
# ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.m5d.large, ml.m5d.xlarge, ml.m5d.2xlarge,
# ml.m5d.4xlarge, ml.m5d.12xlarge, ml.m5d.24xlarge

# Memory Optimized
# ml.r7i.large, ml.r7i.xlarge, ml.r7i.2xlarge, ml.r7i.4xlarge, ml.r7i.8xlarge, ml.r7i.12xlarge,
# ml.r7i.16xlarge, ml.r7i.24xlarge, ml.r7i.48xlarge, ml.r5.large, ml.r5.xlarge, ml.r5.2xlarge,
# ml.r5.4xlarge, ml.r5.12xlarge, ml.r5.24xlarge, ml.r5d.large, ml.r5d.xlarge, ml.r5d.2xlarge,
# ml.r5d.4xlarge, ml.r5d.12xlarge, ml.r5d.24xlarge

# Accelerated Computing
# ml.g4dn.xlarge, ml.g4dn.2xlarge, ml.g4dn.4xlarge, ml.g4dn.8xlarge, ml.g4dn.12xlarge,
# ml.g4dn.16xlarge, ml.inf1.xlarge, ml.inf1.2xlarge, ml.inf1.6xlarge, ml.inf1.24xlarge

locals {
  region                        = "us-east-1"
  sagemaker_container_log_level = "20"
  sagemaker_program             = "inference.py"
  sagemaker_submit_directory    = "/opt/ml/model/code"

  # Provide the S3 path to the model artifact here. This example uses a model
  # artifact created from the SageMaker JumpStart pre-trained model for
  # Scikit-learn linear regression. The S3 path will look like the example below.
  aws-jumpstart-inference-model-uri = "s3://sagemaker-us-east-1-<AWS_Account_Id>/sklearn-regression-linear-20240208-220732/model.tar.gz" # change here

  # ECR registry path for the container image used for inference.
  model_image = "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3"

  enable_network_isolation = true
}

resource "random_id" "rid" {
  byte_length = 5
}

module "sagemaker_scikit_learn_model" {
  source = "../../modules"

  # SageMaker Model primary container parameters corresponding to the production variant
  sagemaker_model_primary_container = [{
    image          = local.model_image
    model_data_url = local.aws-jumpstart-inference-model-uri
    environment = {
      "SAGEMAKER_CONTAINER_LOG_LEVEL" = local.sagemaker_container_log_level
      "SAGEMAKER_PROGRAM"             = local.sagemaker_program
      "SAGEMAKER_REGION"              = local.region
      "SAGEMAKER_SUBMIT_DIRECTORY"    = local.sagemaker_submit_directory
    }
  }]
}

module "sagemaker_endpoint" {
  source = "intel/aws-sagemaker-endpoint/intel"

  # One production variant for the SageMaker endpoint configuration
  endpoint_production_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c7i.xlarge"
    initial_instance_count = 1
    variant_name           = "my-variant-1-${random_id.rid.dec}"
  }]
}
```
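If other configurations or client applications need the endpoint details, you can surface them as root-level outputs. A minimal sketch; the `endpoint-name` and `endpoint-arn` output names come from this module's Outputs table at the end of this document:

```hcl
# Expose the endpoint name and ARN from the module
# (output names taken from the module's Outputs table).
output "endpoint_name" {
  value = module.sagemaker_endpoint.endpoint-name
}

output "endpoint_arn" {
  value = module.sagemaker_endpoint.endpoint-arn
}
```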
Run Terraform

```bash
terraform init
terraform plan
terraform apply
```

Note that this example may create resources. Run `terraform destroy` when you don't need these resources anymore.
- The SageMaker Endpoint resource created is a provisioned endpoint.
- [Using the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#use-sagemaker-jumpstart-algorithms-with-pretrained-models)
- [Deploy a Pre-Trained Model Directly to a SageMaker Endpoint](https://sagemaker.readthedocs.io/en/stable/overview.html#use-built-in-algorithms-with-pre-trained-models-in-sagemaker-python-sdk)
- [Built-in Algorithms with pre-trained Model Table](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html)
Requirements

Name | Version |
---|---|
terraform | >= 1.3.0 |
aws | ~> 5.31 |
random | ~> 3.4.3 |
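As a sketch, these constraints might be pinned in the root module as follows; the `hashicorp/aws` and `hashicorp/random` source addresses are the standard Terraform Registry addresses for these providers, not taken from this document:

```hcl
terraform {
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.31"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.4.3"
    }
  }
}
```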
Providers

Name | Version |
---|---|
aws | ~> 5.31 |
random | ~> 3.4.3 |
Modules

No modules.
Resources

Name | Type |
---|---|
aws_sagemaker_endpoint.endpoint | resource |
aws_sagemaker_endpoint_configuration.ec | resource |
random_id.rid | resource |
Inputs

Name | Description | Type | Default | Required |
---|---|---|---|---|
accelerator_type | The size of the Elastic Inference (EI) instance to use for the production variant. | string | null | no |
capture_mode | Specifies the data to be captured. Should be one of Input or Output. | string | "Input" | no |
create_shadow_variant | A boolean flag to determine whether a shadow production variant will be created or not. | bool | false | no |
destination_s3_uri | The URL of the S3 location where the captured data is stored. | any | null | no |
enable_capture | Flag to enable data capture. | bool | false | no |
enable_intel_tags | If true, adds additional Intel tags to resources. | bool | true | no |
endpoint_configuration_tags | Tags for the SageMaker Endpoint Configuration resource. | map(string) | null | no |
endpoint_production_variants | A list of production variant objects, one for each model that you want to host at this endpoint. | list | [] | no |
endpoint_shadow_variants | Array of ProductionVariant objects, one for each model that you want to host at this endpoint in shadow mode with production traffic replicated from the model specified on ProductionVariants. If you use this field, you can only specify one variant for ProductionVariants and one variant for ShadowProductionVariants. | list | [] | no |
endpoint_tags | Tags for the SageMaker Endpoint resource. | map(string) | null | no |
initial_instance_count | Initial number of instances used for auto-scaling. | number | 1 | no |
initial_sampling_percentage | Portion of data to capture. Should be between 0 and 100. | number | 100 | no |
initial_variant_weight | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. If unspecified, it defaults to 1.0. | string | null | no |
instance_type | The type of instance to start. | string | "ml.c7i.large" | no |
intel_tags | Intel Tags | map(string) | { | no |
json_content_types | The JSON content type headers to capture. | any | null | no |
kms_key_arn | Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance that hosts the endpoint. | string | null | no |
model_name | The name of the model to use. | string | null | no |
shadow_accelerator_type | The size of the Elastic Inference (EI) instance to use for the shadow production variant. | string | null | no |
shadow_initial_instance_count | Initial number of instances used for auto-scaling. | number | 1 | no |
shadow_initial_variant_weight | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. If unspecified, it defaults to 1.0. | string | null | no |
shadow_instance_type | The type of instance to start. | string | "ml.c6i.large" | no |
shadow_model_name | The name of the model to use. | string | null | no |
shadow_variant_name | The name of the variant. If omitted, Terraform will assign a random, unique name. | string | null | no |
variant_name | The name of the variant. If omitted, Terraform will assign a random, unique name. | string | null | no |
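To illustrate how several of these inputs combine, here is a minimal sketch (not from the examples folder) that enables data capture and a shadow variant. It assumes the shadow variant objects take the same keys as the production variants, and the capture bucket is a hypothetical placeholder:

```hcl
module "sagemaker_endpoint" {
  source = "intel/aws-sagemaker-endpoint/intel"

  endpoint_production_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c7i.xlarge"
    initial_instance_count = 1
  }]

  # Data capture settings (variable names from the Inputs table above)
  enable_capture              = true
  capture_mode                = "Input"
  initial_sampling_percentage = 100
  destination_s3_uri          = "s3://<your-capture-bucket>/sagemaker-capture" # hypothetical bucket

  # Mirror production traffic to a shadow variant on a smaller instance
  create_shadow_variant = true
  endpoint_shadow_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c6i.large"
    initial_instance_count = 1
  }]
}
```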
Outputs

Name | Description |
---|---|
endpoint-arn | The Amazon Resource Name (ARN) assigned by AWS to this endpoint. |
endpoint-configuration-arn | The Amazon Resource Name (ARN) assigned by AWS to this endpoint configuration. |
endpoint-configuration-name | The name of the endpoint configuration. |
endpoint-configuration-tags_all | A map of tags assigned to the endpoint configuration, including those inherited from the provider default_tags configuration block. |
endpoint-name | The name of the endpoint. |