[Enhancement]: one to one mapping with sagemaker jumpstart model creation #35011

Bryson14 · 2023-12-20T15:12:23Z

Terraform Core Version

1.6.5

AWS Provider Version

5.31.0

Affected Resource(s)

SageMaker endpoint configuration (aws_sagemaker_endpoint_configuration).

Expected Behavior

When creating a JumpStart endpoint through SageMaker Studio, you can deploy an LLM (such as Mistral) on a managed endpoint. A few workarounds are needed to get this working with Terraform, because the JumpStart image URIs and S3 model locations are not published. However, after deploying a model in Studio, you can use the AWS CLI to read the model's primary_container.environment and model_data_source, and Terraform can then copy those values.
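A minimal sketch of that copy step, assuming the output of `aws sagemaker describe-model --model-name <name>` has been captured as JSON. The helper name and the trimmed sample response are hypothetical; the S3 path is the one used in the Terraform configuration in this issue. It renames the fields to the argument names used inside `primary_container` and `model_data_source`, and also picks up `EnableNetworkIsolation`, which corresponds to `enable_network_isolation` on `aws_sagemaker_model`.

```python
import json


def extract_container_settings(describe_model_output: str) -> dict:
    """Collect the describe-model fields the Terraform model resource needs.

    Assumes the JSON contains PrimaryContainer.Environment and
    PrimaryContainer.ModelDataSource.S3DataSource (as JumpStart models do).
    """
    response = json.loads(describe_model_output)
    container = response["PrimaryContainer"]
    s3_source = container.get("ModelDataSource", {}).get("S3DataSource", {})
    return {
        "image": container.get("Image"),
        "environment": container.get("Environment", {}),
        # Renamed to the keys used in model_data_source { s3_data_source { ... } }.
        "s3_data_source": {
            "s3_uri": s3_source.get("S3Uri"),
            "s3_data_type": s3_source.get("S3DataType"),
            "compression_type": s3_source.get("CompressionType"),
        },
        "enable_network_isolation": response.get("EnableNetworkIsolation", False),
    }


# Trimmed, hypothetical describe-model output; in practice, read the CLI output from a file.
sample_output = json.dumps({
    "ModelName": "example-jumpstart-model",
    "PrimaryContainer": {
        "Image": "<image URI from describe-model>",
        "Environment": {"HF_MODEL_ID": "/opt/ml/model", "SM_NUM_GPUS": "1"},
        "ModelDataSource": {
            "S3DataSource": {
                "S3Uri": "s3://jumpstart-cache-prod-us-east-1/huggingface-llm/huggingface-llm-mistral-7b-instruct/artifacts/inference-prepack/v1.0.0/",
                "S3DataType": "S3Prefix",
                "CompressionType": "None",
            }
        },
    },
    "EnableNetworkIsolation": True,
})

settings = extract_container_settings(sample_output)
print(json.dumps(settings, indent=2))
```

The printed values can then be pasted into the `environment` map and the `s3_data_source` block, rather than retyped by hand.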

The issue is that aws_sagemaker_endpoint_configuration cannot represent the configuration that SageMaker Studio creates by default.

Here is the endpoint configuration created by Studio (describe-endpoint-config output):

{
    "EndpointConfigName": "jumpstart-mistral-1703083557499",
    "EndpointConfigArn": "arn:aws:sagemaker:us-east-1:XXXX:endpoint-config/jumpstart-mistral-1703083557499",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.g5.2xlarge",
            "ManagedInstanceScaling": {
                "Status": "ENABLED",
                "MinInstanceCount": 1,
                "MaxInstanceCount": 20
            }
        }
    ],
    "CreationTime": "2023-12-20T14:45:57.903000+00:00",
    "ExecutionRoleArn": "arn:aws:iam::XXXX:role/service-role/AmazonSageMaker-ExecutionRole-20231215T143587",
    "EnableNetworkIsolation": true
}
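The issue title asks for a one-to-one mapping between this API output and Terraform arguments. As a rough sketch under an assumption (this is not the provider's actual code), most keys translate by converting CamelCase to snake_case, which is how `InitialInstanceCount` and `InstanceType` relate to `initial_instance_count` and `instance_type` in the Terraform configuration. Computed fields such as the ARN and creation time are dropped because they are outputs, not inputs; the `COMPUTED_KEYS` set is an illustrative choice.

```python
import re

# Fields SageMaker reports back after creation; they are not configuration inputs.
COMPUTED_KEYS = {"EndpointConfigArn", "CreationTime"}


def to_snake_case(name: str) -> str:
    """Convert an API key such as 'MinInstanceCount' to 'min_instance_count'."""
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


def to_terraform_style(value):
    """Recursively rename dict keys to snake_case, skipping computed keys."""
    if isinstance(value, dict):
        return {
            to_snake_case(key): to_terraform_style(item)
            for key, item in value.items()
            if key not in COMPUTED_KEYS
        }
    if isinstance(value, list):
        return [to_terraform_style(item) for item in value]
    return value


# The production variant from the Studio-created configuration above.
studio_variant = {
    "VariantName": "AllTraffic",
    "InitialInstanceCount": 1,
    "InstanceType": "ml.g5.2xlarge",
    "ManagedInstanceScaling": {"Status": "ENABLED", "MinInstanceCount": 1, "MaxInstanceCount": 20},
}
print(to_terraform_style(studio_variant))
```

The output is only a starting point: a key converted this way is not guaranteed to be accepted by the provider, which is exactly the gap this issue describes.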

Actual Behavior

With Terraform, it is not possible to specify ManagedInstanceScaling:

"ManagedInstanceScaling": {
    "Status": "ENABLED",
    "MinInstanceCount": 1,
    "MaxInstanceCount": 20
}

It is also not possible to specify EnableNetworkIsolation.

This is the endpoint configuration created by Terraform:

{
    "EndpointConfigName": "chat-bot-sagemaker-config",
    "EndpointConfigArn": "arn:aws:sagemaker:us-east-1:XXX:endpoint-config/chat-bot-sagemaker-config",
    "ProductionVariants": [
        {
            "VariantName": "mistral-7b-variant",
            "ModelName": "llm-mistral-7b-instruct-model",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.2xlarge",
            "InitialVariantWeight": 1.0,
            "EnableSSMAccess": false
        }
    ],
    "CreationTime": "2023-12-19T22:54:12.219000+00:00",
    "EnableNetworkIsolation": false
}

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

# README
# These environment and model data source values were found by deploying a
# JumpStart endpoint with SageMaker Studio, then copying the values from that model
# using `aws sagemaker describe-model --model-name your_model_name`.
# Without these values, the endpoint fails to deploy; check the CloudWatch logs for the reason.
# The standard way to deploy an endpoint is with boto3 (the sagemaker client) or the Python CDK.
# There are no online resources that document the environment and model data source values.
resource "aws_sagemaker_model" "mistral_sagemaker_model" {
  name               = "llm-mistral-7b-instruct-model"
  execution_role_arn = aws_iam_role.sagemaker_trust_role.arn

  primary_container {
    image = var.sagemaker_mistral_public_image
    mode  = "SingleModel"
    environment = {
      ENDPOINT_SERVER_TIMEOUT        = "3600"
      HF_MODEL_ID                    = "/opt/ml/model"
      MAX_BATCH_PREFILL_TOKENS       = "8191"
      MAX_INPUT_LENGTH               = "8191"
      MAX_TOTAL_TOKENS               = "8192"
      MODEL_CACHE_ROOT               = "/opt/ml/model"
      SAGEMAKER_ENV                  = "1"
      SAGEMAKER_MODEL_SERVER_WORKERS = "1"
      SAGEMAKER_PROGRAM              = "inference.py"
      SM_NUM_GPUS                    = "1"
    }

    model_data_source {
      s3_data_source {
        s3_uri           = "s3://jumpstart-cache-prod-us-east-1/huggingface-llm/huggingface-llm-mistral-7b-instruct/artifacts/inference-prepack/v1.0.0/"
        s3_data_type     = "S3Prefix"
        compression_type = "None"
      }
    }
  }

  tags = {
    Application = var.app_name

    ENDPOINT_SERVER_TIMEOUT        = "3600"
    HF_MODEL_ID                    = "/opt/ml/model"
    MAX_BATCH_PREFILL_TOKENS       = "8191"
    MAX_INPUT_LENGTH               = "8191"
    MAX_TOTAL_TOKENS               = "8192"
    MODEL_CACHE_ROOT               = "/opt/ml/model"
    SAGEMAKER_ENV                  = "1"
    SAGEMAKER_MODEL_SERVER_WORKERS = "1"
    SAGEMAKER_PROGRAM              = "inference.py"
    SM_NUM_GPUS                    = "1"
  }
}

resource "aws_sagemaker_endpoint_configuration" "config" {
  name = "chat-bot-sagemaker-config"

  production_variants {
    variant_name           = "mistral-7b-variant"
    model_name             = aws_sagemaker_model.mistral_sagemaker_model.name
    initial_instance_count = 1
    instance_type          = var.sagemaker_inference_compute_size
  }

  tags = {
    Application = var.app_name
  }
}

resource "aws_sagemaker_endpoint" "endpoint" {
  name                 = "sagemaker-mistral-inference-ep"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.config.name

  tags = {
    Application = var.app_name
  }
}

resource "aws_iam_role" "sagemaker_trust_role" {
  name = "sagemaker_role"

  assume_role_policy = <<-EOF
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": "sts:AssumeRole",
        "Principal": {
          "Service": "sagemaker.amazonaws.com"
        },
        "Effect": "Allow",
        "Sid": ""
      }
    ]
  }
  EOF

  tags = {
    Application = var.app_name
  }
}

resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_trust_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}

resource "aws_iam_role_policy_attachment" "s3_read_write_access" {
  role       = aws_iam_role.sagemaker_trust_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}

Steps to Reproduce

Run the standard terraform init, plan, and apply, then compare the endpoint configuration deployed by Terraform with the one created by the SageMaker Studio UI.
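A minimal sketch of that comparison, assuming both configurations were saved as JSON from `aws sagemaker describe-endpoint-config --endpoint-config-name <name>` and loaded with `json.load`. The helper flattens each document into dotted paths and reports every path whose values differ; the ignored keys (name, ARN, creation time) are an illustrative choice. The sample dicts are trimmed copies of the two outputs in this issue.

```python
# Keys that always differ between two deployments and carry no configuration meaning here.
IGNORED_KEYS = {"EndpointConfigName", "EndpointConfigArn", "CreationTime"}


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {"A.0.B": leaf_value} pairs."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            if key not in IGNORED_KEYS:
                flat.update(flatten(item, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = value
    return flat


def diff_configs(studio: dict, terraform: dict) -> dict:
    """Return {path: (studio_value, terraform_value)} for every path that differs."""
    left, right = flatten(studio), flatten(terraform)
    return {
        path: (left.get(path), right.get(path))
        for path in sorted(left.keys() | right.keys())
        if left.get(path) != right.get(path)
    }


studio = {
    "EndpointConfigName": "jumpstart-mistral-1703083557499",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "InitialInstanceCount": 1,
        "InstanceType": "ml.g5.2xlarge",
        "ManagedInstanceScaling": {"Status": "ENABLED", "MinInstanceCount": 1, "MaxInstanceCount": 20},
    }],
    "EnableNetworkIsolation": True,
}
terraform = {
    "EndpointConfigName": "chat-bot-sagemaker-config",
    "ProductionVariants": [{
        "VariantName": "mistral-7b-variant",
        "ModelName": "llm-mistral-7b-instruct-model",
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.2xlarge",
        "InitialVariantWeight": 1.0,
        "EnableSSMAccess": False,
    }],
    "EnableNetworkIsolation": False,
}

differences = diff_configs(studio, terraform)
for path, (studio_value, terraform_value) in differences.items():
    print(f"{path}: studio={studio_value!r} terraform={terraform_value!r}")
```

On these two outputs it reports the ManagedInstanceScaling fields that exist only on the Studio side, the EnableNetworkIsolation mismatch, and the different instance types (ml.g5.2xlarge vs ml.m5.2xlarge), while the matching InitialInstanceCount is omitted.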

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None


github-actions · 2023-12-20T15:12:44Z

Community Note

Voting for Prioritization

Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
Please see our prioritization guide for information on how we prioritize.
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

If you are interested in working on this issue, please leave a comment.
If this would be your first contribution, please review the contribution guide.

justinretzolk · 2024-01-11T19:33:09Z

Hey @Bryson14 👋 Thank you for taking the time to raise this! As a heads up, we consider adding additional arguments to existing resources to be an enhancement, so I've updated the labels with that in mind.

deepakbshetty · 2024-01-21T21:55:24Z

Hi @Bryson14, are you able to share the ECR image you used for sagemaker_mistral_public_image? A working example, either with the CLI or anywhere else, would also be great. Most models I have tried do not support managed instance scaling, so it's blocking me from writing a test case for this feature.

I used the example here: https://repost.aws/questions/QUODaQEyKNTbqWLYszAIYCIg/creating-jumpstart-sagemaker-endpoint-with-terraform-fails-with-model-needs-flash-attention

endpoint_configuration_test.go:162: Step 1/3 error: Error running apply: exit status 1

    Error: creating SageMaker Endpoint Configuration: ValidationException: ManagedInstanceScaling is not supported with the given EndpointConfig setup.
            status code: 400, request id: 32f1694c-6389-43f9-9bea-5245a1497bfd

      with aws_sagemaker_endpoint_configuration.test,
      on terraform_plugin_test.tf line 54, in resource "aws_sagemaker_endpoint_configuration" "test":
      54: resource "aws_sagemaker_endpoint_configuration" "test" {

dkhundley · 2024-01-23T22:20:07Z

Isn't network isolation enabled on the SageMaker model rather than on the endpoint config?

deepakbshetty · 2024-01-23T22:38:01Z

Yes. When a model is specified in the endpoint config, VPC/subnet details and network isolation cannot be specified there; they are mutually exclusive. The endpoint config inherits the VPC config and network isolation setting from the model definition.

RLashofRegas · 2024-06-20T23:37:30Z

Not sure if it helps, but it may at least help someone who stumbles upon this issue later. Regarding the point above that the JumpStart image URIs and S3 locations are not published, I was able to retrieve them programmatically like this:

(VS Code Jupyter notebook script formatting)

# %%
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker import image_uris, model_uris

# %%
region = "us-west-2"  # Your region.
instance_type = "ml.g5.2xlarge"  # Your desired instance type. Note: the image differs between GPU and CPU instances.

# %%
# find model_id for a given search string
[m for m in list_jumpstart_models(region=region) if "mistral" in m]

# %%
model_id = "huggingface-llm-mistral-7b-instruct"

# %%
# find latest version of model_id
[m for m in list_jumpstart_models(filter=f"model_id=={model_id}", list_versions=True, region=region)]

# %%
model_version = "3.1.0"

# %%
image_uris.retrieve(framework=None, instance_type=instance_type, image_scope="inference", model_id=model_id, model_version=model_version, region=region)

# %%
model_uris.retrieve(instance_type=instance_type, model_scope="inference", model_id=model_id, model_version=model_version, region=region)
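To feed a retrieved image URI into the Terraform configuration from the original post without hard-coding it, one option is a `terraform.tfvars.json` file, which Terraform loads automatically. A minimal sketch, assuming the variable names the original configuration already references (`sagemaker_mistral_public_image`, `sagemaker_inference_compute_size`); the helper name and the placeholder URI string are hypothetical.

```python
import json


def build_tfvars(image_uri: str, instance_type: str) -> str:
    """Render retrieved JumpStart values as a terraform.tfvars.json document."""
    values = {
        # Variable names referenced by the Terraform configuration in the original post.
        "sagemaker_mistral_public_image": image_uri,
        "sagemaker_inference_compute_size": instance_type,
    }
    return json.dumps(values, indent=2) + "\n"


# In the notebook, pass the value returned by image_uris.retrieve(...) and the
# same instance_type that was used for the retrieval.
tfvars = build_tfvars("<image URI returned by image_uris.retrieve>", "ml.g5.2xlarge")
print(tfvars)
# To write it next to the .tf files (Terraform loads this file name automatically):
# from pathlib import Path
# Path("terraform.tfvars.json").write_text(tfvars)
```

Passing the same instance type to both the retrieval and the Terraform variable keeps the image and the instance family consistent, since the retrieved image differs between GPU and CPU instances.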

github-actions · 2024-09-09T14:25:21Z

Warning

This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

github-actions · 2024-09-13T00:40:03Z

This functionality has been released in v5.67.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions · 2024-10-13T02:18:28Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Bryson14 added the bug Addresses a defect in current functionality. label Dec 20, 2023

github-actions bot added service/iam Issues and PRs that pertain to the iam service. service/sagemaker Issues and PRs that pertain to the sagemaker service. labels Dec 20, 2023

terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Dec 20, 2023

justinretzolk changed the title ~~[Bug]: one to one mapping with sagemaker jumpstart model creation~~ [Enhancement]: one to one mapping with sagemaker jumpstart model creation Jan 11, 2024

github-actions bot added the service/iam Issues and PRs that pertain to the iam service. label Jan 11, 2024

gdavison removed the service/iam Issues and PRs that pertain to the iam service. label Jan 15, 2024

deepakbshetty mentioned this issue Jan 21, 2024

sagemaker_domain add docker_settings & fix domain_settings update #35416

Merged

deepakbshetty mentioned this issue Jan 25, 2024

add managed_instance_scaling to sagemaker endpoint config production_variants #35479

Merged

ewbankkit closed this as completed in #35479 Sep 9, 2024

github-actions bot added this to the v5.67.0 milestone Sep 9, 2024

github-actions bot locked as resolved and limited conversation to collaborators Oct 13, 2024
