-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Resource: aws_sagemaker_endpoint #2479
New Resource: aws_sagemaker_endpoint #2479
Conversation
086e849
to
0b62a91
Compare
da54f7e
to
32d48d2
Compare
0b1aff5
to
e01a3f5
Compare
e01a3f5
to
3b45324
Compare
cba6271
to
0ee0866
Compare
0ee0866
to
95ab771
Compare
I try to create an endpoint using Terrafrom and after a few minutes the endpoint fail because my primary image in the model was wrong, after i fix the image name on the model terraform applied the change to the model but never recreate the endpoint. The plan was showing nothing to add.
After this i delete the endpoint by hand on the AWS console but then terraform fail the validation.
|
95ab771
to
9d1de26
Compare
@guerremdq I assume that the endpoint isn't recreated when you change the model because the What you can do: rename it (see below) or omit the name (then it's randomly generated, which should also update the endpoint):
Let me know if that helps. I am coming back to you regarding your 2nd question about what's happening there. |
9d1de26
to
583d763
Compare
583d763
to
4651421
Compare
The endpoint was created successfully however I got the following error:
On the AWS console I see that the endpoint is in service. The next |
Thanks for the feedback. I will come up with a fix and a test for updating. Update: 21st July 2018 @giannisbetas I fixed the error you saw now. Branch is updated with the fix. |
4651421
to
fb3bc80
Compare
e336608
to
7c73c8a
Compare
@mbfrahry I am done here :) It would be great to get your feedback about how we want to proceed with this test (https://github.com/terraform-providers/terraform-provider-aws/pull/2479/files#diff-18f0a49c700a6954cce00b7ca936a2dfR84). Please see also my questions in the descriptions. At the moment, we don't have a positive test as creating a running endpoint requires a specific docker container that runs a webserver that returns status 200 under |
@mbfrahry How can we help to get this merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi again @jckuester 👋 Here's the next round of review items once we rebase after merging aws_sagemaker_endpoint_configuration
👍 Please reach out with any questions or if you do not have time to implement the feedback.
|
||
log.Printf("[DEBUG] Waiting for SageMaker endpoint (%s) to become available", d.Id()) | ||
stateConf := &resource.StateChangeConf{ | ||
Pending: []string{"Creating"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The AWS Go SDK provides constants for these here 😄
Pending: []string{sagemaker.EndpointStatusCreating},
Target: []string{sagemaker.EndpointStatusInService},
} | ||
|
||
d.SetId(name) | ||
log.Printf("[INFO] SageMaker endpoint ID: %s", d.Id()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Extraneous logging can be removed here.
|
||
endpointRaw, _, err := SagemakerEndpointStateRefreshFunc(conn, d.Id())() | ||
if err != nil { | ||
if sagemakerErr, ok := err.(awserr.Error); ok && sagemakerErr.Code() == "ValidationException" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We have a helper function for this type of checking:
if sagemakerErr, ok := err.(awserr.Error); ok && sagemakerErr.Code() == "ValidationException" { | |
if isAWSErr(err, "ValidationException", "") { |
} | ||
d.SetPartial("tags") | ||
|
||
if d.HasChange("endpoint_config_name") && !d.IsNewResource() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume the d.IsNewResource()
check was here because the Create
function called the Update
function previously. This check can be removed. 👍
log.Printf("[INFO] SageMaker endpoint ID: %s", d.Id()) | ||
|
||
log.Printf("[DEBUG] Waiting for SageMaker endpoint (%s) to become available", d.Id()) | ||
stateConf := &resource.StateChangeConf{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth trying to use the AWS Go SDK provided waiter functions (WaitUntilEndpointDeleted()
/ WaitUntilEndpointInService()
) here, during update, and during deletion to simplify our code instead of using StateChangeConf
, e.g.
_, err := conn.CreateEndpoint(createOpts)
if err != nil {
return fmt.Errorf("error creating SageMaker Endpoint: %s", err)
}
d.SetId(name)
describeInput := &sagemaker.DescribeEndpointInput{
EndpointName: aws.String(name),
}
if err := conn.WaitUntilEndpointInService(describeInput); err != nil {
return fmt.Errorf("error waiting for SageMaker Endpoint (%s) to be in service: %s", name, err)
}
return resourceAwsSagemakerEndpointRead(d, meta)
} | ||
|
||
resp, err := conn.ListEndpoints(&sagemaker.ListEndpointsInput{ | ||
NameContains: aws.String(rs.Primary.ID), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar note as aws_sagemaker_endpoint_configuration
resource, should prefer conn.DescribeEndpoint()
here to remove potential false positives.
Provides a Sagemaker endpoint resource. | ||
--- | ||
|
||
# aws\_sagemaker\_endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Forward slashes are no longer required in the documentation titles 👍
name = "my-endpoint" | ||
configuration_name = "${aws_sagemaker_endpoint_configuration.ec.name}" | ||
|
||
tags { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terraform 0.12 will require the equals
sign as this is a TypeMap
argument and not a configuration block. The equals sign is compatible with Terraform 0.11 and earlier. 👍
tags { | |
tags = { |
|
* Update sagemaker_endpoint_configuration not found message when reading resources * Update sagemaker_endpoint error when resource is NotFound * Fix ResourceNotFound to ValidationException
7c73c8a
to
d14c3fe
Compare
@bflad I'll take care of this in the next few days :) |
c596d3e
to
417ab72
Compare
@bflad I am done (see last commit). |
417ab72
to
3dbf0d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jckuester 👋 This was in great shape and I was able to finish up the acceptance testing with a working model and endpoint with an AWS SageMaker TensorFlow Serving container and its test model. Away we go! 🚀
Test Model Source: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/7d64ac5e2fe999cd76991e28557dd45abd2290fd/test/data/test-model.tar.gz
Test Configuration:
data "aws_iam_policy_document" "access" {
statement {
effect = "Allow"
actions = [
"cloudwatch:PutMetricData",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:CreateLogGroup",
"logs:DescribeLogStreams",
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"s3:GetObject",
]
resources = ["*"]
}
}
data "aws_iam_policy_document" "assume_role" {
statement {
actions = [ "sts:AssumeRole" ]
principals {
type = "Service"
identifiers = [ "sagemaker.amazonaws.com" ]
}
}
}
resource "aws_iam_role" "test" {
name = %[1]q
path = "/"
assume_role_policy = "${data.aws_iam_policy_document.assume_role.json}"
}
resource "aws_iam_role_policy" "test" {
role = "${aws_iam_role.test.name}"
policy = "${data.aws_iam_policy_document.access.json}"
}
resource "aws_s3_bucket" "test" {
acl = "private"
bucket = %[1]q
}
resource "aws_s3_bucket_object" "test" {
bucket = "${aws_s3_bucket.test.id}"
key = "model.tar.gz"
source = "test-fixtures/sagemaker-tensorflow-serving-test-model.tar.gz"
}
resource "aws_sagemaker_model" "test" {
name = %[1]q
execution_role_arn = "${aws_iam_role.test.arn}"
primary_container {
image = "520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-serving:1.12-cpu"
model_data_url = "https://${aws_s3_bucket.test.bucket_regional_domain_name}/${aws_s3_bucket_object.test.key}"
}
depends_on = ["aws_iam_role_policy.test"]
}
resource "aws_sagemaker_endpoint_configuration" "test" {
name = %[1]q
production_variants {
initial_instance_count = 1
initial_variant_weight = 1
instance_type = "ml.t2.medium"
model_name = "${aws_sagemaker_model.test.name}"
variant_name = "variant-1"
}
}
Output from acceptance testing:
--- PASS: TestAccAWSSagemakerEndpoint_basic (415.53s)
--- PASS: TestAccAWSSagemakerEndpoint_Tags (440.21s)
--- PASS: TestAccAWSSagemakerEndpoint_EndpointConfigName (769.71s)
This has been released in version 2.5.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. |
@jckuester , does the Endpoint automatically update if you deploy to the same endpoint with a new endpoint configuration? Or is there an error? |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
Please review only the last commit of this PR (contains the
aws_sagemaker_endpoint
resource). The other commits come from rebasing the PRs of theresource_aws_sagemaker_model
(#2478) andresource_aws_sagemaker_endpoint_configuration
(#2477), which are needed for the tests here.Add the resource
aws_sagemaker_endpoint
for the newly announced service called SageMaker.This PR contributes:
Questions
http://localhost:8080/ping
is needed (acting as a health check for the endpoint). Otherwise, the endpoint is going to end up in state "Failed" (see my example PR Sagemaker example of "deploy your own model" #2585 about how to create an endpoint that will be "InService").My question: Currently, I am using an expected error in the tests; is this sufficient or de we want "real working model" for the tests here to test endpoints being "InService"? (this is not so easy to achieve and needs a lot of boilerplate to set up the test; I also tried to use a model provided by Amazon,
174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:1
, but it needs the trained model data in an S3 bucket to work, sigh).