Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Resource: aws_sagemaker_endpoint #2479

Merged

Conversation

jckuester
Copy link
Contributor

@jckuester jckuester commented Nov 30, 2017

Please review only the last commit of this PR (contains the aws_sagemaker_endpoint resource). The other commits come from rebasing the PRs of theresource_aws_sagemaker_model (#2478) and resource_aws_sagemaker_endpoint_configuration (#2477), which are needed for the tests here.

Add the resource aws_sagemaker_endpoint for the newly announced service called SageMaker.

This PR contributes:

  • Documentation/Website
  • Schema
  • Implementation of CRUD operations
  • Acceptance tests

Questions

  • A SageMaker endpoint can be in one of the following states: "Creating", "Updating", "InService", or "Failed". To create an endpoint that ends up "InService", a Docker container serving a REST API returning status code 200 for http://localhost:8080/ping is needed (acting as a health check for the endpoint). Otherwise, the endpoint is going to end up in state "Failed" (see my example PR Sagemaker example of "deploy your own model" #2585 about how to create an endpoint that will be "InService").
    My question: Currently, I am using an expected error in the tests; is this sufficient or de we want "real working model" for the tests here to test endpoints being "InService"? (this is not so easy to achieve and needs a lot of boilerplate to set up the test; I also tried to use a model provided by Amazon, 174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:1, but it needs the trained model data in an S3 bucket to work, sigh).

@jckuester jckuester mentioned this pull request Nov 30, 2017
@radeksimko radeksimko added the new-resource Introduces a new resource. label Nov 30, 2017
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 086e849 to 0b62a91 Compare November 30, 2017 22:21
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch 3 times, most recently from da54f7e to 32d48d2 Compare December 7, 2017 23:21
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch 3 times, most recently from 0b1aff5 to e01a3f5 Compare December 18, 2017 11:50
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from e01a3f5 to 3b45324 Compare January 7, 2018 19:06
@jckuester jckuester changed the title [WIP] Support for new resource: sagemaker endpoint Support for new resource: sagemaker endpoint Jan 7, 2018
@radeksimko radeksimko added the service/sagemaker Issues and PRs that pertain to the sagemaker service. label Jan 16, 2018
@radeksimko radeksimko changed the title Support for new resource: sagemaker endpoint New Resource: aws_sagemaker_endpoint Jan 16, 2018
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch 2 times, most recently from cba6271 to 0ee0866 Compare January 21, 2018 00:31
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 0ee0866 to 95ab771 Compare February 14, 2018 11:27
@ghost ghost added the size/XXL Managed by automation to categorize the size of a PR. label Feb 14, 2018
@guerremdq
Copy link

I try to create an endpoint using Terrafrom and after a few minutes the endpoint fail because my primary image in the model was wrong, after i fix the image name on the model terraform applied the change to the model but never recreate the endpoint. The plan was showing nothing to add.

Error: Error applying plan:

1 error(s) occurred:

* module.test-mode.aws_sagemaker_endpoint.endpoint: 1 error(s) occurred:

* aws_sagemaker_endpoint.endpoint: Error waiting for Sagemaker endpoint (My-Test-Model) to become available: unexpected state 'Failed', wanted target 'InService'. last error: %!s(<nil>)

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

After this i delete the endpoint by hand on the AWS console but then terraform fail the validation.

aws_sagemaker_endpoint.endpoint: Refreshing state... (ID: My-Test-Model)

Error: Error refreshing state: 1 error(s) occurred:

* module.test-mode.aws_sagemaker_endpoint.endpoint: 1 error(s) occurred:

* module.test-mode.aws_sagemaker_endpoint.endpoint: aws_sagemaker_endpoint.endpoint: ValidationException: Could not find endpoint "arn:aws:sagemaker:eu-west-1:XXXXXXXX:endpoint/my-test-model".
	status code: 400, request id: 3b34bc75-ab7e-4b51-b3f0-1516d07db804

@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 95ab771 to 9d1de26 Compare April 25, 2018 09:39
@ghost ghost added the size/XXL Managed by automation to categorize the size of a PR. label Apr 25, 2018
@jckuester
Copy link
Contributor Author

jckuester commented Apr 25, 2018

@guerremdq I assume that the endpoint isn't recreated when you change the model because theaws_sagemaker_endpoint_configuration resource -- the only direct dependency to the endpoint -- doesn't change (even though it's recreated, but the name stays the same).

What you can do: rename it (see below) or omit the name (then it's randomly generated, which should also update the endpoint):

resource "aws_sagemaker_endpoint_configuration" "foo" {
	name = "foo-updated"

	production_variants {
        ...
	}
}

Let me know if that helps. I am coming back to you regarding your 2nd question about what's happening there.

@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 9d1de26 to 583d763 Compare April 26, 2018 19:00
@ghost ghost added the size/XXL Managed by automation to categorize the size of a PR. label Apr 26, 2018
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 583d763 to 4651421 Compare April 26, 2018 19:07
@ghost ghost added the size/XXL Managed by automation to categorize the size of a PR. label Apr 26, 2018
@giannisbetas
Copy link

The endpoint was created successfully however I got the following error:

Error: Error applying plan:

1 error(s) occurred:

* module.sagemaker_testmodel.aws_sagemaker_endpoint.endpoint: 1 error(s) occurred:

aws_sagemaker_endpoint.endpoint: ValidationException: Cannot update endpoint "testmodel" with the currently in use endpoint configuration "testmodel".
	status code: 400, request id: acfe0815-ce8a-4a54-8663-3fc3a715d014

On the AWS console I see that the endpoint is in service.

The next terraform plan shows No changes which means that all resources were created successfully.

@jckuester
Copy link
Contributor Author

jckuester commented May 30, 2018

Thanks for the feedback. I will come up with a fix and a test for updating.

Update: 21st July 2018

@giannisbetas I fixed the error you saw now. Branch is updated with the fix.

@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 4651421 to fb3bc80 Compare July 21, 2018 18:08
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from e336608 to 7c73c8a Compare January 21, 2019 17:08
@jckuester
Copy link
Contributor Author

@mbfrahry I am done here :) It would be great to get your feedback about how we want to proceed with this test (https://github.com/terraform-providers/terraform-provider-aws/pull/2479/files#diff-18f0a49c700a6954cce00b7ca936a2dfR84). Please see also my questions in the descriptions. At the moment, we don't have a positive test as creating a running endpoint requires a specific docker container that runs a webserver that returns status 200 under http://localhost:8080/ping

@xmj
Copy link

xmj commented Mar 18, 2019

@mbfrahry How can we help to get this merged?

@bflad bflad self-assigned this Mar 22, 2019
Copy link
Contributor

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi again @jckuester 👋 Here's the next round of review items once we rebase after merging aws_sagemaker_endpoint_configuration 👍 Please reach out with any questions or if you do not have time to implement the feedback.


log.Printf("[DEBUG] Waiting for SageMaker endpoint (%s) to become available", d.Id())
stateConf := &resource.StateChangeConf{
Pending: []string{"Creating"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The AWS Go SDK provides constants for these here 😄

		Pending: []string{sagemaker.EndpointStatusCreating},
		Target:  []string{sagemaker.EndpointStatusInService},

}

d.SetId(name)
log.Printf("[INFO] SageMaker endpoint ID: %s", d.Id())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Extraneous logging can be removed here.


endpointRaw, _, err := SagemakerEndpointStateRefreshFunc(conn, d.Id())()
if err != nil {
if sagemakerErr, ok := err.(awserr.Error); ok && sagemakerErr.Code() == "ValidationException" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We have a helper function for this type of checking:

Suggested change
if sagemakerErr, ok := err.(awserr.Error); ok && sagemakerErr.Code() == "ValidationException" {
if isAWSErr(err, "ValidationException", "") {

aws/resource_aws_sagemaker_endpoint.go Show resolved Hide resolved
}
d.SetPartial("tags")

if d.HasChange("endpoint_config_name") && !d.IsNewResource() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume the d.IsNewResource() check was here because the Create function called the Update function previously. This check can be removed. 👍

log.Printf("[INFO] SageMaker endpoint ID: %s", d.Id())

log.Printf("[DEBUG] Waiting for SageMaker endpoint (%s) to become available", d.Id())
stateConf := &resource.StateChangeConf{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth trying to use the AWS Go SDK provided waiter functions (WaitUntilEndpointDeleted() / WaitUntilEndpointInService()) here, during update, and during deletion to simplify our code instead of using StateChangeConf, e.g.

	_, err := conn.CreateEndpoint(createOpts)

	if err != nil {
		return fmt.Errorf("error creating SageMaker Endpoint: %s", err)
	}

	d.SetId(name)

	describeInput := &sagemaker.DescribeEndpointInput{
		EndpointName: aws.String(name),
	}

	if err := conn.WaitUntilEndpointInService(describeInput); err != nil {
		return fmt.Errorf("error waiting for SageMaker Endpoint (%s) to be in service: %s", name, err)
	}

	return resourceAwsSagemakerEndpointRead(d, meta)

aws/resource_aws_sagemaker_endpoint.go Show resolved Hide resolved
}

resp, err := conn.ListEndpoints(&sagemaker.ListEndpointsInput{
NameContains: aws.String(rs.Primary.ID),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar note as aws_sagemaker_endpoint_configuration resource, should prefer conn.DescribeEndpoint() here to remove potential false positives.

Provides a Sagemaker endpoint resource.
---

# aws\_sagemaker\_endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Forward slashes are no longer required in the documentation titles 👍

name = "my-endpoint"
configuration_name = "${aws_sagemaker_endpoint_configuration.ec.name}"

tags {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terraform 0.12 will require the equals sign as this is a TypeMap argument and not a configuration block. The equals sign is compatible with Terraform 0.11 and earlier. 👍

Suggested change
tags {
tags = {

@bflad
Copy link
Contributor

bflad commented Mar 25, 2019

aws_sagemaker_endpoint_configuration was merged -- this is ready for rebase. 😄

jckuester and others added 4 commits March 25, 2019 22:30
* Update sagemaker_endpoint_configuration not found message when reading resources

* Update sagemaker_endpoint error when resource is NotFound

* Fix ResourceNotFound to ValidationException
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 7c73c8a to d14c3fe Compare March 25, 2019 21:32
@ghost ghost added size/XL Managed by automation to categorize the size of a PR. and removed size/XXL Managed by automation to categorize the size of a PR. labels Mar 25, 2019
@jckuester
Copy link
Contributor Author

@bflad I'll take care of this in the next few days :)

@bflad bflad added the waiting-response Maintainers are waiting on response from community or contributor. label Mar 27, 2019
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from c596d3e to 417ab72 Compare April 3, 2019 10:16
@jckuester
Copy link
Contributor Author

@bflad I am done (see last commit).

@ghost ghost removed the waiting-response Maintainers are waiting on response from community or contributor. label Apr 3, 2019
@jckuester jckuester force-pushed the feature/resource-aws-sagemaker-endpoint branch from 417ab72 to 3dbf0d2 Compare April 3, 2019 10:44
@bflad bflad added this to the v2.5.0 milestone Apr 5, 2019
Copy link
Contributor

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jckuester 👋 This was in great shape and I was able to finish up the acceptance testing with a working model and endpoint with an AWS SageMaker TensorFlow Serving container and its test model. Away we go! 🚀

Test Model Source: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/7d64ac5e2fe999cd76991e28557dd45abd2290fd/test/data/test-model.tar.gz

Test Configuration:

data "aws_iam_policy_document" "access" {
  statement {
    effect = "Allow"
    actions = [
      "cloudwatch:PutMetricData",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:CreateLogGroup",
      "logs:DescribeLogStreams",
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "s3:GetObject",
    ]
    resources = ["*"]
  }
}

data "aws_iam_policy_document" "assume_role" {
  statement {
    actions = [ "sts:AssumeRole" ]
    principals {
      type = "Service"
      identifiers = [ "sagemaker.amazonaws.com" ]
    }
  }
}

resource "aws_iam_role" "test" {
  name = %[1]q
  path = "/"
  assume_role_policy = "${data.aws_iam_policy_document.assume_role.json}"
}

resource "aws_iam_role_policy" "test" {
  role   = "${aws_iam_role.test.name}"
  policy = "${data.aws_iam_policy_document.access.json}"
}

resource "aws_s3_bucket" "test" {
  acl    = "private"
  bucket = %[1]q
}

resource "aws_s3_bucket_object" "test" {
  bucket = "${aws_s3_bucket.test.id}"
  key    = "model.tar.gz"
  source = "test-fixtures/sagemaker-tensorflow-serving-test-model.tar.gz"
}

resource "aws_sagemaker_model" "test" {
  name               = %[1]q
  execution_role_arn = "${aws_iam_role.test.arn}"

  primary_container {
    image          = "520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-serving:1.12-cpu"
    model_data_url = "https://${aws_s3_bucket.test.bucket_regional_domain_name}/${aws_s3_bucket_object.test.key}"
  }

  depends_on = ["aws_iam_role_policy.test"]
}

resource "aws_sagemaker_endpoint_configuration" "test" {
  name = %[1]q

  production_variants {
    initial_instance_count = 1
    initial_variant_weight = 1
    instance_type          = "ml.t2.medium"
    model_name             = "${aws_sagemaker_model.test.name}"
    variant_name           = "variant-1"
  }
}

Output from acceptance testing:

--- PASS: TestAccAWSSagemakerEndpoint_basic (415.53s)
--- PASS: TestAccAWSSagemakerEndpoint_Tags (440.21s)
--- PASS: TestAccAWSSagemakerEndpoint_EndpointConfigName (769.71s)

@bflad bflad merged commit 3dbf0d2 into hashicorp:master Apr 5, 2019
bflad added a commit that referenced this pull request Apr 5, 2019
@jckuester jckuester deleted the feature/resource-aws-sagemaker-endpoint branch April 5, 2019 09:24
@bflad
Copy link
Contributor

bflad commented Apr 5, 2019

This has been released in version 2.5.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@fhaque
Copy link

fhaque commented Oct 8, 2019

@jckuester , does the Endpoint automatically update if you deploy to the same endpoint with a new endpoint configuration? Or is there an error?

@ghost
Copy link

ghost commented Nov 1, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Introduces or discusses updates to documentation. new-resource Introduces a new resource. provider Pertains to the provider itself, rather than any interaction with AWS. service/sagemaker Issues and PRs that pertain to the sagemaker service. size/XL Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants