Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sagemaker): add Model L2 construct #22549

Merged
merged 15 commits into from
Nov 9, 2022

Conversation

petermeansrock
Copy link
Contributor

This is the first of three PRs to complete the implementation of RFC 431:
aws/aws-cdk-rfcs#431

fixes #2809

Co-authored-by: Matt McClean mmcclean@amazon.com
Co-authored-by: Long Yao yl1984108@gmail.com
Co-authored-by: Drew Jetter 60628154+jetterdj@users.noreply.github.com
Co-authored-by: Murali Ganesh 59461079+foxpro24@users.noreply.github.com
Co-authored-by: Abilash Rangoju 988529+rangoju@users.noreply.github.com


All Submissions:

Adding new Unconventional Dependencies:

  • This PR adds new unconventional dependencies following the process described here

New Features

  • Have you added the new feature to an integration test?
    • Did you use yarn integ to deploy the infrastructure and generate the snapshot (i.e. yarn integ without --dry-run)?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

This is the first of three PRs to complete the implementation of RFC
431:
aws/aws-cdk-rfcs#431

fixes aws#2809

Co-authored-by: Matt McClean <mmcclean@amazon.com>
Co-authored-by: Long Yao <yl1984108@gmail.com>
Co-authored-by: Drew Jetter <60628154+jetterdj@users.noreply.github.com>
Co-authored-by: Murali Ganesh <59461079+foxpro24@users.noreply.github.com>
Co-authored-by: Abilash Rangoju <988529+rangoju@users.noreply.github.com>
@gitpod-io
Copy link

gitpod-io bot commented Oct 18, 2022

@github-actions github-actions bot added effort/large Large work item – several weeks of effort feature-request A feature should be added or improved. p2 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Oct 18, 2022
@aws-cdk-automation aws-cdk-automation requested a review from a team October 18, 2022 21:35
@petermeansrock petermeansrock marked this pull request as ready for review October 18, 2022 21:40
@Naumel Naumel added the @aws-cdk/aws-sagemaker Related to AWS SageMaker label Oct 19, 2022
@kaizencc kaizencc self-requested a review October 20, 2022 12:18
@kaizencc kaizencc self-assigned this Oct 20, 2022
Copy link
Contributor

@kaizencc kaizencc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @petermeansrock! Sorry for the delay in getting this reviewed. In general, this is very high quality, thanks :). I have a few comments here and there, but no blockers. After they're addressed, i'll take a more scrutinizing look on the tests, but in general they look good. and then we'll merge!

});

test('by supplying a file with an unsupported extension, an exception is thrown', () => {
// GIVEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: if there's nothing given, we might as well omit this and save 2 lines :)

packages/@aws-cdk/aws-sagemaker/README.md Show resolved Hide resolved
Comment on lines 15 to 16
* For example, 012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>:latest or
* 012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>@sha256:94afd1f2e64d908bc90dbca0035a5b567EXAMPLE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* For example, 012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>:latest or
* 012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>@sha256:94afd1f2e64d908bc90dbca0035a5b567EXAMPLE.
* For example, `012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>:latest` or
* `012345678910.dkr.ecr.<region-name>.amazonaws.com/<repository-name>@sha256:94afd1f2e64d908bc90dbca0035a5b567EXAMPLE`.

Comment on lines 7 to 8
// The only supported extension for local asset model data
const ARTIFACT_EXTENSION = '.tar.gz';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think this might be extended by sagemaker anytime soon? if there's a chance that they do, it feels more maintainable to have this be a list:

const ARTIFACT_EXTENSIONS = ['.tar.gz'];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is slightly orthogonal to your question, but I first want to call it out as your question itself is forward looking.


Technically, SageMaker has already tweaked this behavior, but for multi-model support (described in the RFC as a future addition that can be added, so not covered by this PR).

With multi-model endpoints, SageMaker expects the ModelDataUrl to contain an S3 prefix. At runtime, when SageMaker receives an InvokeEndpoint request for multi-model endpoints, a customer passes a TargetModel parameter specifying the S3 suffix (relative to the ModelDataUrl pointing to the model artifacts).

So assume in S3 I have the following object keys:

  • s3://bucket/some/path/foo.tar.gz
  • s3://bucket/some/path/bar.tar.gz

With this setup, at configuration time, I would create my model with a ModelDataUrl of s3://bucket/some/path/.

At runtime, when invoking the endpoint, I'd either pass foo.tar.gz or bar.tar.gz as my TargetModel.

Unlike with single models (including inference pipelines, both covered in this PR), with multi-models, SageMaker doesn't know at the point of instance provisioning which model artifacts to load. It instead lazily loads the artifacts at request-time.

To close on this comment:

  • I think the .tar.gz check makes sense for now as it still protects customers in the single model scenario.
  • I'm not sure what the best strategy would be for protecting against the multi-model scenario, but I assume figuring that out is out-of-scope for this PR.
  • Given that .tar.gz has been the only supported artifact type for 3+ years (regardless of single vs multi-model), I don't expect that to change any time soon. In other words, it seems sufficient to keep the existing single artifact suffix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that the scenario you describe is out-of-scope for this PR. I think for the multi-model scenario the best we can do is just skip this synth-time check and leave it to deploy-time unfortunately. But not necessary at all for this first PR.

packages/@aws-cdk/aws-sagemaker/lib/model.ts Show resolved Hide resolved
Comment on lines 294 to 295
// apply a name tag to the model resource
cdk.Tags.of(this).add(NAME_TAG, this.node.path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this is just my unfamiliarity with the resource, but is there a reason why we always add a tag to the resource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought this was a more prevalent pattern in the CDK itself, but upon performing a more recent search, it looks like only a handful of modules do this. I'm not personally attached to this tag, so if you'd like, I can just rip this out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I've never seen this before, which is the cause for my confusion. In these scenarios I prefer to leave them out. We can always add them in later, but we can't remove it later if it's already there.

Comment on lines 311 to 312
// post-construction validation
this.node.addValidation({ validate: () => this.validateContainers() });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not called inside of this.renderContainers()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it just seemed intuitive to me to perform synthesis-time validation using the validation hook. I can certainly move the check though if you'd like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can still have this.validateContainers(), but that should be called istelf inside this.renderContainers(). I call this out because it seems different from the designs of other L2s, and for maintainability purposes, it's probably good to keep things relatively in line with other L2s.

/**
* The name of this model.
*/
readonly modelName: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why modelName here? The "real" required property is modelArn, which you are then splicing together in the Import construct. However, this then requires the model to exist in the same environment that the stack is in. We're not serving customers who have models in different environments that they want to reference here.

I think this is what we want:

  1. change modelName to modelArn here as a "required" attribute.
  2. the implementation of Model.fromModelName() will need to do the arn generation before passing the arn into fromModelAttributes.
  3. optionally include a fromModelArn() method.

In the actual implementation of the Import construct, I'll write a comment on how we can get the environment from the arn to support this use case i'm describing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "real" required property is modelArn, which you are then splicing together in the Import construct.

What do you mean by "'real' required property"? This modelName field represents the physical name of the Model resource. It corresponds to the AWS::SageMaker::Model resource's ModelName attribute. Here are the snippets relevant to the physical name and its use on the L1 CfnModel construct:

    super(scope, id, {
       physicalName: props.modelName,
     });

    ...

    const model = new CfnModel(this, 'Model', {
       executionRoleArn: this.role.roleArn,
       modelName: this.physicalName,
       primaryContainer: cdk.Lazy.any({ produce: () => this.renderPrimaryContainer() }),
       vpcConfig: cdk.Lazy.any({ produce: () => this.renderVpcConfig() }),
       containers: cdk.Lazy.any({ produce: () => this.renderContainers() }),
     });

However, this then requires the model to exist in the same environment that the stack is in. We're not serving customers who have models in different environments that they want to reference here.

I think I might be missing something. Model resources cannot be shared cross-account nor cross-region. A Model is always referenced (via CreateEndpoint, CreateTransformJob) by its name and not ARN. Doesn't that mean that this PR is working as intended?

Copy link
Contributor

@kaizencc kaizencc Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By 'real', I mean that in the fromModelAttributes method further below, we do the following:

        this.modelArn = cdk.Stack.of(this).formatArn({
          service: 'sagemaker',
          resource: 'model',
          resourceName: this.modelName,
        });

i.e. we need the modelArn anyway.

While Model resources cannot be shared cross-account or cross-region today, if that were ever possible we would immediately have a bug in how we import resources. See https://github.com/aws/aws-cdk/pull/22549/files#r1008209404 for what that bug is.

I don't have a problem with having a fromModelName method the way you have it currently. I think we should also have a fromModelArn method. It would look something like this:

  public static fromModelArn(scope: Construct, id: string, arn: string): IModel {
    const modelName = // splice arn to get name
    return Model.fromModelAttributes(scope, id, { modelName, modelArn: arn });
  }

  public static fromModelName(scope: Construct, id: string, name: string): IModel {
    const modelArn = cdk.Stack.of(scope).formatArn({
          service: 'sagemaker',
          resource: 'model',
          resourceName: name,
        });
    return Model.fromModelAttributes(scope, id, { modelName: name, modelArn: modelArn });
  }

  public static fromModelAttributes(scope: Construct, id: string, attrs: ModelAttributes): IModel {
    const modelName = attrs.modelName;
    const role = attrs.role;
    const modelArn = attrs.modelArn;

    class Import extends ModelBase {
      public readonly modelName = modelName;
      public readonly role = role;
      public readonly grantPrincipal: iam.IPrincipal;
      public readonly modelArn = modelArn;

      constructor(s: Construct, i: string) {
        super(s, i, {
          environmentFromArn: modelArn,
        });
        ...
      }
  }

The payoff is the environmentFromArn. Yes, it may be useless today because Models cannot be shared cross env. However, we are trying to standardize how we do imports for L2 to guard against this cross env sharing if/when it gets introduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean I'd end up using a technique like this to prevent customers from attempting to use IModel across accounts-or-regions downstream within the EndpointConfig implementation (and from IEndpointConfig within the Endpoint implementation)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just do that in fromModelAttributes for now -- if the arn region / account is not the same as the scope region / account, then throw an error.

This is still better than before because we now explicitly throw an error in these scenarios, and can easily remove the synth-time check in the future if sagemaker decides to support this.

Copy link
Contributor Author

@petermeansrock petermeansrock Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, there are two additional cross-account-and-region scenarios I need to guard against as well (which I've not yet done in downstream changes, including EndpointConfig for example):

  1. Reusing a Model instance from an origin stack in a different account/region:
      const app = new cdk.App();
    
      const originStack = new cdk.Stack(app, 'OriginStack', {
        env:
          {
            region: 'us-west-2',
            account: '123456789012',
          },
      });
      const originStackModel = new sagemaker.Model(originStack, 'MyModel', {
        modelName: 'explicitly-named-model',
        containers: [{
          image: ContainerImage.fromAsset(path.join(__dirname, 'test-image')),
        }],
      });
    
      const destinationStack = new cdk.Stack(app, 'DestinationStack', {
        env:
          {
            region: 'us-west-2',
            account: '123456789013',
          },
      });
    
      // FIXME: This should fail at synthesis but doesn't currently
      new sagemaker.EndpointConfig(destinationStack, 'MyEndpointConfig', {
        instanceProductionVariants: [{
          variantName: 'my-variant',
          model: originStackModel,
        }],
      });
    
      app.synth();
    
  2. Using an unowned IModel instance (generated via Model.from*) from a different account region:
      const app = new cdk.App();
    
      const originStack = new cdk.Stack(app, 'OriginStack', {
        env:
          {
            region: 'us-west-2',
            account: '123456789012',
          },
      });
      const originStackModel = sagemaker.Model.fromModelName(originStack, 'MyModel', 'my-name');
    
      const destinationStack = new cdk.Stack(app, 'DestinationStack', {
        env:
          {
            region: 'us-west-2',
            account: '123456789013',
          },
      });
    
      // FIXME: This should fail at synthesis but doesn't currently
      new sagemaker.EndpointConfig(destinationStack, 'MyEndpointConfig', {
        instanceProductionVariants: [{
          variantName: 'my-variant',
          model: originStackModel,
        }],
      });
    
      app.synth();
    

Rather than implement one region/account check in fromModelAttributes, wouldn't it be better for me to handle both Model and unowned IModel instances universally within EndpointConfig? If so, since I've already implemented Model.fromModelArn without any region/account checks infromModelAttributes, this PR is ready for review. I'll go ahead & mark it as so just in case, but if you think that an additional check here is warranted, I'll incorporate that change as well.

public readonly modelArn: string;

constructor(s: Construct, i: string) {
super(s, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
super(s, i);
super(s, i, {
environmentFromArn: modelArn,
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this gets the environment from the arn, so that we serve imports from different envs.

Copy link
Contributor Author

@petermeansrock petermeansrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, @kaizencc! Here are a set of follow-up questions that I had for a subset of your comments.

Comment on lines 7 to 8
// The only supported extension for local asset model data
const ARTIFACT_EXTENSION = '.tar.gz';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is slightly orthogonal to your question, but I first want to call it out as your question itself is forward looking.


Technically, SageMaker has already tweaked this behavior, but for multi-model support (described in the RFC as a future addition that can be added, so not covered by this PR).

With multi-model endpoints, SageMaker expects the ModelDataUrl to contain an S3 prefix. At runtime, when SageMaker receives an InvokeEndpoint request for multi-model endpoints, a customer passes a TargetModel parameter specifying the S3 suffix (relative to the ModelDataUrl pointing to the model artifacts).

So assume in S3 I have the following object keys:

  • s3://bucket/some/path/foo.tar.gz
  • s3://bucket/some/path/bar.tar.gz

With this setup, at configuration time, I would create my model with a ModelDataUrl of s3://bucket/some/path/.

At runtime, when invoking the endpoint, I'd either pass foo.tar.gz or bar.tar.gz as my TargetModel.

Unlike with single models (including inference pipelines, both covered in this PR), with multi-models, SageMaker doesn't know at the point of instance provisioning which model artifacts to load. It instead lazily loads the artifacts at request-time.

To close on this comment:

  • I think the .tar.gz check makes sense for now as it still protects customers in the single model scenario.
  • I'm not sure what the best strategy would be for protecting against the multi-model scenario, but I assume figuring that out is out-of-scope for this PR.
  • Given that .tar.gz has been the only supported artifact type for 3+ years (regardless of single vs multi-model), I don't expect that to change any time soon. In other words, it seems sufficient to keep the existing single artifact suffix.

/**
* The name of this model.
*/
readonly modelName: string;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "real" required property is modelArn, which you are then splicing together in the Import construct.

What do you mean by "'real' required property"? This modelName field represents the physical name of the Model resource. It corresponds to the AWS::SageMaker::Model resource's ModelName attribute. Here are the snippets relevant to the physical name and its use on the L1 CfnModel construct:

    super(scope, id, {
       physicalName: props.modelName,
     });

    ...

    const model = new CfnModel(this, 'Model', {
       executionRoleArn: this.role.roleArn,
       modelName: this.physicalName,
       primaryContainer: cdk.Lazy.any({ produce: () => this.renderPrimaryContainer() }),
       vpcConfig: cdk.Lazy.any({ produce: () => this.renderVpcConfig() }),
       containers: cdk.Lazy.any({ produce: () => this.renderContainers() }),
     });

However, this then requires the model to exist in the same environment that the stack is in. We're not serving customers who have models in different environments that they want to reference here.

I think I might be missing something. Model resources cannot be shared cross-account nor cross-region. A Model is always referenced (via CreateEndpoint, CreateTransformJob) by its name and not ARN. Doesn't that mean that this PR is working as intended?

/**
* Hostname of the container.
*
* @default - none
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the docs (which is a lot more complicated than I remember):

This parameter is ignored for models that contain only a PrimaryContainer. When a ContainerDefinition is part of an inference pipeline, the value of the parameter uniquely identifies the container for the purposes of logging and metrics. For information, see Use Logs and Metrics to Monitor an Inference Pipeline. If you don't specify a value for this parameter for a ContainerDefinition that is part of an inference pipeline, a unique name is automatically assigned based on the position of the ContainerDefinition in the pipeline. If you specify a value for the ContainerHostName for any ContainerDefinition that is part of an inference pipeline, you must specify a value for the ContainerHostName parameter of every ContainerDefinition in that pipeline.

So, in summary:

  1. For a single container model, the value is always ignored.
  2. For inference pipeline models:
    1. If one hostname is specified for a container, hostnames must be specified for all of them.
    2. Otherwise, a unique name is automatically "assigned" by SageMaker.

Here are some action items that could come out of this:

  1. (Debatable) - Throw an exception on synthesis of single container model's with a hostname specified.
  2. Throw an exception on synthesis of an inference pipeline model with partially specified hostnames.
  3. Expand ContainerDefinition documentation for containerHostname to include these details (including real @default behavior).

Please let me know (1) if those sound good and (2) if you can think of others.

packages/@aws-cdk/aws-sagemaker/lib/model.ts Show resolved Hide resolved
Comment on lines 294 to 295
// apply a name tag to the model resource
cdk.Tags.of(this).add(NAME_TAG, this.node.path);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought this was a more prevalent pattern in the CDK itself, but upon performing a more recent search, it looks like only a handful of modules do this. I'm not personally attached to this tag, so if you'd like, I can just rip this out.

Comment on lines 311 to 312
// post-construction validation
this.node.addValidation({ validate: () => this.validateContainers() });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it just seemed intuitive to me to perform synthesis-time validation using the validation hook. I can certainly move the check though if you'd like.

@petermeansrock petermeansrock marked this pull request as draft November 7, 2022 21:02
Co-authored-by: Kaizen Conroy <36202692+kaizencc@users.noreply.github.com>
@mergify mergify bot dismissed kaizencc’s stale review November 7, 2022 21:06

Pull request has been modified.

@petermeansrock petermeansrock marked this pull request as ready for review November 9, 2022 01:26
Copy link
Contributor

@kaizencc kaizencc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Any sharp edges that come out of this can be worked on in separate PRs -- it's probably more important to get this out there and usable as an experimental construct first. Great work!

@mergify
Copy link
Contributor

mergify bot commented Nov 9, 2022

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: d82d8c6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@petermeansrock
Copy link
Contributor Author

@Mergifyio requeue

@mergify
Copy link
Contributor

mergify bot commented Nov 9, 2022

requeue

❌ Command disallowed due to command restrictions in the Mergify configuration.

  • sender-permission>=write

@mergify
Copy link
Contributor

mergify bot commented Nov 9, 2022

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 93915f1 into aws:main Nov 9, 2022
@petermeansrock petermeansrock deleted the sagemaker-l2-model branch November 10, 2022 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-sagemaker Related to AWS SageMaker beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK effort/large Large work item – several weeks of effort feature-request A feature should be added or improved. p2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sagemaker: Create L2 construct for Amazon SageMaker
4 participants