Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ecs): add support for elastic inference accelerators in ECS task defintions #13950

Merged

Conversation

upparekh
Copy link
Contributor

@upparekh upparekh commented Apr 1, 2021

This PR would enable users to attach inference accelerators to their ECS tasks (currently not supported by Fargate). It adds an optional inferenceAccelerators property to the EC2 Task Definition. It also adds resourceRequirement property to the Container definition to enable containers to refer to the inference accelerators provided in the task definition.

Closes #12460

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Apr 1, 2021

@upparekh upparekh requested a review from SoManyHs April 1, 2021 23:26
@upparekh upparekh added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label Apr 2, 2021
@upparekh upparekh self-assigned this Apr 2, 2021
packages/@aws-cdk/aws-ecs/README.md Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/lib/container-definition.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/README.md Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/README.md Outdated Show resolved Hide resolved
readonly deviceName?: string;

/**
* The Elastic Inference accelerator type to use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to include what values are valid here in the docstring (i.e., eia2.medium, eia2.large, and eia2.xlarge, according to the docs).
Normally, having the different valid values as an enum might be nice so that an IDE can pick up the valid values, but the downside is that we'd have to manually add any new types that get added in the future at the risk of always being behind what's available, so I think a string is fine for now.

* The type and amount of a resource to assign to a container.
* @default - No resources assigned.
*/
readonly resourceRequirements?: ResourceRequirement[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should have the new field be something that is more specific to EIA, since resourceRequirements in CFN technically encompasses both GPU and EIA. GPU currently is specified through its own field, and it may be confusing for this field to exist when we have GPU specified elsewhere.

Copy link
Contributor Author

@upparekh upparekh Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it might get confusing this way. I will make the required changes.

Copy link
Contributor Author

@upparekh upparekh Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the resourceRequirements prop to inferenceAcceleratorResources (as inferenceAccelerators was already a task definiton prop) and changed the type to be a list of device names (strings).

piradeepk and others added 2 commits April 2, 2021 16:01
Updates to documentation and allocation of props in tests

Co-authored-by: Hsing-Hui Hsu <hsinghui@amazon.com>
@upparekh upparekh requested a review from iamhopaul123 April 2, 2021 21:10
Copy link
Contributor

@iamhopaul123 iamhopaul123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very awesome PR (even without considering it is the first PR that you have) 🎉

packages/@aws-cdk/aws-ecs/README.md Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/lib/base/task-definition.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/lib/base/task-definition.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/lib/container-definition.ts Outdated Show resolved Hide resolved
upparekh and others added 2 commits April 5, 2021 11:01
Updated docs and sanity check condition

Co-authored-by: Penghao He <penghaoh@amazon.com>
Copy link
Contributor

@iamhopaul123 iamhopaul123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just some nits. Just a reminder but make sure every error you throw is covered in the unit test!

packages/@aws-cdk/aws-ecs/lib/container-definition.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/lib/container-definition.ts Outdated Show resolved Hide resolved
function renderResourceRequirements(gpuCount: number = 0, inferenceAcceleratorResources: string[] = []):
CfnTaskDefinition.ResourceRequirementProperty[] | undefined {
if (inferenceAcceleratorResources.length > 0 && gpuCount > 0) {
throw new Error('Both inference accelerator and gpu count defined in the container definition.');
Copy link
Contributor

@iamhopaul123 iamhopaul123 Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new Error('Both inference accelerator and gpu count defined in the container definition.');
throw new Error('Cannot define both inference accelerator and gpu count in the container definition.');

We might need to be more specific for the reason why this condition fails https://uxdworld.com/2018/05/30/how-to-write-good-error-messages/
we also need unit test for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed about this offline but posting here for the record, we decided to render both gpuCount and inferenceAcceleratorResources properties and let the validators in other stages handle this check.

Updating error messages

Co-authored-by: Penghao He <penghaoh@amazon.com>
@gitpod-io
Copy link

gitpod-io bot commented Apr 9, 2021

iamhopaul123
iamhopaul123 previously approved these changes Apr 14, 2021
Copy link
Contributor

@iamhopaul123 iamhopaul123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with just two nits. Feel free to take off the do-not-merge label once those are addressed!

function renderResourceRequirements(gpuCount: number = 0, inferenceAcceleratorResources: string[] = []):
CfnTaskDefinition.ResourceRequirementProperty[] | undefined {
const ret = [];
if (inferenceAcceleratorResources.length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit but this check might not be required?

@iamhopaul123 iamhopaul123 added the pr/do-not-merge This PR should not be merged at this time. label Apr 14, 2021
@gitpod-io
Copy link

gitpod-io bot commented Apr 14, 2021

@mergify mergify bot dismissed iamhopaul123’s stale review April 14, 2021 18:20

Pull request has been modified.

iamhopaul123
iamhopaul123 previously approved these changes Apr 14, 2021
@iamhopaul123 iamhopaul123 removed the pr/do-not-merge This PR should not be merged at this time. label Apr 14, 2021
@mergify
Copy link
Contributor

mergify bot commented Apr 14, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot dismissed iamhopaul123’s stale review April 14, 2021 23:28

Pull request has been modified.

@mergify
Copy link
Contributor

mergify bot commented Apr 14, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 790a118
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Apr 15, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 23986d7 into aws:master Apr 15, 2021
@upparekh upparekh deleted the upparekh/support-for-elastic-inference-accelerators branch April 15, 2021 22:40
hollanddd pushed a commit to hollanddd/aws-cdk that referenced this pull request Aug 26, 2021
… defintions (aws#13950)

This PR would enable users to attach inference accelerators to their ECS tasks (currently not supported by Fargate). It adds an optional `inferenceAccelerators` property to the EC2 Task Definition. It also adds `resourceRequirement` property to the Container definition to enable containers to refer to the inference accelerators provided in the task definition.  

Closes aws#12460 

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(aws-ecs): Add support for ElasticInferenceAccelerators in ECS Task definitions
5 participants