feat(stepfunctions-tasks): add emr-containers support for calling CreateVirtualCluster, DeleteVirtualCluster, and StartJobRun #15262
…part fails New-style ARNs are of the form 'arn:aws:s4:us-west-1:12345:/resource-type/resource-name'. We didn't handle that correctly in parseArn(), and instead returned an `undefined` resource, which funnily enough should never happen according to our types. Introduce the concept of ARN formats, represented by an enum in core, and replace the `Stack.parseArn()` method by a new one `Stack.splitArn()`, taking that enum as a required second argument. Spotted in https://github.com/aws/aws-cdk/pull/15140/files#r653112073
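The ARN-format idea described in the commit message above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the aws-cdk implementation: the `ArnFormat` members, the `ArnParts` shape, the `splitOnce` helper, and this standalone `splitArn` function are all hypothetical stand-ins written for this example. The point is only that the caller states which layout the resource part has, so a leading slash after the account is never misread.

```typescript
// Hypothetical, simplified stand-ins; these are NOT the real aws-cdk types.
enum ArnFormat {
  NoResourceName,                 // arn:partition:service:region:account:resource
  ColonResourceName,              // ...:account:resource:resourceName
  SlashResourceName,              // ...:account:resource/resourceName
  SlashResourceSlashResourceName, // ...:account:/resource/resourceName
}

interface ArnParts {
  partition: string;
  service: string;
  region: string;
  account: string;
  resource: string;
  resourceName?: string;
}

// Split `text` at the first occurrence of `sep` into resource and resource name.
function splitOnce(text: string, sep: string): { resource: string; resourceName: string } {
  const index = text.indexOf(sep);
  if (index < 0) {
    throw new Error(`Expected '${sep}' in resource part: ${text}`);
  }
  return { resource: text.slice(0, index), resourceName: text.slice(index + 1) };
}

function splitArn(arn: string, format: ArnFormat): ArnParts {
  const pieces = arn.split(':');
  if (pieces.length < 6 || pieces[0] !== 'arn') {
    throw new Error(`Not a valid ARN: ${arn}`);
  }
  const base = {
    partition: pieces[1],
    service: pieces[2],
    region: pieces[3],
    account: pieces[4],
  };
  // Everything after the fifth colon; resource names may contain colons themselves.
  const rest = pieces.slice(5).join(':');

  switch (format) {
    case ArnFormat.NoResourceName:
      return { ...base, resource: rest };
    case ArnFormat.ColonResourceName:
      return { ...base, ...splitOnce(rest, ':') };
    case ArnFormat.SlashResourceName:
      return { ...base, ...splitOnce(rest, '/') };
    case ArnFormat.SlashResourceSlashResourceName:
      if (rest.charAt(0) !== '/') {
        throw new Error(`Expected resource part to start with '/': ${rest}`);
      }
      return { ...base, ...splitOnce(rest.slice(1), '/') };
    default:
      throw new Error('Unknown ARN format');
  }
}

// Usage with the ARN quoted in the commit message.
const parts = splitArn(
  'arn:aws:s4:us-west-1:12345:/resource-type/resource-name',
  ArnFormat.SlashResourceSlashResourceName,
);
console.log(parts.resource, parts.resourceName);
```

Making the format a required argument means the function does not have to guess the layout from the string alone.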
… createvirtualcluster, and deleteVirtualCluster finished
In the case of …
I'll probably just start writing the implementation or tests for start-job-run to get as much done as possible and iterate on any failures later.
I appreciate the correspondence with the underlying API, but one of the big benefits of the CDK is that we can gloss over some of the details and provide helpful defaults for our users. This means that we want to provide fairly flat interfaces, few required parameters, and rely on the type system. To that end, let's try to simplify the interfaces a bit.
interface EmrContainersCreateVirtualClusterProps {
  eksCluster: eks.ICluster;
  eksNamespace: string; // unless this is derivable somehow? will this almost always be "default"? is it important to be able to configure?
  virtualClusterName: string;
  tags?: { [key: string]: string };
}

// needs a better name
class VirtualClusterProp {
  static fromTaskInput(taskInput: sfn.TaskInput): VirtualClusterProp
  static fromVirtualClusterId(virtualClusterId: string): VirtualClusterProp
  // doesn't exist yet, but will eventually
  static fromVirtualCluster(virtualCluster: emrcontainers.IVirtualCluster): VirtualClusterProp
  constructor(public readonly id: string);
}

interface ApplicationConfiguration {} // copy from Configuration

class ReleaseLabel {
  static EMR_5_32_0(): ReleaseLabel
  static EMR_5_33_0(): ReleaseLabel
  static EMR_6_2_0(): ReleaseLabel
  static EMR_6_3_0(): ReleaseLabel
  constructor(public readonly label: string);
}

interface EmrContainersStartJobRunProps {
  virtualCluster: VirtualClusterProp;
  executionRole: iam.IRole;
  releaseLabel: ReleaseLabel;
  entryPoint: sfn.TaskInput;
  jobArguments?: sfn.TaskInput;
  sparkSubmitParameters?: string; // is this needed or can everything be configured via the application configuration?
  applicationConfiguration?: ApplicationConfiguration[];
  logging?: boolean; // if set to true, creates a CloudWatch log group
  logGroup?: logs.ILogGroup; // possibly remove altogether
  logBucket?: s3.IBucket; // possibly remove altogether
  persistentApplicationUI?: boolean;
  jobName?: string;
  tags?: { [key: string]: string };
}
This is just a first crack at it, let's discuss what you think will give us both a smooth experience and enough flexibility.
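To make the proposed shape more concrete, here is a minimal runnable sketch of the two "enum-like class" helpers from the comment above. It is an illustration under stated assumptions, not the final API: `TaskInputLike` is a hypothetical stand-in for `sfn.TaskInput`, `StartJobRunPropsSketch` and `renderStartJobRun` are hypothetical, and the label strings (for example `emr-6.3.0-latest`) are assumed values that should be checked against the EMR on EKS documentation.

```typescript
// Hypothetical stand-in for sfn.TaskInput so the sketch has no CDK dependency.
interface TaskInputLike {
  readonly value: string;
}

// Enum-like class: well-known values as static factories, plus a public
// constructor so users can pass a label this sketch does not know about.
class ReleaseLabel {
  // The label strings below are assumptions made for this example.
  static EMR_5_32_0(): ReleaseLabel { return new ReleaseLabel('emr-5.32.0-latest'); }
  static EMR_5_33_0(): ReleaseLabel { return new ReleaseLabel('emr-5.33.0-latest'); }
  static EMR_6_2_0(): ReleaseLabel { return new ReleaseLabel('emr-6.2.0-latest'); }
  static EMR_6_3_0(): ReleaseLabel { return new ReleaseLabel('emr-6.3.0-latest'); }

  constructor(public readonly label: string) {}
}

// One type that accepts either a literal ID or a value taken from the task input.
class VirtualClusterProp {
  static fromTaskInput(taskInput: TaskInputLike): VirtualClusterProp {
    return new VirtualClusterProp(taskInput.value);
  }

  static fromVirtualClusterId(virtualClusterId: string): VirtualClusterProp {
    return new VirtualClusterProp(virtualClusterId);
  }

  constructor(public readonly id: string) {}
}

// Hypothetical, trimmed-down props: few required fields, the rest optional.
interface StartJobRunPropsSketch {
  virtualCluster: VirtualClusterProp;
  releaseLabel: ReleaseLabel;
  jobName?: string;
  tags?: { [key: string]: string };
}

// Hypothetical helper that flattens the props into service-call parameters.
function renderStartJobRun(props: StartJobRunPropsSketch): { [key: string]: unknown } {
  return {
    VirtualClusterId: props.virtualCluster.id,
    ReleaseLabel: props.releaseLabel.label,
    Name: props.jobName,
    Tags: props.tags ?? {},
  };
}

const rendered = renderStartJobRun({
  virtualCluster: VirtualClusterProp.fromVirtualClusterId('my-virtual-cluster-id'),
  releaseLabel: ReleaseLabel.EMR_6_3_0(),
});
console.log(JSON.stringify(rendered));
```

The factories keep call sites short while the type system still rejects a bare string where a `ReleaseLabel` or `VirtualClusterProp` is expected.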
…eate-virtual-cluster.ts Co-authored-by: Ben Chaimberg <youppi3@gmail.com>
…art-job-run.ts Co-authored-by: Ben Chaimberg <youppi3@gmail.com>
…se-types.ts Co-authored-by: Ben Chaimberg <youppi3@gmail.com>
…lete-virtual-cluster.ts Co-authored-by: Ben Chaimberg <youppi3@gmail.com>
Getting this when using the L1 cfnvirtualcluster structure -
I also re-opened some resolved comments from a previous review that need further work. Once you are done making changes to the integration tests, please confirm that you have deployed both of them and successfully executed the state machines.
'Fn::GetAtt': [
  'EMRContainersStartJobRunMonitoringBucket8BB3FC54',
  'Arn',
there is something broken in your implementation because this should just be "providedbucket"; looks like you are still creating a new bucket even if one is passed in
Co-authored-by: Ben Chaimberg <youppi3@gmail.com>
Please replace the current PR description with a brief overview of what you created (three tasks, some custom resources that do X) and some of the design decisions we made (what we do automatically, what the customer needs to do, why that's the case).
  throw new Error('Execution role cannot be undefined when the virtual cluster ID is not a concrete value. Provide an execution role with the correct trust policy');
}

this.logGroup = this.props.monitoring?.logGroup ?? this.props.monitoring?.logging ? new logs.LogGroup(this, 'Monitoring Log Group') : undefined;
Something is wrong with the implementation of the monitoring configuration when users provide their own log group and log bucket. Currently, the automatically generated log group and bucket are always used instead of the user-provided ones.
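One plausible cause, assuming the quoted `this.logGroup = ...` line is what actually runs, is operator precedence: in JavaScript and TypeScript `??` binds more tightly than the conditional operator, so `a ?? b ? c : d` is parsed as `(a ?? b) ? c : d`. The sketch below reproduces that shape without any CDK dependency. `LogGroupLike`, `MonitoringLike`, `createLogGroup`, `resolveAsWritten`, and `resolveWithParens` are hypothetical names made up for this example.

```typescript
// Hypothetical stand-ins for logs.ILogGroup and the monitoring props.
interface LogGroupLike {
  readonly name: string;
}

interface MonitoringLike {
  readonly logging?: boolean;
  readonly logGroup?: LogGroupLike;
}

// Stand-in for `new logs.LogGroup(...)`.
function createLogGroup(): LogGroupLike {
  return { name: 'generated' };
}

// Same shape as the quoted line. Parsed as:
//   (logGroup ?? logging) ? createLogGroup() : undefined
// so a provided log group only makes the condition truthy and is then discarded.
function resolveAsWritten(monitoring?: MonitoringLike): LogGroupLike | undefined {
  return monitoring?.logGroup ?? monitoring?.logging ? createLogGroup() : undefined;
}

// Parentheses restore the intended meaning: prefer the provided log group,
// otherwise create one only when logging is enabled.
function resolveWithParens(monitoring?: MonitoringLike): LogGroupLike | undefined {
  return monitoring?.logGroup ?? (monitoring?.logging ? createLogGroup() : undefined);
}

const provided: MonitoringLike = { logGroup: { name: 'provided' } };
console.log(resolveAsWritten(provided)?.name);  // the generated group wins
console.log(resolveWithParens(provided)?.name); // the provided group wins
```

If the bucket is resolved with the same pattern, the same parenthesization would apply there.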
@matthewsvu, will you be completing the work on this PR?
No. I don't have the necessary resources on my personal laptop to run the integration tests and confirm whether they work. Someone from my team will pick it up.
924c117 to ebfd5f2
Co-authored-by: kaizen3031593 <36202692+kaizen3031593@users.noreply.github.com>
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
Closing in favor of #17103.
This CDK feature adds support for EMR on EKS by implementing API service integrations for the following three APIs. This PR adds three tasks which support EMR on EKS:

1) [Create Virtual Cluster](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_CreateVirtualCluster.html)
2) [Start a job run](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_StartJobRun.html)
3) [Delete virtual cluster](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_DeleteVirtualCluster.html)

Continuation of #15262 by @matthewsvu and @BenChaimberg. Closes #15234.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
API as per documentation here:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-emr-eks.html
https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_Operations.html
Overview
This CDK feature implements a simplified API service integration for EMR on EKS. It provides multiple defaults and automates setup for customers by using existing CDK library features, which reduces setup time.
Task features added
Design decisions
… `CreateVirtualCluster` and `StartJobRun` and provided multiple different inputs and defaults for all EMR on EKS SFN service integrations.
… `IGrantable` interface, so users can also add permissions to the automatically generated Job Execution Role in `StartJobRun` via `grantPrincipal`.
… `describeVirtualCluster` and retrieves the EKS Cluster's `namespace` and `virtualClusterId`.
… `namespace` and `virtualClusterId` retrieved from the previous Custom Resource.
Required setup
Running Tests
Change directory into `aws-cdk/packages/@aws-cdk/aws-stepfunctions-tasks` and run the following commands:
yarn build
npm test
cdk deploy --app integ.start-job-run.js
cdk deploy --app integ.job-submission-workflow.js
closes #15234
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license