Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(glue): validate maxCapacity, workerCount, and workerType #26241

Merged
merged 3 commits into from
Jul 10, 2023

Conversation

go-to-k
Copy link
Contributor

@go-to-k go-to-k commented Jul 5, 2023

This PR adds the following validations, allows errors to be output in the synth phase instead of the deploy phase.

  • The case that maxCapacity and workerType (and workerCount) are specified.
    • In this doc, Do not set MaxCapacity if using WorkerType and NumberOfWorkers.
  • The case for maxCapacity with GlueVersion 2.0 or later.
    • In this doc, For Glue version 2.0 or later jobs, you cannot specify a Maximum capacity
  • The case that only either workerType or workerCount is specified.
    • Then an error occurs in CloudFormation, Please set both Worker Type and Number of Workers.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Jul 5, 2023

@github-actions github-actions bot added the valued-contributor [Pilot] contributed between 6-12 PRs to the CDK label Jul 5, 2023
@aws-cdk-automation aws-cdk-automation requested a review from a team July 5, 2023 12:16
@github-actions github-actions bot added the p2 label Jul 5, 2023
@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Jul 5, 2023
@colifran colifran self-assigned this Jul 5, 2023
@@ -719,6 +719,16 @@ export class Job extends JobBase {
}
}

if (props.maxCapacity !== undefined && (props.workerType || props.workerCount !== undefined)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for some clarification on this - the documentation says that maxCapacity cannot be used when setting workerType and workerCount. Why are we only checking for workerType or workerCount instead of workerType and workerCount?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colifran

Thanks for your review!

I was really worried.

If "and" is used, in the case where maxCapacity is specified but only one of workerType or workerCount is specified, it will break through this branch and throw the error Both workerType and workerCount must be set.
I did this because I thought that the essential error, the collision between maxCapacity and workerType , should be fixed first.

But it's certainly possible that there could be some confusion due to the deviation from the document, so I'll fix it!

Copy link
Contributor

@colifran colifran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@go-to-k this looks great! I just have one small question.

@colifran colifran changed the title chore(glue): add validations for maxCapacity and workerType feat(glue): validate maxCapacity, workerCount, and workerType Jul 6, 2023
@colifran colifran added the pr-linter/exempt-integ-test The PR linter will not require integ test changes label Jul 6, 2023
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Jul 6, 2023
@colifran colifran added the pr-linter/exempt-readme The PR linter will not require README changes label Jul 6, 2023
@aws-cdk-automation aws-cdk-automation dismissed their stale review July 6, 2023 14:34

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Jul 6, 2023
@@ -719,6 +719,16 @@ export class Job extends JobBase {
}
}

if (props.maxCapacity !== undefined && (props.workerType && props.workerCount !== undefined)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not thinking of this earlier, but I have one more thing I think we should add. There is a small chance that maxCapacity and workerCount could be Tokens. Could we wrap this validation in an if statement that first checks if both maxCapacity and workerCount are resolved. As an example:

if (!Token.isUnresolved(props.maxCapacity) && !Token.isUnresolved(props.workerCount)) {
  // your code here
}

We should probably also include an associated unit test for this as well. I hope this makes sense. Let me know if you have any questions.

Copy link
Contributor Author

@go-to-k go-to-k Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colifran

There is a small chance that maxCapacity and workerCount could be Tokens.

I see. But in what case would props.maxCapacity and props.workerCount become Unresolved, since I assume they are input values from the user, not values generated from the resource? Depending on that, I will consider new unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@go-to-k I realize it's a small chance, but it's better to add the check for it now since we're adding new code rather than having to go back and fix a bug. Basically, this would only occur if someone were to provide those as Lazy values. As an example:

{
  workerCount: cdk.Lazy.number({ produce: () => 5 }),
  maxCapacity: cdk.Lazy.number({ produce: () => 5 }),
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or when using a Parameter it can also happen.

Copy link
Contributor Author

@go-to-k go-to-k Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colifran

Thanks, I understand the Lazy case, but I may still not agree with validation only when they are not Token.

In the following code, when maxCapacity and workerCount are Token (Unresolved), they are not validated.

if (!Token.isUnresolved(props.maxCapacity) && !Token.isUnresolved(props.workerCount)) {
  if (props.maxCapacity !== undefined && (props.workerType && props.workerCount !== undefined)) {
    throw new Error('maxCapacity cannot be used when setting workerType and workerCount');
  }
}

However, even if the value is a Token and Unresolved, the fact that you are specifying that combination of the parameters is itself a problem (i.e., specifying both maxCapacity and workerCount (and workerType), regardless of the value, is itself a problem).

Therefore, shouldn't they be validated regardless of whether they are tokens or not? In fact, wouldn't it also validate and prevent cases like the following?

new glue.Job(stack, 'Job', {
  workerType: glue.WorkerType.G_1X,
  workerCount: cdk.Lazy.number({ produce: () => 5 }),
  maxCapacity: cdk.Lazy.number({ produce: () => 5 }),
})

I think we can check if the parameters are undefined or not, even if encoded numbers as Token.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@go-to-k good catch. I agree with this. We're checking for just the existence of the properties and not actually validating the value of the properties. That said, I think this is good to go. Thanks for your effort on this!

Copy link
Contributor

@colifran colifran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change looks good. There is one other minor piece I didn't think about earlier that we should add.

colifran
colifran previously approved these changes Jul 10, 2023
@mergify
Copy link
Contributor

mergify bot commented Jul 10, 2023

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot dismissed colifran’s stale review July 10, 2023 22:28

Pull request has been modified.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 5e69c1a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Jul 10, 2023
@mergify
Copy link
Contributor

mergify bot commented Jul 10, 2023

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 349e4d4 into aws:main Jul 10, 2023
@go-to-k go-to-k deleted the glue-max-capacity-validation branch July 11, 2023 04:24
bmoffatt pushed a commit to bmoffatt/aws-cdk that referenced this pull request Jul 29, 2023
)

This PR adds the following validations, allows errors to be output in the synth phase instead of the deploy phase.

- The case that maxCapacity and workerType (and workerCount) are specified.
  - In this [doc](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-Job), `Do not set MaxCapacity if using WorkerType and NumberOfWorkers.`
- The case for maxCapacity with GlueVersion 2.0 or later.
  - In this [doc](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-Job), `For Glue version 2.0 or later jobs, you cannot specify a Maximum capacity`
- The case that only either workerType or workerCount is specified.
  - Then an error occurs in CloudFormation, `Please set both Worker Type and Number of Workers.`

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p2 pr-linter/exempt-integ-test The PR linter will not require integ test changes pr-linter/exempt-readme The PR linter will not require README changes valued-contributor [Pilot] contributed between 6-12 PRs to the CDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants