Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(eks): missing VPC permissions for fargate profiles #6074

Merged
merged 3 commits into from
Feb 3, 2020
Merged

fix(eks): missing VPC permissions for fargate profiles #6074

merged 3 commits into from
Feb 3, 2020

Conversation

michaelmoussa
Copy link
Contributor

This pull request addresses a couple of issues affecting the FargateCluster and FargateProfile constructs. This excerpt will reproduce the problems:

const vpc = new Vpc(this, 'VPC', {maxAzs: 2});
const cluster = new FargateCluster(this, 'Cluster', {
  clusterName: 'my-app',
  mastersRole: new Role(this, 'ClusterAdminRole', {
    assumedBy: new AccountRootPrincipal()}
  ),
  vpc,
});
cluster.addFargateProfile('MyAppFargateProfile', {
  fargateProfileName: 'my-app',
  selectors: [
    {namespace: 'my-app'}
  ],
  vpc
});

The expected outcome is:

  • A new VPC
  • A new EKS cluster with the default profile created by FargateCluster
  • An additional profile called my-app associated to the new cluster.

Instead, there's a race to see which error will occur first:

  • The default Fargate profile fails to be created because the my-app Fargate profile creation is already in progress (or vice-versa, depending on which one ended up starting first)
  • The default Fargate profile fails to be created due to a Missing permissions for "ec2:DescribeSubnets" (or vice-versa, depending on which one ended up starting first and whether or not it manages to get to that step before the "profile creation already in progress" error pops up).

The first error occurs because you cannot create multiple Fargate profiles at the same time, and the 2nd profile doesn't want for the 1st to finish being created before starting. This isn't a problem if the default profile that FargateCluster creates for you satisfies all your needs; however, if it doesn't, there doesn't appear to be a way to customize it, making the FargateCluster construct unusable here. The second error occurs because the FargateProfile constructor uses props.vpc.selectSubnets(...) to determine which subnets to use for placement of the pods, and the IAM Role being used doesn't include the ec2:DescribeSubnets permission.

This PR addresses both of these issues by adding the missing ec2:DescribeSubnets permission, as well as allowing you to define how you'd like the default profile to be created (e.g. specifying the my-app namespace, some subnets, etc).

NOTE: If you've stumbled upon this PR after encountering the same issue and it has not yet been merged, you can work around the issue by using the Cluster construct instead of FargateCluster, setting {coreDnsComputeType: CoreDnsComputeType.FARGATE, defaultCapacity: 0, kubectlEnabled: true} in the ClusterProps, and then creating the FargateProfile yourself with {subnetSelection: {subnets: vpc.privateSubnets}} in the FargateProfileOptions. Be sure to also include {namespace: 'default'} and {namespace: 'kube-system'} in the profile's selectors, as the Cluster construct does not automatically create a Fargate profile for you!


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@mergify
Copy link
Contributor

mergify bot commented Feb 3, 2020

Title does not follow the guidelines of Conventional Commits. Please adjust title before merge.

@michaelmoussa michaelmoussa changed the title fix(eks) Fargate profile (defaults & vpc permission issue) fix(eks): Default Fargate profile and VPC/Subnet permission issues Feb 3, 2020
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

eladb
eladb previously requested changes Feb 3, 2020
Copy link
Contributor

@eladb eladb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I am wondering if this actually solves the root cause. It sounds like addFargateProfile cannot be used more than once against a cluster because CFN will always try to create all the profiles at once.

So, either we simply remove this capability (and perhaps allow people to specific an array of profile options) or we somehow discover all the profiles associated with a cluster a serialize their creation by adding explicit dependencies between them. I prefer the latter option.

// WHEN
new eks.FargateCluster(stack, 'FargateCluster', {
fargateProfile: {
fargateProfileName: 'custom', selectors: [{namespace: 'foo'}, {namespace: 'bar'}]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a unit test that covers the case where fargateProfileName is not specified?

Copy link
Contributor Author

@michaelmoussa michaelmoussa Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladb Sure. Should I drop the fargate prefix as well given this comment?

edit: Oops, disregard. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

* @default - A profile called "default" with 'default' and 'kube-system'
* selectors will be created if this is left undefined.
*/
readonly fargateProfile?: FargateProfileOptions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be called “defaultProfile” instead (since this is a fargate cluster the “fargate” prefix is not needed and we usually reserve it to resource attributes (like the physical name))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladb A case could probably be made for simply calling it profile too, I think. If leaving the value empty results in the same thing being created every time and having the name default, then what you'd be passing as options isn't really the default. That's why I went with fargateProfile instead of defaultProfile. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR to use defaultProfile instead of fargateProfile.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot dismissed eladb’s stale review February 3, 2020 18:58

Pull request has been modified.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@michaelmoussa
Copy link
Contributor Author

michaelmoussa commented Feb 3, 2020

I created #6084 to track the comprehensive root cause fix.

I'm leaving my original response below for posterity, but let's keep any further discussion in #6084. :)


Looks good but I am wondering if this actually solves the root cause. It sounds like addFargateProfile cannot be used more than once against a cluster because CFN will always try to create all the profiles at once.

Technically you could use addFargateProfile(...) to add a new profile to a cluster that had already been established using CDK. That would succeed, but attempting to re-create the whole stack from scratch would indeed fail.

So, either we simply remove this capability (and perhaps allow people to specific an array of profile options) or we somehow discover all the profiles associated with a cluster a serialize their creation by adding explicit dependencies between them. I prefer the latter option.

The latter seems doable. We can modify the FargateProfileResourceHandler to list all profiles associated to the cluster and describe each one to ensure they all have the status ACTIVE before requesting the current one to be built. That should result in the profiles being created sequentially, and we can have it wait 30-60 seconds before checking to see if it's OK to to proceed.

It's not ideal and I'm not sure if this approach is taken anywhere else in CDK, but it avoids the need to somehow capture all profile creation requests to set up multiple cross-dependencies, and we don't need to introduce any kind of independent queue or state file to manage everything. The individual profile request Custom Resource functions should be able to wait politely for their turns.

There is still a possibility of a race condition if there are several pending profiles and two see an opportunity to get themselves created at the same time, but it's unlikely (or, we could address that by not treating an error due to profile creation in progress being a failure, and just have it keep waiting).

I don't have time to work on that now, but if it seems like a viable approach, would you be OK with merging this PR and opening a separate issue for it to be picked up later?

@eladb
Copy link
Contributor

eladb commented Feb 3, 2020

@michaelmoussa wrote:

I don't have time to work on that now, but if it seems like a viable approach, would you be OK with merging this PR and opening a separate issue for it to be picked up later?

Yes, totally fine. Could you just raise a bug titled “can’t define a fargate cluster with more than one profile” and I’ll take it from there.

@eladb eladb changed the title fix(eks): Default Fargate profile and VPC/Subnet permission issues fix(eks): missing VPC permissions for fargate profiles Feb 3, 2020
@mergify
Copy link
Contributor

mergify bot commented Feb 3, 2020

Thank you for contributing! Your pull request is now being automatically merged.

@mergify mergify bot merged commit 0a586fc into aws:master Feb 3, 2020
@michaelmoussa michaelmoussa deleted the eks-on-fargate-profile-fixes branch February 4, 2020 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants