Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(custom-resources): AwsCustomResource leaks assumed role to other custom resources #15425

Closed
nicolai-shape opened this issue Jul 5, 2021 · 9 comments · Fixed by #15776
Closed
Labels
@aws-cdk/custom-resources Related to AWS CDK Custom Resources bug This issue is a bug. effort/medium Medium work item – several days of effort p1

Comments

@nicolai-shape
Copy link

nicolai-shape commented Jul 5, 2021

The runtime handler of the AwsCustomResource does not correctly reset the credentials after executing when given a assumedRoleArn in any of the AwsSdkCall objects.

This means if you have two AwsCustomResource constructs in the same stack, and the first one that is deployed supplies a assumedRoleArn then the second one will fail to deploy if it executes any commands that are not covered by the policy of the assumed role of the first custom resource.

This obviously only happens if the execution context of the Lambda is reused, which is quite likely if you have a dependency between the custom resources so they're not executed concurrently.

Reproduction Steps

import * as cdk from '@aws-cdk/core'
import * as iam from '@aws-cdk/aws-iam'
import * as customResources from '@aws-cdk/custom-resources'

export class TestStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props)

    const fooRole = new iam.Role(this, 'FooRole', {
      assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com')
    })
    fooRole.addToPolicy(
      new iam.PolicyStatement({
        actions: ['s3:CreateBucket'],
        resources: ['arn:aws:s3:::Foo']
      })
    )

    // We specify an explicit role that is only allowed to create a bucket called 'Foo'
    new customResources.AwsCustomResource(this, 'Foo', {
      onCreate: {
        service: 'S3',
        action: 'createBucket',
        parameters: {
          Bucket: 'Foo'
        },
        assumedRoleArn: fooRole.roleArn,
        physicalResourceId: customResources.PhysicalResourceId.fromResponse('Bucket.Id')
      },
      installLatestAwsSdk: false,
      policy: customResources.AwsCustomResourcePolicy.fromSdkCalls({
        resources: customResources.AwsCustomResourcePolicy.ANY_RESOURCE
      })
    })

    // We do not explicitly assume a role and expect the custom resource policy to grant permission to CreateBucket
    new customResources.AwsCustomResource(this, 'Bar', {
      onCreate: {
        service: 'S3',
        action: 'createBucket',
        parameters: {
          Bucket: 'Bar'
        },
        physicalResourceId: customResources.PhysicalResourceId.fromResponse('Bucket.Id')
      },
      installLatestAwsSdk: false,
      policy: customResources.AwsCustomResourcePolicy.fromSdkCalls({
        resources: customResources.AwsCustomResourcePolicy.ANY_RESOURCE
      })
    })
  }
}

What did you expect to happen?

Deployment of both resources should succeed.

What actually happened?

The deployment of the second custom resource fails due to insufficient permissions of the role that was specified for the first custom resource.

Environment

  • CDK CLI Version : 1.109.0
  • Framework Version: 1.109.0
  • Node.js Version: v15.14.0
  • OS : MacOS 11.2.3
  • Language (Version): TypeScript (3.9.7)

Other

The issue is located here https://github.com/aws/aws-cdk/blob/master/packages/@aws-cdk/custom-resources/lib/aws-custom-resource/runtime/index.ts#L134. I propose to replace lines 134-143 with the following:

let credentials: AWS.Credentials | undefined

if (call.assumedRoleArn) {
  const timestamp = new Date().getTime()

  const params = {
    RoleArn: call.assumedRoleArn,
    RoleSessionName: `${physicalResourceId}-${timestamp}`
  }

  credentials = new AWS.ChainableTemporaryCredentials({
    params: params
  })
}

const awsService = new (AWS as any)[call.service]({
  apiVersion: call.apiVersion,
  region: call.region,
  credentials: credentials
})

Instead of modifying the global AWS SDK config, we only apply it to the temporary service client.


This is 🐛 Bug Report

@nicolai-shape nicolai-shape added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 5, 2021
@github-actions github-actions bot added the @aws-cdk/custom-resources Related to AWS CDK Custom Resources label Jul 5, 2021
@rix0rrr rix0rrr added effort/medium Medium work item – several days of effort p1 and removed needs-triage This issue or PR still needs to be triaged. labels Jul 7, 2021
@rix0rrr rix0rrr removed their assignment Jul 7, 2021
@theipster
Copy link
Contributor

I've been trialling @nicolai-shape's suggestion above for roughly a week now, with great success. 👍

While the fix itself seems pretty straightforward, I guess the complication is how to test it. I've prepared some failing tests to describe the scenario, which then starts passing once the above fix gets applied: theipster#1.

Does this look useful / worthy of submitting a formal PR?

@lebetz
Copy link

lebetz commented Jul 23, 2021

@theipster , for me it looks very useful and this bug is a showstopper for me and also a security issue. So it would be great if you can submit a formal PR.

@julienbonastre
Copy link

Yeh it's not ideal, but I ended up working around this by using multiple nested stacks just to house the lambda/awscustomresource within because the Lambda is stack-scoped and thus it's environmental context stays consistent for role assumption.

YMMV of course if you use more discrete principal role names in your target roles.

@mattvanstone
Copy link

I've discovered a related issue when you have one custom resource with multiple SDK calls (onCreate, onUpdate, onDelete).

In the example below onCreate will work successfully but if you update this stack or delete it shortly after creation and the lambda is reused you will get an error Received response status [FAILED] from custom resource. Message returned: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1

This module creates a DNS record in a hosted zone in another account, using a cross account role.

import { Construct } from "@aws-cdk/core";
import { CloudFrontTarget } from "@aws-cdk/aws-route53-targets"
import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from "@aws-cdk/custom-resources";

interface Route53CrossAccountRecordProps {
  /** The Route53 domain name to create */
  Name: string,
  /** CloudFront URL to create an alias to */
  DNSName: string,
  /** The arn of the role that has access to update Route53 */
  AssumeRoleArn: string,
  /** The Route53 hosted zone ID to create the record in */
  HostedZoneId: string,
}

export class Route53CrossAccountRecord extends Construct {
  constructor(scope: Construct, id: string, props: Route53CrossAccountRecordProps) {
    super(scope, id);
    
    const ResourceRecordSet = {
      Name: props.Name,
      Type: "A",
      AliasTarget: {
        DNSName: props.DNSName,
        EvaluateTargetHealth: false,
        HostedZoneId: CloudFrontTarget.CLOUDFRONT_ZONE_ID,
      },
    };

    new AwsCustomResource(this, "dnsRecord", {
      policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: AwsCustomResourcePolicy.ANY_RESOURCE }),
      onCreate: {
        assumedRoleArn: props.AssumeRoleArn,
        service: "Route53",
        action: "changeResourceRecordSets",
        parameters: { 
          HostedZoneId: props.HostedZoneId,
          ChangeBatch: {
            Changes: [
              {
                Action: "UPSERT",
                ResourceRecordSet,
              },
            ],
          }, 
        },
        physicalResourceId: PhysicalResourceId.of(ResourceRecordSet.Name),
      },
      onUpdate: {
        assumedRoleArn: props.AssumeRoleArn,
        service: "Route53",
        action: "changeResourceRecordSets",
        parameters: { 
          HostedZoneId: props.HostedZoneId,
          ChangeBatch: {
            Changes: [
              {
                Action: "UPSERT",
                ResourceRecordSet,
              },
            ],
          }, 
        },
        physicalResourceId: PhysicalResourceId.of(ResourceRecordSet.Name),
      },
      onDelete: {
        assumedRoleArn: props.AssumeRoleArn,
        service: "Route53",
        action: "changeResourceRecordSets",
        parameters: { 
          HostedZoneId: props.HostedZoneId,
          ChangeBatch: {
            Changes: [
              {
                Action: "DELETE",
                ResourceRecordSet,
              },
            ],
          }, 
        },
        physicalResourceId: PhysicalResourceId.of(ResourceRecordSet.Name),
      },
    });
  }
}

In the logs I can see this is because the role is already being used by the lambda, and then it tries to assume itself:

"errorMessage": "User: arn:aws:sts::[REDACTED]:assumed-role/[REDACTED] is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::[REDACTED]:role/[REDACTED]",

if you remove assumedRoleArn: props.AssumeRoleArn from the onUpdate and onDelete SDK calls it will work, as long as the lambda is reused. If the lambda has idled out and a new one is invoked then onUpdate and onDelete will fail because the default role does not have access to the hosted zone.

Workaround

I have been able to work around this by granting my role permission to assume itself.

@julienbonastre
Copy link

Hey @mattvanstone . So just to confirm, is the target assumeRoleArn the same each time or does it switch between different account ID's?

I have a pattern which uses AWS Custom Resources in this fashion too for manipulating remote/spoke account Route53ResolverRule associations via SDK API, however what I found is that this was due to the singleton lambda trying to reissue the API call with the role it had assumed and cached for a different account in its context.

As my pattern issues hundreds of calls to the AWSCustomResource within seconds it appeared as though the Lambda wouldn't re-claim a new STS role assumption to the provided target ARN role in the given spoke account each time and would seemingly "reuse" whatever it last used.

Given this pattern performs these 100+ lambda calls within a few seconds across dozen+ accounts it would ALWAYS hit this same error

Workaround?
I simply used nested stacks from my root/parent stack which was based on a per-account basis and initiated their own AWSCustomeResource. This ensured that even if the AWSCR lambda was reused it still had the required/appropriate IAM roleAssumption in its context and not trying to switch to another target account.

As per AWS Doco, each AWSCustomResource is a singleton lambda only per stack in which it is generated, all other instantiations are indeed simply invocation calls to the lambda function during CFN operations.

LMK what you think

@mattvanstone
Copy link

@julienbonastre In my scenario the target assumeRoleArn is always the same arn, so you would think it would work, but if I trigger an onCreate and then immediately update the stack and trigger an onUpdate or onDelete the lambda is executed again, but the second time it already has the role from the onCreate and then it is passed the assumedRoleArn parameter again and tries to assume it, but the lambad is already assuming it, so it errors if the permissions on that role doesn't allow it to assume itself.

Before I solved my issue I looked at the workaround you mentioned but since I'm trying to run different SDK calls on the same resource it wouldn't work for me.

Seems like two different issues, but with the same root cause.

@theipster
Copy link
Contributor

Ultimately, the way I see the situation is quite simply: the Lambda was designed to be a singleton, therefore it should ideally be made stateless or else there'll always be a risk of prior side effects affecting the output - as seen in all of the scenarios mentioned above.

Unfortunately, in its current form, the Lambda isn't stateless. State exists in the form of AWS.config.credentials which, once set (bear in mind, some invocations may not necessarily attempt to set it at all, and some invocations may attempt to set an existing value to another value), persists throughout the lifetime of that Lambda execution context and thus means it leaks into all subsequent invocations that happen to execute in the same context.

While the workaround using nested stacks might appear to work,

  1. It loses the efficiency gained from using the singleton pattern. In other words, an application would have multiple Lambdas in deployment simultaneously (each incurring their own costs, etc.), all of which are actually just duplicates of each other but simply holding different state at runtime.
  2. It forces re-architecting the application to use nested stacks, which introduces additional complexity that may or may not be warranted.
  3. It still doesn't solve the problem when the same custom resource needs to assume a different role (or indeed, none at all) between its create/update/delete lifecycle hooks (e.g. @mattvanstone's use case), because there's no way to apply the same technique of splitting the hooks of a single custom resource across separate stacks.

To summarise, making the Lambda stateless will solve the problem while keeping true to the original design.

@mergify mergify bot closed this as completed in #15776 Dec 13, 2021
mergify bot pushed a commit that referenced this issue Dec 13, 2021
…leaked to next execution (#15776)

Fixes #15425

Credit to @nicolai-shape for proposing the fix itself.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
…leaked to next execution (aws#15776)

Fixes aws#15425

Credit to @nicolai-shape for proposing the fix itself.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@Lewenhaupt
Copy link

This is still an issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/custom-resources Related to AWS CDK Custom Resources bug This issue is a bug. effort/medium Medium work item – several days of effort p1
Projects
None yet
7 participants