(custom-resources): AwsCustomResource leaks assumed role to other custom resources #15425

nicolai-shape · 2021-07-05T15:37:51Z

The runtime handler of the AwsCustomResource does not correctly reset the credentials after executing when given a assumedRoleArn in any of the AwsSdkCall objects.

This means if you have two AwsCustomResource constructs in the same stack, and the first one that is deployed supplies a assumedRoleArn then the second one will fail to deploy if it executes any commands that are not covered by the policy of the assumed role of the first custom resource.

This obviously only happens if the execution context of the Lambda is reused, which is quite likely if you have a dependency between the custom resources so they're not executed concurrently.

Reproduction Steps

import * as cdk from '@aws-cdk/core'
import * as iam from '@aws-cdk/aws-iam'
import * as customResources from '@aws-cdk/custom-resources'

export class TestStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props)

    const fooRole = new iam.Role(this, 'FooRole', {
      assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com')
    })
    fooRole.addToPolicy(
      new iam.PolicyStatement({
        actions: ['s3:CreateBucket'],
        resources: ['arn:aws:s3:::Foo']
      })
    )

    // We specify an explicit role that is only allowed to create a bucket called 'Foo'
    new customResources.AwsCustomResource(this, 'Foo', {
      onCreate: {
        service: 'S3',
        action: 'createBucket',
        parameters: {
          Bucket: 'Foo'
        },
        assumedRoleArn: fooRole.roleArn,
        physicalResourceId: customResources.PhysicalResourceId.fromResponse('Bucket.Id')
      },
      installLatestAwsSdk: false,
      policy: customResources.AwsCustomResourcePolicy.fromSdkCalls({
        resources: customResources.AwsCustomResourcePolicy.ANY_RESOURCE
      })
    })

    // We do not explicitly assume a role and expect the custom resource policy to grant permission to CreateBucket
    new customResources.AwsCustomResource(this, 'Bar', {
      onCreate: {
        service: 'S3',
        action: 'createBucket',
        parameters: {
          Bucket: 'Bar'
        },
        physicalResourceId: customResources.PhysicalResourceId.fromResponse('Bucket.Id')
      },
      installLatestAwsSdk: false,
      policy: customResources.AwsCustomResourcePolicy.fromSdkCalls({
        resources: customResources.AwsCustomResourcePolicy.ANY_RESOURCE
      })
    })
  }
}

What did you expect to happen?

Deployment of both resources should succeed.

What actually happened?

The deployment of the second custom resource fails due to insufficient permissions of the role that was specified for the first custom resource.

Environment

CDK CLI Version : 1.109.0
Framework Version: 1.109.0
Node.js Version: v15.14.0
OS : MacOS 11.2.3
Language (Version): TypeScript (3.9.7)

Other

The issue is located here https://github.com/aws/aws-cdk/blob/master/packages/@aws-cdk/custom-resources/lib/aws-custom-resource/runtime/index.ts#L134. I propose to replace lines 134-143 with the following:

let credentials: AWS.Credentials | undefined

if (call.assumedRoleArn) {
  const timestamp = new Date().getTime()

  const params = {
    RoleArn: call.assumedRoleArn,
    RoleSessionName: `${physicalResourceId}-${timestamp}`
  }

  credentials = new AWS.ChainableTemporaryCredentials({
    params: params
  })
}

const awsService = new (AWS as any)[call.service]({
  apiVersion: call.apiVersion,
  region: call.region,
  credentials: credentials
})

Instead of modifying the global AWS SDK config, we only apply it to the temporary service client.

This is 🐛 Bug Report

The text was updated successfully, but these errors were encountered:

theipster · 2021-07-22T01:02:21Z

I've been trialling @nicolai-shape's suggestion above for roughly a week now, with great success. 👍

While the fix itself seems pretty straightforward, I guess the complication is how to test it. I've prepared some failing tests to describe the scenario, which then starts passing once the above fix gets applied: theipster#1.

Does this look useful / worthy of submitting a formal PR?

lebetz · 2021-07-23T13:37:57Z

@theipster , for me it looks very useful and this bug is a showstopper for me and also a security issue. So it would be great if you can submit a formal PR.

julienbonastre · 2021-08-26T06:57:59Z

Yeh it's not ideal, but I ended up working around this by using multiple nested stacks just to house the lambda/awscustomresource within because the Lambda is stack-scoped and thus it's environmental context stays consistent for role assumption.

YMMV of course if you use more discrete principal role names in your target roles.

mattvanstone · 2021-10-19T23:33:33Z

I've discovered a related issue when you have one custom resource with multiple SDK calls (onCreate, onUpdate, onDelete).

In the example below onCreate will work successfully but if you update this stack or delete it shortly after creation and the lambda is reused you will get an error Received response status [FAILED] from custom resource. Message returned: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1

This module creates a DNS record in a hosted zone in another account, using a cross account role.

import { Construct } from "@aws-cdk/core";
import { CloudFrontTarget } from "@aws-cdk/aws-route53-targets"
import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from "@aws-cdk/custom-resources";

interface Route53CrossAccountRecordProps {
  /** The Route53 domain name to create */
  Name: string,
  /** CloudFront URL to create an alias to */
  DNSName: string,
  /** The arn of the role that has access to update Route53 */
  AssumeRoleArn: string,
  /** The Route53 hosted zone ID to create the record in */
  HostedZoneId: string,
}

export class Route53CrossAccountRecord extends Construct {
  constructor(scope: Construct, id: string, props: Route53CrossAccountRecordProps) {
    super(scope, id);
    
    const ResourceRecordSet = {
      Name: props.Name,
      Type: "A",
      AliasTarget: {
        DNSName: props.DNSName,
        EvaluateTargetHealth: false,
        HostedZoneId: CloudFrontTarget.CLOUDFRONT_ZONE_ID,
      },
    };

    new AwsCustomResource(this, "dnsRecord", {
      policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: AwsCustomResourcePolicy.ANY_RESOURCE }),
      onCreate: {
        assumedRoleArn: props.AssumeRoleArn,
        service: "Route53",
        action: "changeResourceRecordSets",
        parameters: { 
          HostedZoneId: props.HostedZoneId,
          ChangeBatch: {
            Changes: [
              {
                Action: "UPSERT",
                ResourceRecordSet,
              },
            ],
          }, 
        },
        physicalResourceId: PhysicalResourceId.of(ResourceRecordSet.Name),
      },
      onUpdate: {
        assumedRoleArn: props.AssumeRoleArn,
        service: "Route53",
        action: "changeResourceRecordSets",
        parameters: { 
          HostedZoneId: props.HostedZoneId,
          ChangeBatch: {
            Changes: [
              {
                Action: "UPSERT",
                ResourceRecordSet,
              },
            ],
          }, 
        },
        physicalResourceId: PhysicalResourceId.of(ResourceRecordSet.Name),
      },
      onDelete: {
        assumedRoleArn: props.AssumeRoleArn,
        service: "Route53",
        action: "changeResourceRecordSets",
        parameters: { 
          HostedZoneId: props.HostedZoneId,
          ChangeBatch: {
            Changes: [
              {
                Action: "DELETE",
                ResourceRecordSet,
              },
            ],
          }, 
        },
        physicalResourceId: PhysicalResourceId.of(ResourceRecordSet.Name),
      },
    });
  }
}

In the logs I can see this is because the role is already being used by the lambda, and then it tries to assume itself:

"errorMessage": "User: arn:aws:sts::[REDACTED]:assumed-role/[REDACTED] is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::[REDACTED]:role/[REDACTED]",

if you remove assumedRoleArn: props.AssumeRoleArn from the onUpdate and onDelete SDK calls it will work, as long as the lambda is reused. If the lambda has idled out and a new one is invoked then onUpdate and onDelete will fail because the default role does not have access to the hosted zone.

Workaround

I have been able to work around this by granting my role permission to assume itself.

julienbonastre · 2021-10-20T00:39:48Z

Hey @mattvanstone . So just to confirm, is the target assumeRoleArn the same each time or does it switch between different account ID's?

I have a pattern which uses AWS Custom Resources in this fashion too for manipulating remote/spoke account Route53ResolverRule associations via SDK API, however what I found is that this was due to the singleton lambda trying to reissue the API call with the role it had assumed and cached for a different account in its context.

As my pattern issues hundreds of calls to the AWSCustomResource within seconds it appeared as though the Lambda wouldn't re-claim a new STS role assumption to the provided target ARN role in the given spoke account each time and would seemingly "reuse" whatever it last used.

Given this pattern performs these 100+ lambda calls within a few seconds across dozen+ accounts it would ALWAYS hit this same error

Workaround?
I simply used nested stacks from my root/parent stack which was based on a per-account basis and initiated their own AWSCustomeResource. This ensured that even if the AWSCR lambda was reused it still had the required/appropriate IAM roleAssumption in its context and not trying to switch to another target account.

As per AWS Doco, each AWSCustomResource is a singleton lambda only per stack in which it is generated, all other instantiations are indeed simply invocation calls to the lambda function during CFN operations.

LMK what you think

mattvanstone · 2021-10-20T14:32:56Z

@julienbonastre In my scenario the target assumeRoleArn is always the same arn, so you would think it would work, but if I trigger an onCreate and then immediately update the stack and trigger an onUpdate or onDelete the lambda is executed again, but the second time it already has the role from the onCreate and then it is passed the assumedRoleArn parameter again and tries to assume it, but the lambad is already assuming it, so it errors if the permissions on that role doesn't allow it to assume itself.

Before I solved my issue I looked at the workaround you mentioned but since I'm trying to run different SDK calls on the same resource it wouldn't work for me.

Seems like two different issues, but with the same root cause.

theipster · 2021-10-23T18:34:25Z

Ultimately, the way I see the situation is quite simply: the Lambda was designed to be a singleton, therefore it should ideally be made stateless or else there'll always be a risk of prior side effects affecting the output - as seen in all of the scenarios mentioned above.

Unfortunately, in its current form, the Lambda isn't stateless. State exists in the form of AWS.config.credentials which, once set (bear in mind, some invocations may not necessarily attempt to set it at all, and some invocations may attempt to set an existing value to another value), persists throughout the lifetime of that Lambda execution context and thus means it leaks into all subsequent invocations that happen to execute in the same context.

While the workaround using nested stacks might appear to work,

It loses the efficiency gained from using the singleton pattern. In other words, an application would have multiple Lambdas in deployment simultaneously (each incurring their own costs, etc.), all of which are actually just duplicates of each other but simply holding different state at runtime.
It forces re-architecting the application to use nested stacks, which introduces additional complexity that may or may not be warranted.
It still doesn't solve the problem when the same custom resource needs to assume a different role (or indeed, none at all) between its create/update/delete lifecycle hooks (e.g. @mattvanstone's use case), because there's no way to apply the same technique of splitting the hooks of a single custom resource across separate stacks.

To summarise, making the Lambda stateless will solve the problem while keeping true to the original design.

@nicolai-shape

…leaked to next execution (#15776) Fixes #15425 Credit to @nicolai-shape for proposing the fix itself. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

github-actions · 2021-12-13T20:09:19Z

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@nicolai-shape

…leaked to next execution (aws#15776) Fixes aws#15425 Credit to @nicolai-shape for proposing the fix itself. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

Lewenhaupt · 2023-10-11T15:48:42Z

This is still an issue

nicolai-shape added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 5, 2021

github-actions bot added the @aws-cdk/custom-resources Related to AWS CDK Custom Resources label Jul 5, 2021

github-actions bot assigned rix0rrr Jul 5, 2021

rix0rrr added effort/medium Medium work item – several days of effort p1 and removed needs-triage This issue or PR still needs to be triaged. labels Jul 7, 2021

rix0rrr removed their assignment Jul 7, 2021

theipster mentioned this issue Jul 27, 2021

fix(custom-resources): assumedRole from AwsCustomResource invocation leaked to next execution #15776

Merged

nicolai-shape mentioned this issue Nov 9, 2021

(custom-resources): Permit caller to override the AwsCustomResource Singleton UUID #17387

Closed

2 tasks

mergify bot closed this as completed in #15776 Dec 13, 2021

Lewenhaupt mentioned this issue Oct 11, 2023

cr.AwsCustomResource: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1 #23736

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(custom-resources): AwsCustomResource leaks assumed role to other custom resources #15425

(custom-resources): AwsCustomResource leaks assumed role to other custom resources #15425

nicolai-shape commented Jul 5, 2021 •

edited

Loading

theipster commented Jul 22, 2021

lebetz commented Jul 23, 2021

julienbonastre commented Aug 26, 2021

mattvanstone commented Oct 19, 2021

julienbonastre commented Oct 20, 2021

mattvanstone commented Oct 20, 2021

theipster commented Oct 23, 2021

github-actions bot commented Dec 13, 2021

Lewenhaupt commented Oct 11, 2023

(custom-resources): AwsCustomResource leaks assumed role to other custom resources #15425

(custom-resources): AwsCustomResource leaks assumed role to other custom resources #15425

Comments

nicolai-shape commented Jul 5, 2021 • edited Loading

Reproduction Steps

What did you expect to happen?

What actually happened?

Environment

Other

theipster commented Jul 22, 2021

lebetz commented Jul 23, 2021

julienbonastre commented Aug 26, 2021

mattvanstone commented Oct 19, 2021

Workaround

julienbonastre commented Oct 20, 2021

mattvanstone commented Oct 20, 2021

theipster commented Oct 23, 2021

github-actions bot commented Dec 13, 2021

⚠️COMMENT VISIBILITY WARNING⚠️

Lewenhaupt commented Oct 11, 2023

nicolai-shape commented Jul 5, 2021 •

edited

Loading