Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(servicecatalog): Add Product Stack Asset Support #21508

Closed
wants to merge 4 commits into from

Conversation

imanolympic
Copy link

@imanolympic imanolympic commented Aug 8, 2022

Description

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Aug 8, 2022

@github-actions github-actions bot added the p2 label Aug 8, 2022
@aws-cdk-automation aws-cdk-automation requested a review from a team August 8, 2022 16:50
Copy link
Contributor

@TheRealAmazonKendra TheRealAmazonKendra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we can provide a meaningful review there's a few things we need from this PR.

Firstly, I would like to see these split out. You can reference each PR in the other one to let us know that they're linked, but these are two issues that need their own review, not in an intertwined way.

Secondly, we need a passing build. The feedback essentially becomes void if it's against code that doesn't work. This is failing on a small issue, but it fails before ever getting to either of the two modules it covers, so we fundamentally don't know if this works.

Lastly, when updating integration tests, please update them to use the IntegTest construct.

@imanolympic
Copy link
Author

imanolympic commented Aug 8, 2022

Great, I will replace the unit test accounts with the dummy account value.

The reason we need an actual account id for the integration tests is that one of our constructs (product-stack-asset-bucket) creates an s3 bucket and the name of this bucket relies on the untokenized account id. Because the account id can't be a token, it is necessary for the account id to be defined in the stack environment. The user can do this in one of two ways:

  1. explicitly - env: { account: 'xxxxxxxxxxxxx', region: 'us-east-1' }
  2. implicitly - env: { account: process.env.CDK_DEFAULT_ACCOUNT, region: process.env.CDK_DEFAULT_REGION }

This being said, I did try referencing the account id implicitly. Even though I set up my local credentials, the integration test stack didn't pick up the local credentials (I am not sure why) and our new construct threw the following error: CDK Account ID must be defined in the application environment. This forced me to explicitly include the account id.

I imagine even if there was a dummy account id for the integration tests it wouldn't matter being that the names of our buckets in the CFN stack template would change depending on the account defined in the environment. Moreover, CDK complained when my local credentials didn't match the account id defined in the environment, forcing me to believe that in order to run integ tests, the local account credentials must match the environment account.

For the reasons above, It seems it may not be feasible to test this new feature in integration tests. What are your thoughts?

@mergify mergify bot dismissed TheRealAmazonKendra’s stale review August 8, 2022 22:07

Pull request has been modified.

@aws aws deleted a comment from imanolympic Aug 8, 2022
@aws aws deleted a comment from aws-cdk-automation Aug 8, 2022
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: c31fad1
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@mrgrain mrgrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a complex change and tricky to follow without significant domain knowledge. If given the PR a first pass. But can you please share your design and considerations with us. Thank you.

* If false, the file will remain zipped in the destination bucket.
* @default true
*/
readonly unzipFile?: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#8065 discusses alternatives for deploying a zipped asset. Specifically #8065 (comment)

Have you considered alternatives and why have you chosen this approach?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a separate PR will be drafted for the s3-deployment changes that clarifies design considerations and alternatives. The feature addressed by this PR relies on these s3 changes and given that we are working against a deadline, we submitted this PR as a draft to get early feedback solely on the changes made to aws-servicecatalog. Hope this helps to clarify.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrgrain Just to clarify, are you asking for design considerations and alternatives for the entire PR or just for the S3 changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say the entire PR. This is a pretty big deviation from how we handle assets in the CDK. It looks like you are essentially creating a shadow assets mechanism that we will have to support going forward.

Comment on lines +237 to +239
for (const accountId of this.sharedAccounts) {
bucket.grantRead(new iam.AccountPrincipal(accountId));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what the purpose of this method is, but it might be easier/better to use a CompositePrincipal instead for the policy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is for the portfolio to collect a set of S3 Asset buckets and a list of shared accounts. Given that the portfolio does not have an onSynthesize method, we created an aspect that will take each bucket and grant read to all accounts. Essentially, in the scenario where the portfolio contains three asset buckets and two accounts, we would want the portfolio to iterate through each bucket and grant read permissions to the two accounts. If CompositePrincipal can eliminate that second for loop that would be great, will definitely look into it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look? Hopefully it's more readable.

    for (const bucket of this.assetBuckets) {
      const compositeAccountPrincipals = new iam.CompositePrincipal(...this.sharedAccounts.map(
        (accountId) => new iam.AccountPrincipal(accountId))
      );
      bucket.grantRead(compositeAccountPrincipals);
    }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question here. We have the following scenario: our stack creates multiple porfolios containing one product. Each portfolio is shared with other account but sometimes the products have exaclty the same asset (reusable lambda in our case). Would that mean that for each portfolio there will be a separate bucket created?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, this was actually the initial design goal - one product stack per portfolio. I considered working with the cdk bootstrap bucket but the idea of giving cross account access to a bucket containing all of the user's assets did not seem to adhere to best security practices.

Ideally, we would like to reduce the share scope as much as possible. Portfolios seemed like a great way to accomplish this until we realized the timeline of events would not allow for us to create an asset bucket in the portfolio and pass it down to the product stack? Why? Well, the product stack needs the s3 bucket containing the asset buckets at time of creation so that the product stack synthesizer is able to pass back the expected location of the asset in its public addFileAsset(_asset: cdk.FileAssetSource): cdk.FileAssetLocation method.

Portfolio constructor
ProductStack constructor
ProductStackSynthesizer::bind()
ProductStackSynthesizer::addFileAsset()
ProductStack::addFileAssetToParentSynthesizer() 
	This is where we need the product stack to know the name of the bucket
CloudFormationProductStackTemplate constructor
CloudFormationProduct constructor
CloudFormationProductStackTemplate::bind()
Portfolio::addProduct()
Portfolio::shareWithAccount()
ProductStackSynthesizer::synthesize()

We could technically make the product stack asset bucket external facing and have the user create the bucket in their application stack. This would require allowing product stack props so that we can take the asset bucket as a prop and proceed from there. This approach was less favored by our team given that it would force users to interact with their product stack differently from the way they already do (if they want to use assets). Here is what this user experience would look like:

export class HelloServerlessStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create development bucket
    const devBucket = new ProductStackAssetBucket(this, "ProductStackAssetBucket");

    // Create a portfolio
    const portfolio = new sc.Portfolio(this, 'DevToolsPortfolio', {
      displayName: 'DevTools', 
      providerName: 'IT', 
      assetBucket: devBucket, 
    });

    // Create a Lambda product from a Product Stack
    const product = new sc.CloudFormationProduct(this, 'HelloServerless', {
      productName: "HelloServerless",
      owner: "IT",
      productVersions: [
        {
           cloudFormationTemplate: sc.CloudFormationTemplate.fromProductStack(
            new HelloServerlessProduct(this, 'HelloServerlessProduct', {assetBucket: devBucket})),
          productVersionName: "v1",
          description: "A simple REST API backed by a hello Lambda function"
        },
      ],
    });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I like this approach. Would it be possible to combine Your's approach with this You introduced here, but the assetBucket property would be optional? This would be a good solution for the use case we are facing with my team.
The other one would simply build over 100 buckets (soft quota).
I imagine we could create this AssetsBucket in a separate stack that would be deployed right before the portfolio stack and this would be in our case not a security concern. Actually our current apprach uses the cdk assets bucket, so I actually understand Your's considerations about security, because we also had this discussions already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With even more trickery, you could also return a Lazy.string() as bucketName in the Synthesizer, and copy in the the real bucket name when the ProductStack is being synthesized in the context of a Portfolio.

(Btw. Lazy.string() will lead to every ProductStack instance only being usable in the context of a single Portfolio. Not sure how desirable or how bad that is, but you might want to add validation to make sure users don't get it wrong. Alternatively, Lazy.uncachedString() might help, but needs a test to confirm that it works if that's what you want to support)


I considered working with the cdk bootstrap bucket but the idea of giving cross account access to a bucket containing all of the user's assets did not seem to adhere to best security practices.

Good call 😉. I like the approach you're taking here.

Portfolios seemed like a great way to accomplish this until we realized the timeline of events would not allow for us to create an asset bucket in the portfolio and pass it down to the product stack

In software, things are rarely as cut-and-dried as "it's not possible". Most things are possible (given an appropriate amount of work). You've identified the ideal customer experience already; now is the time to fight to see if you can achieve it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I like this approach. Would it be possible to combine Your's approach with this You introduced here, but the assetBucket property would be optional? This would be a good solution for the use case we are facing with my team. The other one would simply build over 100 buckets (soft quota). I imagine we could create this AssetsBucket in a separate stack that would be deployed right before the portfolio stack and this would be in our case not a security concern. Actually our current apprach uses the cdk assets bucket, so I actually understand Your's considerations about security, because we also had this discussions already.

Great, thanks for the feedback. We changed our implementation to support the user passing in an optional product stack asset bucket. The user will also have the ability to pass in a name for the assets bucket (otherwise, we'll generate the name for you 🙂).

Comment on lines +60 to +62
public _getAssetBucket(): IBucket | undefined {
return this.assetBucket;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make the property public readonly instead?

Copy link
Author

@imanolympic imanolympic Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make it public readonly we get the following error hint in the _setAssetbucket method:
Cannot assign to 'assetBucket' because it is a read-only property

Making this property public readonly would force us to initialize the property in the constructor, which we are unable to do directly. We are unable to do this because, by design, we want to conditionally define the assetBucket property only in the case where the user has a product stack that uses assets. If the user has a product stack without assets, we would like for the assetBucket property to stay as undefined.

Happy to provide further clarification

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this property here at all? It only seems to exist here to copy the value from the synthesizer up to the portfolio.

Can we skip the middle man in some way?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the ultimate goal here is to get the asset bucket to the portfolio, so that the portfolio is able to share its buckets with any account the portfolio is shared with. For context on what we can do, take a look at: #21508 (comment)

Under the scenario where the user creates the product stack asset bucket in the application stack, we can skip the middle man by having the user pass in the product stack as a prop to each product stack AND the portfolio. However, would this provide users with an ideal experience? In the case where the user passes in the asset bucket to the portfolio and not the product stacks (or vice versa), how could we detect this?


if (!this.assetBucket) {
const parentStack = (this.stack as ProductStack)._getParentStack();
this.assetBucket = new ProductStackAssetBucket(parentStack, `ProductStackAssetBucket${hashValues(this.stack.stackName)}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hashing this doesn't seem necessary. Why not have the product name here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've picked the scope of asset buckets to be individual ProductStacks. What's the reason for picking that over, say, an asset bucket per Portfolio? Given that the entire portfolio is shared with the same groups of accounts anyway, a bucket per portfolio seems more desirable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've picked the scope of asset buckets to be individual ProductStacks. What's the reason for picking that over, say, an asset bucket per Portfolio? Given that the entire portfolio is shared with the same groups of accounts anyway, a bucket per portfolio seems more desirable?

Check out this comment: #21508 (comment)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hashing this doesn't seem necessary. Why not have the product name here?

Let's take a look at the following scenario:

    const product = new sc.CloudFormationProduct(this, 'HelloServerless', {
      productName: "HelloServerless",
      owner: "IT",
      productVersions: [
        {
          cloudFormationTemplate: sc.CloudFormationTemplate.fromProductStack(new ProductStack1(this, 'ProductStack1'),
          productVersionName: "v1",
          description: "A simple REST API backed by a hello Lambda function"
        },
        {
          cloudFormationTemplate: sc.CloudFormationTemplate.fromProductStack(new ProductStack2(this, 'ProductStack2')
          ),
          productVersionName: "v2",
          description: "A simple REST API backed by a goodbye Lambda function"
        },
      ],
    });

If we hashed by product name, the product stack asset buckets for both ProductStack1 and ProductStack2 would share the same id and thus error out. To avoid this, we can hash by parent stack + product stack which is what this.stack.StackName will give us.

* @internal
*/
public addAsset(asset: cdk.FileAssetSource): cdk.FileAssetLocation {
const assetPath = './cdk.out/' + asset.fileName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making too many assumptions.

Instead, I would hand the asset to the underlying stack's StackSynthesizer, and hand the result of that to the BucketDeployment (as a Source.s3()).


/**
* The S3 bucket containing product stack assets.
* @default -
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please? :)

Comment on lines +60 to +62
public _getAssetBucket(): IBucket | undefined {
return this.assetBucket;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this property here at all? It only seems to exist here to copy the value from the synthesizer up to the portfolio.

Can we skip the middle man in some way?

Copy link
Contributor

@corymhall corymhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions:

  1. Should there be a bucket per region? Like we do with cdk-assets?
  2. What about encryption on buckets?
  3. Should we think about garbage collection of old assets? If we eventually solve this for cdk-assets it seems like it will have to be solved here separately.

if (cdk.Token.isUnresolved(accountId)) {
throw new Error('CDK Account ID must be defined in the application environment');
}
return `product-stack-asset-bucket-${accountId}-${hashValues(id)}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the cloudformation intrinsic ${AWS::AccountId} instead? Do we need to know the bucketname at synth time?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind clarifying your first question? My understanding is that buckets share a global namespace and thus, the concept of regions doesn't necessary apply.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for garbage collection, the great thing about having a deterministic name for the bucket is that the user will be forced to delete the buckets in the case where they destroy and try to re deploy their stacks. This way, users will avoid the scenario of having s3 buckets pile up. This being said, the only garbage collection concern is one where a user destroys their stack (with the intention of never deploying it again) and forgets to delete the asset buckets on S3.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the cloudformation intrinsic ${AWS::AccountId} instead? Do we need to know the bucketname at synth time?

Yes, we are required to know the bucket name at synth time given that product stack synthesizer is required to return an expected location for the asset, and the entire appeal of product stacks is that they altogether skip deployment (they simply synthesize a template that gets uploaded to the CDK bootstrap bucket of the user).

This being said, we tried using ${AWS::AccountId} but the issue with doing so is that it contains a token rather than the explicit account id string. At deploy time, the token is resolved and the bucket is created. The issue comes before that, when the product stack synthesizer tries to return the expected asset location. Instead of returning product-stack-asset-bucket-1234 for the bucket name, it will return product-stack-asset-bucket-${TOKEN[Bucket.Name.1234]}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind clarifying your first question? My understanding is that buckets share a global namespace and thus, the concept of regions doesn't necessary apply.

A global namespace does not mean that it is a global service. If I publish the assets to a bucket in us-east-1, but I consume the assets in a stack deployment in us-east-2 I've created a cross region dependency. What if us-east-1 is experiencing an outage and I need to deploy my failover stack in us-east-2? I won't be able to deploy because I have a dependency on us-east-1.

As for garbage collection...

Maybe I misunderstand how this feature works. I'm more talking about garbage collection for the assets themselves. When I change the lambda code and deploy, does the new asset go to the same bucket? Overtime this might lead to a lot of outdated assets.

@padaszewski
Copy link

Hi @imanolympic
Is there any progress on this one?

@imanolympic
Copy link
Author

Hi @imanolympic Is there any progress on this one?

Yes, we have split this PR into two. The S3 PR will be out this week, and the SC PR will follow promptly. On a good note, we managed to find a way to support both the user passing in a product stack asset bucket and SC creating it in the background.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the BUILD FAILING state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@aws-cdk-automation
Copy link
Collaborator

This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error.

@aws-cdk-automation aws-cdk-automation added the closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. label Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. p2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants