(custom-resources): empty onEvent handler zips being created, failing deploys #27342
Comments
Update: as I worked through testing different versions of the CDK libraries, I started checking our devDeps:
devDeps: [
// Integ-Tests
'@aws-cdk/integ-tests-alpha@^2.94.0-alpha.0',
'@aws-cdk/integ-runner@^2.94.0-alpha.0',
],
Seems innocuous, right? Just make sure that we have the integration tools at least at 2.94.0+? Well, this does an interesting thing to our ... You'll see that we have ...
I'm beginning some tests where I force the aws-cdk/integ.* packages to be updated... but I am wondering, how do people handle this? What is the right way to make sure that all of the @aws-cdk alpha packages stay in sync with the core CDK version?
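For what it's worth, one way to spot this kind of drift is a small check that compares every -alpha dependency against aws-cdk-lib. This is only a rough sketch; the check-alpha-versions.ts file name and the comparison logic are hypothetical, nothing the CDK or projen provides:

```ts
// check-alpha-versions.ts - hypothetical helper: flag any -alpha dependency whose
// version does not track the aws-cdk-lib version declared in package.json.
import { readFileSync } from 'fs';

const pkg = JSON.parse(readFileSync('package.json', 'utf8'));
const deps: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };

// e.g. "^2.94.0" -> "2.94.0"
const cdkVersion = (deps['aws-cdk-lib'] ?? '').replace(/^[^\d]*/, '');
let ok = true;

for (const [name, range] of Object.entries(deps)) {
  if (!range.includes('-alpha.')) continue; // only the experimental packages
  const pinned = range.replace(/^[^\d]*/, ''); // drop ^ / ~ / >= prefixes
  if (!pinned.startsWith(`${cdkVersion}-alpha.`)) {
    console.error(`${name} is ${range}, expected ${cdkVersion}-alpha.0`);
    ok = false;
  }
}

process.exit(ok ? 0 : 1);
```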
Thanks for posting an update @diranged, I'm not entirely sure what you mean by the following:
I don't typically use projen, so it might be a bit tricky for me to try to reproduce this without more specific steps.
@peterwoodworth Thanks for responding, let me try to explain better. We make heavy use of Projen... and while we use a custom construct with a bunch of defaults, a common CDK project for us will look something like this:
import { awscdk } from 'projen';
const project = new awscdk.AwsCdkTypeScriptApp({
authorEmail: 'matt@....com',
authorName: 'Matt Wise',
authorUrl: 'https://github.com/ourorg/ourapp',
cdkVersion: '2.99.0',
defaultReleaseBranch: 'main',
name: 'infra-thingy',
projenrcTs: true,
deps: [
// src/constructs/aws_vpc
'@aws-cdk/aws-lambda-python-alpha',
],
devDeps: [
// Integ-Tests
'@aws-cdk/integ-tests-alpha',
'@aws-cdk/integ-runner',
],
gitignore: [
// CDK Temp Data
'cdk.out',
'cdk.staging',
// VIM
'*.swp',
'*.swo',
// Integ-Tests write some files that do not need to be committed:
'cdk.context.json',
'read*lock',
'test/integ/**/integ.*.snapshot/asset.*',
],
githubOptions: {
// https://projen.io/api/API.html#projen-github-pullrequestlint
pullRequestLintOptions: {
semanticTitleOptions: {
requireScope: true,
},
},
},
});
What I realized in this setup is that the ... What I see, though, is that the ... Our dependabot config looks like this:
version: 2
registries:
  github:
    type: npm-registry
    url: https://npm.pkg.github.com
    token: ${{ secrets.PROJEN_GITHUB_TOKEN }}
updates:
  - package-ecosystem: npm
    versioning-strategy: lockfile-only
    directory: /
    schedule:
      interval: weekly
    registries:
      - github
    groups:
      awslibs:
        patterns:
          - "@aws*"
          - aws*
      node:
        patterns:
          - typescript*
          - "@types*"
          - eslint*
      projen:
        patterns:
          - projen*
          - jest
    open-pull-requests-limit: 50
We recently had a PR update come through like this: ... Notice how ... In the short term, I am tweaking our project definition:
project.addDevDeps(`@aws-cdk/aws-lambda-python-alpha@${project.cdkVersion}-alpha.0`);
project.addDevDeps(`@aws-cdk/integ-tests-alpha@${project.cdkVersion}-alpha.0`);
project.addDevDeps(`@aws-cdk/integ-runner@${project.cdkVersion}-alpha.0`);
However, this won't automatically catch updates during Dependabot PRs... instead, it will only force the update when we change cdkVersion in our project definition.
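Longer term, I'd probably fold those three lines into a tiny helper so new alpha packages can't be added without the pin. A sketch of what that could look like in the project definition; pinAlphaDeps is my own name, not a projen API, and it assumes every experimental package follows the <cdkVersion>-alpha.0 convention:

```ts
import { awscdk } from 'projen';

const project = new awscdk.AwsCdkTypeScriptApp({
  cdkVersion: '2.99.0',
  defaultReleaseBranch: 'main',
  name: 'infra-thingy',
  projenrcTs: true,
});

// Our own convention (not a projen API): pin every experimental @aws-cdk package
// to the exact alpha build that matches project.cdkVersion, so they cannot drift.
function pinAlphaDeps(packages: string[]): void {
  for (const pkg of packages) {
    project.addDevDeps(`${pkg}@${project.cdkVersion}-alpha.0`);
  }
}

pinAlphaDeps([
  '@aws-cdk/aws-lambda-python-alpha',
  '@aws-cdk/integ-tests-alpha',
  '@aws-cdk/integ-runner',
]);

project.synth();
```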
@peterwoodworth,
I am going to start working back through the release versions to see which one fixes it...
I've updated the main issue description... the failure seems to begin at 2.94.0 and run through 2.99.0. If we roll back to 2.93.0, it's fine.
Interestingly enough, this issue also is describing issues with testing when migrating to ...
Thanks for pointing that out... so we also ran into some really strange issues that I thought were related to our own code (see https://cdk-dev.slack.com/archives/C017RG58LM8/p1695824193947209), but I now suspect are going to be fixed by rolling back to 2.93.0. I've got some other work I have to complete first, but I'll try to post back results here. It definitely seems like something wonky happened in 2.94.0.
@diranged, is there any way we could have a look at a repro of this?
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
keep-alive please..
@diranged is there any way you can provide a repro?
So if I open an AWS Support Ticket - I can upload the entire repo there along with instructions.. I haven't had time to craft a smaller example, unfortunately.
ok, in that case we can put this ticket on hold until we're able to get a reproduction.
@peterwoodworth,
Additionally, we have a Lifecycle Policy on our CDK S3 buckets that purges data - it was set to ... My theory right now is that the integration tests rely on the S3 data existing for the "previous asset versions", so that the runner can bring the test stacks up into their previous state first. Our expectation was that the code would just regenerate any assets it was missing from S3... but perhaps we're wrong?
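To be concrete about the purge policy: it's just an age-based expiration on the asset bucket's objects. A minimal sketch of that kind of rule against a hypothetical bucket in plain CDK - the 30-day window is a placeholder, not our actual setting, and the real bucket is owned by the bootstrap stack, so this is only to illustrate the behavior:

```ts
import { App, Duration, Stack } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';

const app = new App();
const stack = new Stack(app, 'LifecycleDemo');

// Illustrative only: every object is expired 30 days after creation, which is
// roughly what our purge policy does to the staged assets.
new Bucket(stack, 'AssetCache', {
  lifecycleRules: [{ expiration: Duration.days(30) }],
});

app.synth();
```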
@peterwoodworth Hey - this is still an issue.. any thoughts on my comment/question? We noticed a second situation where this occurs as well. Let's say that we have a change to the code that we know will pass... for example, changing a Kubernetes manifest string from one value to another.
It really feels like the answer here is that the ...
@diranged This assumption is unfortunately not correct, like your second post also describes. By default, ...
What it does offer today is ... Either way, I think this needs to be documented better.
@diranged Does including all files solve your problem? Also, given that we have this in the README:
Could you elaborate on what led you to the assumption that you can ignore some files? Any suggestions on how we can improve the README?
@mrgrain,
Improving the UX
Right off the bat... I think that if the ...
Snapshots in Git
I'm curious how you've seen this used in the past... when it comes to building small dedicated CDK libraries that do a specific thing, I can imagine that storing the full snapshots isn't really a big deal... but in our case we're launching integration tests to spin up real Kubernetes clusters along with a dozen different Lambda functions. These functions and the assets get pretty big:
Do you realistically see people storing these in Git - and updating them? Virtually every AWS CDK release makes changes to the Lambda handler code in some way, which causes new hashes to be generated, causing new functions to be built and new assets to be created. I'm not complaining... just trying to figure out what the realistic pattern is here. Our NodeJS functions aren't too big - but we have a couple of Python functions that get big. For example:
In this particular case, ...
General thoughts on the Integ Runner
I think it's an amazing tool... I wish it got more love. I've opened a bunch of issues on it over the last year (#27437, #22804, #22329, #27445, #28549)... they all kind of fall into the theme of better documentation, better examples, and improved errors/warnings that help developers actually understand the root cause of failures.
@mrgrain, for reference, here is the integ-runner config file we had been using:
{
"verbose": true,
"directory": "test/integ",
"update-on-failed": true,
"parallel-regions": ["us-west-2"],
// https://github.com/aws/aws-cdk/issues/27342#issuecomment-1793592083"
"disable-update-workflow": true
}
We assumed that if there was any problem parsing the file it would tell us... and there were no warnings, so we thought we were all good. Turns out that this code silently catches the errors, and then you don't get any of the behaviors you are expecting (see aws-cdk/packages/@aws-cdk/integ-runner/lib/cli.ts, lines 295 to 299 at 5a68f59).
This bit us for a long time because we kept having intermittent failures, but we thought we had these options set.
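In case it saves someone else the debugging time: the // comment makes the file invalid JSON, so (as far as we can tell) the whole config is silently ignored. The same settings as strict JSON, with the comment dropped, parse fine:

```json
{
  "verbose": true,
  "directory": "test/integ",
  "update-on-failed": true,
  "parallel-regions": ["us-west-2"],
  "disable-update-workflow": true
}
```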
@diranged Re the config file: Yeah that's bad and should be fixed. Would you mind opening a separate issue for this one? Obviously PRs are also welcome. I'll respond later to your other comment.
Describe the bug
We recently started to see our integration tests failing, even though deploys were succeeding. The failures on the integration tests look like this:
When we then look in our S3 bucket, we find a series of 22-byte sized zip files. These three images are from three separate build attempts, all with fresh empty cdk.out directories, and all after we had wiped out the S3 cache files. When we dug into it, it seems that these files are all related to the onEvent handlers for the custom-resource constructs. Going back in time a bit, it looks like these hash values show up at or around a9ed64f#diff-8bf3c7acb1f51f01631ea642163612a520b448b843d7514dc31ccc6f140c0753.
Attempts to fix
Roll back to 2.90.0 - success
We tried to roll back to 2.87.0, but our codebase would have required too many changes for that, so we rolled back to 2.90.0 instead - which is interestingly before several of the handlers were updated from Node16 to Node18. When we rolled back to 2.90.0, the integration tests work fine.
Roll forward to 2.91.0 - success
Same as 2.90.0 - the tests work fine.
Roll forward to 2.92.0 - partial success
In https://github.com/aws/aws-cdk/releases/tag/v2.92.0, the custom-resources handler is bumped to use Node18 instead of Node16. That change creates the new asset hash a3f66c60067b06b5d9d00094e9e817ee39dd7cb5c315c8c254f5f3c571959ce5. This code mostly worked - however, #26771 prevented us from fully testing the CDK construct for EKS.
Roll forward to 2.93.0 - success
In 2.93.0, we see the asset hash change from 3f579d6c1ab146cac713730c96809dd4a9c5d9750440fb835ab20fd6925e528c.zip -> 9202bb21d52e07810fc1da0f6acf2dcb75a40a43a9a2efbcfc9ae39535c6260c.zip. It seems that this release works just fine - though the tests are ongoing right now.
Roll forward to 2.94.0 - failure
It seems that the failure starts as soon as we hit the 2.94.0 release.
Rolling back to 2.93.0 - success
Rolling back to 2.93.0 after the 2.94.0 failure immediately works... builds and integration tests pass again.
Expected Behavior
A few things here..
Current Behavior
As far as we can tell, once the corrupt file is created - there are some situations where it is uploaded to S3 (and thus poisoning the cache), and other situations where the upload fails to begin with.
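One way to detect the poisoned cache is to list the asset bucket and flag anything that looks like an empty archive (an empty zip is 22 bytes). Below is a rough sketch using the AWS SDK v3; the script name, bucket name, and size threshold are placeholders, not part of any CDK tooling:

```ts
// find-empty-assets.ts - hypothetical helper: flag asset zips in the CDK staging
// bucket that are small enough to be the empty 22-byte archives described above.
import { S3Client, ListObjectsV2Command } from '@aws-sdk/client-s3';

const BUCKET = 'cdk-hnb659fds-assets-111111111111-us-west-2'; // placeholder bucket name
const client = new S3Client({});

async function main(): Promise<void> {
  let token: string | undefined;
  do {
    const page = await client.send(
      new ListObjectsV2Command({ Bucket: BUCKET, ContinuationToken: token }),
    );
    for (const obj of page.Contents ?? []) {
      if (obj.Key?.endsWith('.zip') && (obj.Size ?? 0) <= 22) {
        console.log(`suspiciously small asset: ${obj.Key} (${obj.Size} bytes)`);
      }
    }
    token = page.NextContinuationToken;
  } while (token);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```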
Reproduction Steps
Working on this ... don't yet know exactly how to reproduce this
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.95.0+
Framework Version
No response
Node.js Version
18
OS
Linux and OSX
Language
Typescript
Language Version
No response
Other information
No response