Add Retry Mechanism to E2E EC2 Terraform Deployment #635
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #635      +/-   ##
=============================================
- Coverage     85.71%   50.73%   -34.99%
- Complexity       19      264     +245
=============================================
  Files             3       39      +36
  Lines            49     1301    +1252
  Branches          5      141     +136
=============================================
+ Hits             42      660     +618
- Misses            3      609     +606
- Partials          4       32      +28
exit 1
fi
echo "Attempt $retry_counter"
success=0
Success should be set to 0 after the terraform apply command has completed. That is the assumption in the following code.
A success value of 1 indicates that setting up App Signals on the sample app failed, while 0 indicates that everything ran successfully.
The logic is:
1. Set success to 1 (the initial value is 1 so that the while loop runs).
2. While success is 1 (the Terraform deployment or the endpoint connection failed, so try again):
2a: Set success to 0 (start from 0 and change it to 1 if anything fails).
2b: Run terraform apply (if the deployment fails, success changes to 1).
2c: If success is still 0, install App Signals and check the endpoint connection.
2d: If the endpoint connection fails, change success to 1.
2e: If success is 1 at this point, either the deployment or the connection failed, so run the while loop again. If it is still 0, the code ran successfully, so exit the while loop.
If success were set to 0 after the terraform apply, it would overwrite the result of the Terraform deployment. If success is 1 after the Terraform deployment, we want to skip the endpoint connection step and redeploy with Terraform.
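The flow described in this comment can be sketched as the following minimal shell script. It is only an illustration, not the workflow's actual code: `deploy_stack`, `check_endpoint`, `destroy_stack`, and `MAX_RETRY` are hypothetical stand-ins for the real terraform apply, endpoint check, cleanup step, and retry limit, and the deployment is simulated so that it fails once and then succeeds.

```shell
#!/usr/bin/env bash
# Sketch of the retry flow. All function names and MAX_RETRY are hypothetical;
# the real workflow would call terraform and probe the sample app instead.

MAX_RETRY=3
attempts_made=0

deploy_stack() {
  # Simulated flaky deployment (stand-in for terraform apply):
  # fails on the first call, succeeds on every later call.
  attempts_made=$((attempts_made + 1))
  [ "$attempts_made" -gt 1 ]
}

check_endpoint() {
  # Simulated endpoint check that always succeeds.
  return 0
}

destroy_stack() {
  # Stand-in for tearing down a failed deployment before retrying.
  echo "Cleaning up failed deployment"
}

run_with_retry() {
  local retry_counter=0
  local success=1   # start at 1 so the loop body runs at least once

  while [ "$success" -eq 1 ]; do
    if [ "$retry_counter" -ge "$MAX_RETRY" ]; then
      echo "Max retry reached"
      return 1
    fi
    echo "Attempt $retry_counter"
    success=0       # assume success; any failing step flips it back to 1

    deploy_stack || success=1

    # Only check the endpoint when the deployment itself succeeded.
    if [ "$success" -eq 0 ]; then
      check_endpoint || success=1
    fi

    # On any failure, clean up and count the attempt before looping again.
    if [ "$success" -eq 1 ]; then
      destroy_stack
      retry_counter=$((retry_counter + 1))
    fi
  done
  return 0
}

run_with_retry
echo "Final status: $?"
```

Resetting `success` to 0 at the top of each iteration, rather than after the deployment step, is what lets a failed deployment skip the endpoint check and go straight to the next attempt.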
Is there a sample run where we can see this change in action?
* E2E Test: Ensure the use of IMDSv2 in EC2 instances (#621)
* Add e2e canary to public preview regions (#623)
* Fix trace validation error follow up fix (#626)
* Fix Terrform Destroy Error on EKS Canary (#628)
  * fix-e2e-eks-terraform-destroy-error
  * Add region as parameter to terraform destroy
* Bump nebula.release from 17.2.2 to 18.0.6 (#631)
  Bumps nebula.release from 17.2.2 to 18.0.6.
  ---
  updated-dependencies:
  - dependency-name: nebula.release
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...
* Bump actions/setup-java from 3 to 4 (#629)
  Bumps [actions/setup-java](https://github.com/actions/setup-java) from 3 to 4.
  - [Release notes](https://github.com/actions/setup-java/releases)
  - [Commits](actions/setup-java@v3...v4)
  ---
  updated-dependencies:
  - dependency-name: actions/setup-java
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...
* Bump hashicorp/setup-terraform from 2 to 3 (#586)
  Bumps [hashicorp/setup-terraform](https://github.com/hashicorp/setup-terraform) from 2 to 3.
  - [Release notes](https://github.com/hashicorp/setup-terraform/releases)
  - [Changelog](https://github.com/hashicorp/setup-terraform/blob/main/CHANGELOG.md)
  - [Commits](hashicorp/setup-terraform@v2...v3)
  ---
  updated-dependencies:
  - dependency-name: hashicorp/setup-terraform
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...
* Bump rust from 1.73 to 1.74 (#611)
  Bumps rust from 1.73 to 1.74.
  ---
  updated-dependencies:
  - dependency-name: rust
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
* Bump actions/setup-node from 3 to 4 (#574)
  Bumps [actions/setup-node](https://github.com/actions/setup-node) from 3 to 4.
  - [Release notes](https://github.com/actions/setup-node/releases)
  - [Commits](actions/setup-node@v3...v4)
  ---
  updated-dependencies:
  - dependency-name: actions/setup-node
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...
* Bump tempfile from 3.8.0 to 3.8.1 in /tools/cp-utility (#585)
  Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.8.0 to 3.8.1.
  - [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
  - [Commits](https://github.com/Stebalien/tempfile/commits)
  ---
  updated-dependencies:
  - dependency-name: tempfile
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
* Provide aws-region for the e2e test in worklow (#643)
  * Provide aws-region for the e2e test in worklow
  * Update region to us-east-1 and add concurrency
* Revert "Provide aws-region for the e2e test in worklow (#643)" (#645)
  This reverts commit 44b5b68.
* E2E Testing: Add concurrency tag to test in main build and nightly build (#646)
* Use aws-region in the workflow (#649)
* Add Retry Mechanism to E2E EKS Terraform Deployment (#634)
  * Add Retry Mechanism to E2E EKS Terraform Deployment
  * Add Extra Comments
  * Call Test APIs First before Validation
  * Add clean-app-signals to retry logic
  * Change App Signal Download Directory and modify if statement for validation
  * Modify while loop and refactor code
* Dynamic input RPM link by region setting (#647)
  * Dynamic input RPM link by region setting
  * Remove unneeded env variable
  * Fix an issue in echo shell command
  * Revert previous wrong 'fix' regarding variable call
* Add Retry Mechanism to E2E EC2 Terraform Deployment (#635)
  * Add Retry Mechanism to E2E EC2 Terraform Deployment
  * Add Extra Comments
  * Refactor code
* Change App Signals Directory (#650)
* change dep config to compileOnly to fix high cardinality metrics (#651)
* E2E Testing: Fix EKS test candidate image override (#652)
  This change checks if there is an adot image passed to the workflow and patches the App Signals deployment to update the image and restarts the cloudwatch pods.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Mahad Janjua <134644284+majanjua-amzn@users.noreply.github.com>
Co-authored-by: Harry <harryryu@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Vasi Vasireddy <41936996+vasireddy99@users.noreply.github.com>
Co-authored-by: XinRan Zhang <xinranzh@amazon.com>
Co-authored-by: Mengyi Zhou (bjrara) <zmengyi@amazon.com>
Issue #, if available:
The EC2 Canary occasionally fails due to transient issues. Some of the recurring errors are "Max attempts reached" in the step "Wait for Endpoint to Come Online" and the step "Timeout while waiting for state to become running". This occurs because the endpoint and the EC2 instances sometimes take longer than expected to become ready.
Description of changes:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.