Reduce flakiness of space-specific ASG test #562
Merged
The current "binding a space-specific ASG" test fails approximately 80%
of the time on a vanilla Cloud Foundry deployment. In a nutshell, the
new space-specific ASG has not propagated by the time the app has
restarted, so the app is still subject to the old ASGs: what should be a
timeout is instead a "connection refused".
This commit fixes that by inserting a 60-second delay between
setting the new ASG and restarting the app. The 60-second delay is not
arbitrary; rather, it is the result of the meticulous gathering of
empirical data presented in the chart below:
Although 50 seconds should have been enough, we added another ten
seconds for no good reason other than headroom.
Fixes:
Are you submitting this PR against the develop branch?
All PRs to CATs should be submitted to develop and will be merged to main once they've passed acceptance.
What is this change about?
Describe the change and why it's needed.
Please provide contextual information.
Include any links to other PRs, stories, Slack discussions, etc. that will help establish context.
What version of cf-deployment have you run this cf-acceptance-test change against?
Please check all that apply for this PR:
Did you update the README as appropriate for this change?
If you are introducing a new acceptance test, what is your rationale for including it in CATs rather than your own acceptance test suite?
CATs should validate common operator workflows.
CATs is not a regression test suite.
CATs is run by every component team to validate their releases before promotion.
How many more (or fewer) seconds of runtime will this change introduce to CATs?
It'll add 60 seconds to a single suite.
What is the level of urgency for publishing this change?
Tag your pair, your PM, and/or team!
It's helpful to tag a few other folks on your team or your team alias in case we need to follow up later.