Reduce flakiness of space-specific ASG test #562

Merged
merged 1 commit into from
Jun 2, 2022

@cunnie cunnie commented Jun 2, 2022

The current "binding a space-specific ASG" test fails approximately 80%
of the time on a vanilla Cloud Foundry deployment. In a nutshell, the
new space-specific ASG has not propagated by the time the app has
restarted, so the app is still bound by the old ASGs: what should be a
timeout is instead a "connection refused".

This commit fixes that by inserting a 60-second delay between
setting the new ASG and restarting the app. The 60-second delay is not
arbitrary; rather, it is the result of the meticulous gathering of
empirical data presented in the chart below:

|Delay (seconds)|# of tests|Success Rate (%)|Success Rate Histogram     |
|--------------:|---------:|---------------:|:--------------------------|
|             0 |       60 |             20 | *****                     |
|             5 |       95 |             28 | *******                   |
|            10 |       75 |             41 | **********                |
|            15 |       70 |             38 | *********                 |
|            20 |       95 |             46 | ***********               |
|            25 |       75 |             68 | *****************         |
|            30 |      120 |             63 | ***************           |
|            35 |       60 |             76 | *******************       |
|            40 |       35 |             85 | *********************     |
|            45 |       20 |             85 | *********************     |
|            50 |       30 |            100 | ************************* |
|            55 |       35 |            100 | ************************* |
|            60 |       20 |            100 | ************************* |
|            65 |       30 |            100 | ************************* |
|            70 |       40 |            100 | ************************* |
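
For readers who want to check the chart, here is a minimal Go sketch that redraws the histogram column and finds the shortest delay from which every measured delay succeeded 100% of the time. The bar scale (one asterisk per four percentage points, rounded down) is inferred from the rows above rather than stated in the PR, and the names `result`, `bar`, and `shortestReliableDelay` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// result holds one chart row: the delay in seconds and the success rate in percent.
type result struct {
	delaySeconds int
	successRate  int
}

// rows copies the delay and success-rate columns from the chart, in ascending delay order.
var rows = []result{
	{0, 20}, {5, 28}, {10, 41}, {15, 38}, {20, 46},
	{25, 68}, {30, 63}, {35, 76}, {40, 85}, {45, 85},
	{50, 100}, {55, 100}, {60, 100}, {65, 100}, {70, 100},
}

// bar draws one asterisk per four percentage points, rounded down.
// This scale is an assumption inferred from the chart, not taken from the PR.
func bar(successRate int) string {
	return strings.Repeat("*", successRate/4)
}

// shortestReliableDelay walks the rows from longest to shortest delay and
// returns the smallest delay for which it and every longer delay hit 100%.
// It reports false if even the longest delay fell short of 100%.
func shortestReliableDelay(rs []result) (int, bool) {
	delay, found := 0, false
	for i := len(rs) - 1; i >= 0 && rs[i].successRate == 100; i-- {
		delay, found = rs[i].delaySeconds, true
	}
	return delay, found
}

func main() {
	for _, r := range rows {
		fmt.Printf("%3d %3d%% %s\n", r.delaySeconds, r.successRate, bar(r.successRate))
	}
	if d, ok := shortestReliableDelay(rows); ok {
		fmt.Println("shortest reliable delay (seconds):", d)
	}
}
```

With the chart's data, `shortestReliableDelay` returns 50, which matches the "50 seconds should have been enough" remark.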

Although 50 seconds should have been enough, we added another ten
seconds for no good reason other than headroom.

Fixes:

[Fail] [tasks] v3 tasks when associating a task with an app and binding a space-specific ASG [It] applies the associated app's ASGs to the task
/Users/cunnie/workspace/cf-acceptance-tests/tasks/task.go:355
2022-05-30T10:26:44.86-0700 [APP/TASK/woof/0] ERR 0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to 10.0.244.255 port 80: Connection refused

Are you submitting this PR against the develop branch?

All PRs to CATs should be submitted to develop and will be merged to main once they've passed acceptance.

What is this change about?

Describe the change and why it's needed.

Please provide contextual information.

Include any links to other PRs, stories, Slack discussions, etc. that will help establish context.

What version of cf-deployment have you run this cf-acceptance-test change against?

Please check all that apply for this PR:

  • introduces a new test --- Are you sure everyone should be running this test?
  • changes an existing test
  • requires an update to a CATs integration-config

Did you update the README as appropriate for this change?

  • YES
  • N/A

If you are introducing a new acceptance test, what is your rationale for including it in CATs rather than your own acceptance test suite?

CATs should validate common operator workflows.
CATs is not a regression test suite.
CATs is run by every component team to validate their releases before promotion.

How many more (or fewer) seconds of runtime will this change introduce to CATs?

It'll add 60 seconds to a single suite.

What is the level of urgency for publishing this change?

  • Urgent - unblocks current or future work
  • Slightly Less than Urgent

Tag your pair, your PM, and/or team!

It's helpful to tag a few other folks on your team or your team alias in case we need to follow up later.


@ctlong ctlong left a comment

LGTM

@ctlong ctlong merged commit 4ee6f9f into cloudfoundry:develop Jun 2, 2022
@cunnie cunnie deleted the space-specific-ASGs branch June 2, 2022 20:58