Do not allow Fallback Mode when the CI node was retried to avoid running the wrong set of tests #100

ArturT · 2020-02-29T21:15:19Z

Problem

You run tests in Queue Mode with the flag KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true, or in Regular Mode with the flag KNAPSACK_PRO_FIXED_TEST_SUITE_SPLIT=true (default).

How to reproduce the race condition:

You run the first CI build and tests fail on one of the parallel CI nodes.
You retry the failed CI node. We expect knapsack_pro to run the same set of tests during the retry as was recorded for that CI node's first run. Now assume knapsack_pro cannot connect to the Knapsack Pro API. In that case Fallback Mode starts and a different set of tests runs on the CI node. This can produce a false positive: the different tests may all pass, so the retried CI node goes green without running the failing tests at all.
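The race condition can be illustrated with a toy simulation. This is only a sketch: the file names, the recorded split, and the round-robin `fallback_split` helper are all made up for illustration and are not the gem's real Fallback Mode algorithm.

```ruby
# Hypothetical test files and a hypothetical split recorded by the API on the first run.
TEST_FILES = %w[a_spec.rb b_spec.rb c_spec.rb d_spec.rb e_spec.rb f_spec.rb].freeze
RECORDED_SPLIT = {
  0 => %w[a_spec.rb d_spec.rb e_spec.rb],
  1 => %w[b_spec.rb c_spec.rb f_spec.rb]
}.freeze
FAILING_TEST = 'd_spec.rb' # failed on CI node 0 during the first run

# Stand-in for a fallback split: plain round-robin over sorted file names.
# This is an assumption for the example, NOT how knapsack_pro splits tests.
def fallback_split(test_files, node_total, node_index)
  test_files.sort.each_with_index
            .select { |_file, index| index % node_total == node_index }
            .map(&:first)
end

# What CI node 0 ran on the first run, and what it would run on a retry in Fallback Mode.
FIRST_RUN = RECORDED_SPLIT.fetch(0)
RETRY_RUN = fallback_split(TEST_FILES, 2, 0)

puts "first run includes failing test: #{FIRST_RUN.include?(FAILING_TEST)}"
puts "retry in fallback includes failing test: #{RETRY_RUN.include?(FAILING_TEST)}"
```

In this toy setup the retried node never executes `d_spec.rb`, so it can finish green even though the failure was never re-checked.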

Solution

knapsack_pro should not allow tests to run in Fallback Mode when the CI node was retried.

Some CI providers expose an ENV variable that knapsack_pro can read to learn that the CI node was retried and that Fallback Mode is therefore not allowed.

This PR adds support for the Buildkite CI env var BUILDKITE_RETRY_COUNT, which has the value 1 when the CI node was retried.
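A minimal sketch of how the retry detection could look. The method name `ci_node_retry_count` is borrowed from a commit title in this PR, but the body below, the `ci_node_retried?` helper, and the precedence between the two variables are assumptions, not the gem's actual code.

```ruby
# Hypothetical helper: how many times the current CI node was retried.
# Assumption: an explicit KNAPSACK_PRO_CI_NODE_RETRY_COUNT wins over the
# Buildkite variable, and a missing value means "not retried".
def ci_node_retry_count(env = ENV)
  value = env['KNAPSACK_PRO_CI_NODE_RETRY_COUNT'] || env['BUILDKITE_RETRY_COUNT'] || '0'
  value.to_i
end

# Hypothetical predicate built on top of the count.
def ci_node_retried?(env = ENV)
  ci_node_retry_count(env) > 0
end
```

The `env` parameter defaults to `ENV` but accepts a plain Hash, which keeps the sketch easy to exercise without touching the real environment.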

How to fix your CI config

If you use Buildkite CI, knapsack_pro won't allow Fallback Mode when the parallel job was restarted. No action is required on your side.
If you use another CI provider that allows retrying a failed CI node, you should set KNAPSACK_PRO_CI_NODE_RETRY_COUNT=1 when the failed CI node is restarted.
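For other CI providers, one option is a small wrapper that sets the variable before the tests start. The sketch below assumes the provider exposes an attempt counter in a variable called `MY_CI_ATTEMPT` (a made-up name, starting at 1 for the first run); replace it with whatever your provider really offers.

```ruby
# Hypothetical mapping from a CI provider's attempt counter to the
# variable knapsack_pro reads. MY_CI_ATTEMPT is an invented name.
def knapsack_env_for_retry(env = ENV)
  attempt = env.fetch('MY_CI_ATTEMPT', '1').to_i
  # Any attempt after the first one means the CI node was retried.
  attempt > 1 ? { 'KNAPSACK_PRO_CI_NODE_RETRY_COUNT' => '1' } : {}
end

# Possible usage in a wrapper script (not executed here):
#   system(knapsack_env_for_retry, 'bundle', 'exec', 'rake', 'knapsack_pro:rspec')
```

`Kernel#system` accepts a Hash of extra environment variables as its first argument, so the test command inherits the flag only on retried runs.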

If you can't do that, please don't retry a single failed CI node. Instead, retry the whole CI build to be sure that the whole test suite passes. When Fallback Mode starts on any of the parallel CI nodes, knapsack_pro ensures each test file is executed at least once, so you can be sure the whole test suite is green (passing).

If you really need to retry a single failed CI node, you can disable Fallback Mode completely with KNAPSACK_PRO_FALLBACK_MODE_ENABLED=false. Then tests won't run when the connection to the Knapsack Pro API is lost. Instead, an exception will be raised.
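The resulting behaviour when the API cannot be reached can be sketched roughly as follows. The error class, the `handle_api_connection_failure` method, and the assumption that Fallback Mode is enabled when the variable is unset are illustrative only; the gem's real implementation may differ.

```ruby
# Hypothetical error raised instead of silently switching to Fallback Mode.
class FallbackModeNotAllowed < StandardError; end

# Assumption: Fallback Mode counts as enabled unless explicitly set to "false".
def fallback_mode_enabled?(env = ENV)
  env.fetch('KNAPSACK_PRO_FALLBACK_MODE_ENABLED', 'true') != 'false'
end

# Hypothetical decision taken once the Knapsack Pro API is unreachable.
def handle_api_connection_failure(env = ENV)
  unless fallback_mode_enabled?(env)
    raise FallbackModeNotAllowed, 'Fallback Mode was disabled with KNAPSACK_PRO_FALLBACK_MODE_ENABLED=false'
  end

  if env.fetch('KNAPSACK_PRO_CI_NODE_RETRY_COUNT', '0').to_i > 0
    raise FallbackModeNotAllowed, 'Fallback Mode is not allowed on a retried CI node'
  end

  :fallback_mode # safe to run the fallback split
end
```

Raising in both cases makes the retried node fail loudly, which is preferable to a green result from the wrong set of tests.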

Add ci_node_retry_count and more env methods

9be7be8

ArturT added bug enhancement labels Feb 29, 2020

ArturT added 5 commits February 29, 2020 22:21

Update CHANGELOG.md

f90da75

Add fallback_mode_enabled and fallback_mode_enabled?

9abdf69

Prevent from running fallback mode when CI node was retried or when fallback mode was disabled with KNAPSACK_PRO_FALLBACK_MODE_ENABLED=false

083ee29

Update queue_allocator_spec.rb

e7a3438

Update allocator_spec.rb

5d43895

ArturT added a commit to KnapsackPro/rails-app-with-knapsack_pro that referenced this pull request Mar 1, 2020

Add bin files to test race condition when fallback mode should not be started on retried CI node. See more in KnapsackPro/knapsack_pro-ruby#100

4d5a31b

ArturT added 13 commits March 1, 2020 13:09

Add info about feature to retry only single failed CI node

de432ba

Update README.md

01af920

Update README.md

26da879

Update README.md

470b5f8

Update README.md

f513f1c

Update README.md

f57f5a4

Update README.md

38acf36

Update README.md

9220768

Update README.md

375f9bb

Fix typo

a4a6824

fix typo

4c93b45

Fix link to docs

e713ca0

Add link to docs about retried CI node race condition

563be5f

ArturT merged commit db447c7 into master Mar 1, 2020

ArturT deleted the do-not-run-fallback-mode-when-retry-ci-node branch March 1, 2020 13:01

ArturT mentioned this pull request May 20, 2020

Increase request retry timebox from 4s to 8s and when Fallback Mode is disabled then retry request 6 times #112

Merged
