fix(Queue Mode): handle OS signals and RSpec internal wants_to_quit and rspec_is_quitting states to stop consuming tests from the Queue API when the CI node is terminated
#207
problem
When you use AWS Spot Instances and your CI node is terminated, the RSpec process can be stopped so that it no longer executes tests while the knapsack_pro gem keeps running. The gem keeps fetching tests from the Queue API and assigning them to RSpec, but RSpec does not execute them. Knapsack Pro assumes the tests were executed and asks the Queue API for more. This leads to allocating too many tests to the terminated CI node.
When the CI node is retried, it reruns the tests that were originally assigned to the terminated CI node index. The retried CI node therefore gets too many tests, which leads to very slow test execution on that CI node index.
solution
Let's handle OS signals like the TERM signal and stop consuming more tests from the Queue API when it happens.
Handle the RSpec internal wants_to_quit and rspec_is_quitting states to stop consuming tests from the Queue API when the CI node is terminated.
story
https://trello.com/c/1UJnbWQG
how to reproduce the problem and fix on the local project
Run tests with the bin script bin/knapsack_pro_queue_rspec_record_first_run.
You can add a slow test example to one of the spec files, such as spec/rake_tasks/dummy_rake_spec.rb, so that your tests run slowly enough for you to perform all the following steps.
Now, while bin/knapsack_pro_queue_rspec_record_first_run is running, list the running processes in another terminal.
You will see the PID of the process (29939 in this example). Look for rspec_go in the process name.
Then send the TERM signal to that process.
After you do it, we expect the batch of tests already fetched from the Queue API to be completed, and then you see the exception Knapsack Pro process was terminated!.
Please note in the Knapsack Pro API logs that there are no more requests to the Queue API to fetch more tests, because after the batch of tests completed we raised the exception and stopped consuming tests from the Queue API. Knapsack Pro can handle the TERM signal gracefully.
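The behaviour described in this pull request can be illustrated with a small, self-contained Ruby sketch. It is not the knapsack_pro implementation: QueueConsumer, ProcessTerminated and the in-memory list of batches are hypothetical stand-ins for the gem's queue runner and the Queue API, and the RSpec wants_to_quit and rspec_is_quitting states are not modelled here. The sketch only shows the general idea: trap the TERM signal, let the current batch finish, then raise instead of fetching the next batch.

```ruby
# Hypothetical sketch, not the knapsack_pro source code.
class ProcessTerminated < StandardError; end

class QueueConsumer
  attr_reader :executed_batches

  # batches stands in for what the Queue API would return, one batch per request.
  def initialize(batches, signals: %w[TERM])
    @pending_batches = batches.dup
    @signals = signals
    @executed_batches = []
    @terminate_requested = false
  end

  # The trap handler only sets a flag, so it stays safe to run in signal context.
  def trap_signals
    @signals.each do |signal|
      Signal.trap(signal) { @terminate_requested = true }
    end
  end

  # Yields each batch to the caller (the place where tests would be executed).
  # After a batch completes, the flag is checked before the next batch is taken.
  def run
    until @pending_batches.empty?
      batch = @pending_batches.shift
      yield batch
      @executed_batches << batch
      raise ProcessTerminated, 'Knapsack Pro process was terminated!' if @terminate_requested
    end
    @executed_batches
  end
end

consumer = QueueConsumer.new([%w[a_spec.rb b_spec.rb], %w[c_spec.rb], %w[d_spec.rb]])
consumer.trap_signals

begin
  consumer.run do |batch|
    # Simulate the CI node being terminated while the first batch is running.
    if batch.include?('a_spec.rb')
      Process.kill('TERM', Process.pid)
      sleep 0.1 # give the signal handler time to run
    end
  end
rescue ProcessTerminated => e
  puts e.message
end

puts "Executed batches: #{consumer.executed_batches.inspect}"
```

In this sketch the first batch is allowed to complete and the remaining two batches are never taken from the queue, which mirrors the expectation above that no further requests reach the Queue API after the signal.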