feat: cancel jobs that do not start in time #11
Merged
Motivation
Currently, if the Kubernetes job created by the controller never starts, for whatever reason, it is only deleted once the activeDeadlineSeconds set by the controller for the job is reached.
The controller currently uses an active deadline of 24h, which is how long a Semaphore job can run, but it would be useful to handle jobs that never start separately.
Solution
A new JOB_START_TIMEOUT parameter is added. If that timeout is reached and the Kubernetes job has not started, the Kubernetes job is deleted.
How to know if a Kubernetes job started properly?
It depends on the Kubernetes version: