Backport of core: allow pausing and un-pausing of leader broker routine into release/1.3.x #13614
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport
This PR is auto-generated from #13045 to be assessed for backporting due to the inclusion of the label backport/1.3.x.
The below text is copied from the body of the original PR.
Test failure: percy-ui and unrelated
Related to #11638
Commits: split to make review easier
This changes allows operators to control whether or not the eval broker is running on the leader server. This is useful in outage situations where administrators wish to stop any work being available to the scheduler workers. It is also requisite work for future work which will allow deletion of evaluations which have not been processed.
Alongside pausing the eval broker, the blocked evals process is also disabled at the same time. This sub-process pushes evaluations into the eval broker and is also restored from the state store using the same process as the eval broker. It therefore makes sense, considering the end goal, to disable both processes for state consistency.
To ensure operators pausing/un-pausing the broker doesn't conflict with leadership transitions a mutex is used to control access to the eval broker and blocked evals processes. This is used along with a leadership check when changing the broker status which requires taking into account operator configuration.
The issue mentions disabling the
reapFailedEvaluations
process, however, I don't believe this is required for this change as disabling the eval broker also flushes all stored evaluation data. It therefore seemed safer to leave this alone and not require additional coordination.The new
operator scheduler
commands allow inspecting and modifying the scheduler config. This is useful as it doesn't require you to supply the full payload object or remember the exact curl command to run.