Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: change default minRunningApps to 1 #201

Merged
merged 2 commits into from
Oct 28, 2024

Conversation

Julusian
Copy link
Member

About Me

This pull request is posted on behalf of the BBC

Type of Contribution

This is a: Bug fix

Current Behavior

When the system is idle, package-manager is restarting the workers on a 10 minute interval.

This is happening because the workers are idle, so trigger the spinDownTime timer. As minRunningApps is 0, the appContainer would approve the request and spin down the worker.

Shortly after, the workforce would update its status and report itself as BAD due to No workers connected to workforce, causing an error to be reported in Sofie.

Then a little later, requestAppTypeForExpectation is called from checkIfNeedToScaleUp which results in a new worker starting and the error being dismissed.

New Behavior

The minRunningApps now defaults to 1, so that the system doesn't get itself into this restart loop.

Testing Instructions

Other Information

Status

  • PR is ready to be reviewed.
  • The functionality has been tested by the author.
  • Relevant unit tests has been added / updated.
  • Relevant documentation (code comments, system documentation) has been added / updated.

@Julusian Julusian requested a review from a team October 23, 2024 14:15
@Julusian Julusian added the Contributions from BBC Contributions sponsored by BBC (bbc.co.uk) label Oct 23, 2024
@Julusian
Copy link
Member Author

I am not 100% sure on whether this fix is the best solution.
Perhaps it should be enforced that the value should never go below 1?
Or maybe hitting 0 workers should be allowed when autoscaling?

@jstarpl
Copy link
Member

jstarpl commented Oct 23, 2024

Or maybe hitting 0 workers should be allowed when autoscaling?

I would argue that changing the algorithm for the workforce error message/status so that if minRunningApps is 0, this doesn't make it go BAD would be preferable? This would mean that if there are no expectations to be fulfilled, it could just scale down to 0, until such time as when there's work to be done (or checked for it's status).

@nytamin
Copy link
Contributor

nytamin commented Oct 24, 2024

I just har a quick talk with Julian, and I think I agree with both Julian and Jan.

  1. We should change the behavior so that workerCount === 0 is not a BAD thing.
  2. We should modify the defaults of PackageManagerSingleApp to be a "recommended config for a typical use case", so something like
    minRunningApps=1
    maxRunningApps=3
    minCriticalWorkerApps=1

@jstarpl jstarpl changed the base branch from master to develop October 28, 2024 13:44
@jstarpl jstarpl merged commit 9ab6bdc into nrkno:develop Oct 28, 2024
6 checks passed
@jstarpl jstarpl deleted the fix/excessive-worker-restarting branch October 28, 2024 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Contributions from BBC Contributions sponsored by BBC (bbc.co.uk)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants