PR review Fixes
mshitrit committed Mar 8, 2021
1 parent cc01360 commit 314eafd
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion enhancements/machine-api/short-circuiting-backoff.md
@@ -42,8 +42,9 @@ create a new one. This isn't the best remediation strategy in all environments.

Any Machine that enters the `Failed` state is remediated immediately, without waiting, by the MHC.
When this occurs, if the error which caused the failure is persistent (spot price too low, configuration error), replacement Machines will also be `Failed`.
-As replacement machines start and fail, MHC causes a hot loop of Machine being deleted and recreated
+As replacement machines start and fail, MHC causes a hot loop of Machine being deleted and recreated.
The hot loop makes it difficult for users to find out why their Machines are failing.
Another side effect of machines constantly failing is the risk of reaching the machine-failure percentage threshold, which triggers the "short-circuit" mechanism and prevents all remediations.

With this enhancement we propose a better mechanism.
If a machine enters the `Failed` state and does not have a NodeRef or a ProviderID, it will be remediated only after a certain time period has passed, allowing manual intervention to break the hot loop.
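
The proposed rule can be sketched roughly as follows. This is a minimal illustration, not the MHC implementation: the `Machine` struct, `shouldRemediate`, `sampleDecisions`, and the five-minute delay are all hypothetical, and the sketch assumes "does not have a NodeRef or a ProviderID" means that either field is missing. It only models the `Failed` phase and ignores node-condition checks.

```go
package main

import (
	"fmt"
	"time"
)

// Machine is a hypothetical, simplified stand-in for the real Machine object.
type Machine struct {
	Phase      string
	NodeRef    string    // empty string means "no NodeRef" (assumption)
	ProviderID string    // empty string means "no ProviderID" (assumption)
	FailedAt   time.Time // when the Machine entered the Failed phase
}

// shouldRemediate reports whether a Machine should be remediated at time now.
// Failed Machines that lack a NodeRef or ProviderID wait for failedDelay first,
// which leaves a window for manual intervention.
func shouldRemediate(m Machine, now time.Time, failedDelay time.Duration) bool {
	if m.Phase != "Failed" {
		return false // this sketch only covers the Failed phase
	}
	if m.NodeRef == "" || m.ProviderID == "" {
		return now.Sub(m.FailedAt) >= failedDelay
	}
	return true // Failed with both fields set: remediate immediately
}

// sampleDecisions evaluates a few illustrative Machines against a 5-minute delay.
func sampleDecisions() map[string]bool {
	now := time.Date(2021, time.March, 8, 12, 0, 0, 0, time.UTC)
	delay := 5 * time.Minute // assumed value, not taken from the proposal
	return map[string]bool{
		"running":        shouldRemediate(Machine{Phase: "Running", NodeRef: "node-a", ProviderID: "id-a"}, now, delay),
		"failedRecently": shouldRemediate(Machine{Phase: "Failed", FailedAt: now.Add(-1 * time.Minute)}, now, delay),
		"failedLongAgo":  shouldRemediate(Machine{Phase: "Failed", FailedAt: now.Add(-10 * time.Minute)}, now, delay),
		"failedWithNode": shouldRemediate(Machine{Phase: "Failed", NodeRef: "node-b", ProviderID: "id-b", FailedAt: now}, now, delay),
	}
}

func main() {
	for name, remediate := range sampleDecisions() {
		fmt.Printf("%s: remediate=%v\n", name, remediate)
	}
}
```

Under these assumptions, a Machine that failed one minute ago without a NodeRef is left alone, which is what breaks the delete-and-recreate loop, while one that failed ten minutes ago is remediated.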