Contributor

@ewoutp ewoutp commented Mar 29, 2018

This PR adds a new high-level layer of checks to improve resilience.
It introduces a "Failed" phase for a member. Once a member reaches that phase, there is no hope of recovery and it will be removed. The existing reconciliation rules ensure that a new member is added to replace it.

The resilience check goes over all members and looks for signs that a member is dead beyond hope of recovery. If so, it checks whether the member is allowed to be replaced. If both conditions hold, the member phase is set to Failed, and the reconciler creates a plan to remove it.

@ewoutp ewoutp added the 9 WIP label Mar 29, 2018
// The function returns nil when all agents are healthy or an error when something is wrong.
func AreAgentsHealthy(ctx context.Context, clients []Agency) error {
	wg := sync.WaitGroup{}
	invalidKey := []string{"does-not-exists-149e97e8-4b81-5664-a8a8-9ba93881d64c"}
Contributor

@kvahed kvahed Apr 3, 2018

Purely for aesthetics: drop the "s" in "exists"?

Contributor Author

done

Contributor

@kvahed kvahed left a comment

Fine apart from one tiny remark and a talky call. :)

@ewoutp ewoutp merged commit 1173d8e into master Apr 3, 2018
@ewoutp ewoutp deleted the check-member-failure branch April 3, 2018 09:49