Modularization & Status Enhancements #188
Conversation
Thanks a lot for the PR @timuthy! Some very minor comments/questions below.
Besides that, I see that you have addressed the transition to the `Unknown` state for the members, but not the further transition from `Unknown` to `NotReady`, and also the separate transition to `NotReady` status for a pod that is not ready. Would you address that in a separate PR or do you think those cases should be addressed differently?
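The `Unknown` to `NotReady` transition discussed here is essentially a function of the heartbeat age. A minimal, hypothetical sketch of such a rule (simplified types and threshold names are assumptions, not the PR's actual code):

```go
package status

import "time"

type memberStatus string

const (
	ready    memberStatus = "Ready"
	unknown  memberStatus = "Unknown"
	notReady memberStatus = "NotReady"
)

// nextStatus assumes LastUpdateTime is used as a heartbeat: after unknownAfter
// without an update the member becomes Unknown, and after notReadyAfter it
// degrades further to NotReady.
func nextStatus(current memberStatus, lastUpdate, now time.Time, unknownAfter, notReadyAfter time.Duration) memberStatus {
	elapsed := now.Sub(lastUpdate)
	switch {
	case elapsed > notReadyAfter:
		return notReady
	case elapsed > unknownAfter:
		return unknown
	default:
		return current
	}
}
```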
Thanks for mentioning it @amshuman-kr. I will update this PR as written in the proposal and as discussed via Slack.
/status author-action
@timuthy The pull request was assigned to you under
Added via 134958d
Thanks for the change @timuthy! It looks good. Do you think adding `status.initialReplicas` and using `max(status.initialReplicas, len(status.Members))` make sense? The maintenance of `status.initialReplicas` could be postponed to later issues where the custodian is enhanced for other reconciliation actions like triggering bootstrapping.
WDYT?
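For illustration, the suggested `max(status.initialReplicas, len(status.Members))` could look roughly like this (field and function names are assumptions for this sketch, not actual API fields):

```go
package status

// effectiveClusterSize derives the cluster size from
// max(status.initialReplicas, len(status.members)) so that an incomplete
// member list does not shrink the basis for the quorum calculation.
func effectiveClusterSize(initialReplicas int32, memberCount int) int32 {
	if int32(memberCount) > initialReplicas {
		return int32(memberCount)
	}
	return initialReplicas
}

// quorum returns the minimum number of healthy members required.
func quorum(clusterSize int32) int32 {
	return clusterSize/2 + 1
}
```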
Remove `status.podRef` which was introduced by an earlier, unreleased commit and use `status.name` instead.
Thanks for the discussion @amshuman-kr. I added the missing pieces in the last 3 commits (separated for easier review). We can squash-merge the PR once we are ready.
Thanks a lot for the changes @timuthy! The changes look good. Just a couple of comments below (the first one is merely FYI).
// Bootstrap is a special case which is handled by the etcd controller.
if !inBootstrap(etcd) && len(members) != 0 {
	etcd.Status.ClusterSize = pointer.Int32Ptr(int32(len(members)))
}
Strictly speaking, this is incorrect. Only the initial bootstrapping is triggered by the etcd controller. Later bootstrapping due to quorum failure is to be done by the custodian controller. But this change could be done in #158.
I added this comment as it reflects the current state.
Tbh, I'm not yet convinced that the Custodian controller should perform self-healing in the form of re-bootstrapping, because the Etcd controller can in the meantime still reconcile `etcd` resources, which can lead to races. In order to avoid a race, you need to introduce some kind of lock, which is also not too nice, IMO.
Maybe I didn't check all cases in detail, but at the moment I think an alternative would be easier to implement:

- Custodian sees that the cluster is not healthy and self-healing is required.
- Custodian triggers a reconciliation for the affected `etcd` resource (see the sketch below).
- Etcd controller kicks in and takes the necessary action to bring the cluster up again (=> desired state). In the meantime, any changes to the very same `etcd` resource cannot lead to a race.

We don't have to decide this in this PR of course, but I still wanted to give an alternative approach to think about.
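To make the alternative concrete, here is a minimal, hypothetical sketch of the trigger step, assuming a recent controller-runtime client and that the etcd controller reacts to the `gardener.cloud/operation=reconcile` annotation (the helper name is made up):

```go
package custodian

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// triggerReconciliation only marks the Etcd resource for reconciliation;
// the etcd controller then performs the actual healing.
func triggerReconciliation(ctx context.Context, c client.Client, etcd client.Object) error {
	patch := client.MergeFrom(etcd.DeepCopyObject().(client.Object))
	annotations := etcd.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations["gardener.cloud/operation"] = "reconcile"
	etcd.SetAnnotations(annotations)
	return c.Patch(ctx, etcd, patch)
}
```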
> In order to avoid a race, you need to introduce some kind of lock which is also not too nice, IMO.

Even if only the etcd controller is responsible for bootstrapping non-quorate clusters, some sort of race avoidance is still required, because the `Etcd` spec might get updated while bootstrapping is still ongoing. In that case, it should either wait until the ongoing bootstrapping has completed before applying the latest spec changes, or abort the current bootstrapping (which might unnecessarily prolong the downtime).

> - Custodian triggers a reconciliation for the affected `etcd` resource.

This cannot be the regular etcd controller reconciliation, which might apply `Etcd` resource spec changes along with triggering bootstrapping. As part of the contract of a gardener extension, etcd-druid must not apply `Etcd` resource spec changes unless the `reconcile` operation annotation is present.
On the other hand, it makes no sense for etcd-druid to wait for the next shoot reconciliation to fix a non-quorate etcd cluster.
But let's take this discussion out of this PR.
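For reference, the contract check itself is small; a hedged sketch (the annotation key is the gardener operation annotation, the helper name is made up):

```go
package controller

// operationAnnotation is the gardener operation annotation key.
const operationAnnotation = "gardener.cloud/operation"

// shouldApplySpecChanges only allows Etcd spec changes to be applied when the
// reconcile operation annotation is present.
func shouldApplySpecChanges(annotations map[string]string) bool {
	return annotations[operationAnnotation] == "reconcile"
}
```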
Thanks @timuthy for the changes. /lgtm
How to categorize this PR?
/area control-plane
/kind enhancement
What this PR does / why we need it:
This PR adds the following status checks for the `etcd` resource:

- whether there are enough `Ready` members in `status.members` to fulfill the quorum,
- whether all members in `status.members` are `Ready`,
- whether each member's `LastUpdateTime`, used as a heartbeat, is within the expected time range (configurable via `--etcd-member-threshold`).
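A simplified sketch of the quorum-based readiness rule described above (types, field names, and the helper are assumptions for illustration, not the PR's implementation):

```go
package status

import "time"

type member struct {
	ready          bool
	lastUpdateTime time.Time
}

// clusterReady counts members that are Ready and whose heartbeat
// (lastUpdateTime) is not older than the member threshold, and compares the
// count against the quorum.
func clusterReady(members []member, clusterSize int, memberThreshold time.Duration, now time.Time) bool {
	quorum := clusterSize/2 + 1
	readyCount := 0
	for _, m := range members {
		if m.ready && now.Sub(m.lastUpdateTime) <= memberThreshold {
			readyCount++
		}
	}
	return readyCount >= quorum
}
```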
Which issue(s) this PR fixes:
Fixes #151
Special notes for your reviewer:
The PR does not handle cases in which an etcd member is lost and replaced, and thus the corresponding `status.member` entry has to be removed. This needs to be done by the component which removes the lost member from the etcd cluster.
Release note:
A re-sync mechanism has been added for the Custodian controller. The new flag `--custodian-sync-period` (default: `30s`) controls the duration after which the Custodian controller re-enqueues `etcd` resources for reconciliation. This can be considered a health-check interval.
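As a rough illustration of how such a periodic re-sync is commonly wired up with controller-runtime (the reconciler body is omitted and the field name is an assumption mirroring the flag):

```go
package custodian

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// Reconciler sketches a custodian-style controller that re-enqueues resources periodically.
type Reconciler struct {
	SyncPeriod time.Duration // value of --custodian-sync-period, e.g. 30s
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... run the status/health checks for the requested etcd resource here ...

	// Re-enqueue so the checks run again after the configured sync period.
	return ctrl.Result{RequeueAfter: r.SyncPeriod}, nil
}
```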