bootkube: Inject bootstrap MachineConfigs into cluster #1189
This is an ugly fix for openshift/machine-config-operator#367
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: cgwalters
# Copy the bootstrap MCs to inject into the target cluster
# Yes this is a brutal hack, need to improve the MCC bootstrap above
# 9a so we're after 99 - should change the others to 50- or something?
for x in /etc/mcs/bootstrap/machine-configs/*.yaml; do
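For readers following the truncated hunk above: a minimal sketch of what a loop like this might do, assuming "9a" is a filename prefix chosen so the copies sort after the existing `99-` manifests. The function name `copy_bootstrap_mcs` and both directory arguments are hypothetical and are not taken from the PR.

```shell
# Hypothetical sketch, not the PR's code: copy every *.yaml file from a source
# directory into a destination directory, prefixing each name with "9a-".
# The body runs in a subshell, so its variables do not leak to the caller.
copy_bootstrap_mcs() (
    src_dir="$1"    # assumption: directory holding the bootstrap-rendered MCs
    dest_dir="$2"   # assumption: directory whose manifests get injected later
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.yaml; do
        # An unmatched glob is left as a literal string; skip it.
        [ -e "$f" ] || continue
        cp "$f" "$dest_dir/9a-$(basename "$f")"
    done
)
```

In byte-wise ordering (for example `LC_ALL=C sort`), `9a-` sorts after `99-` because `a` compares greater than `9`. Whether the consumer of these files orders them that way is itself an assumption.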
My understanding of the current code is here: openshift/machine-config-operator#367 (comment)
but I don't see any installer code obviously populating it.
AIUI it's the static pod which we are just removing above this code:
/etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml
If so, is there a reason you couldn't address this entirely within openshift/machine-config-operator?
Maybe - I didn't write this code and am still learning things here, so alternative suggestions appreciated!
As far as I understand things though...that bootstrap pod is what's serving out the Ignition to the masters. We don't have masters online until after echo "etcd cluster up. Killing etcd certificate signer..." (right?)
I believe we won't even have the machineconfigs CRD in the cluster until the MCO comes online. So we will end up waiting in openshift.service for that - similar to how we wait for the cluster API objects and inject those.
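The waiting described here could be sketched as a generic retry helper. This is only an illustration: `wait_until` is a hypothetical name, and the attempt count, the delay, and the CRD name in the usage comment are assumptions rather than anything openshift.sh is confirmed to use.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds or the
# attempt budget is used up. Returns 0 on success, 1 if every attempt failed.
wait_until() {
    local attempts="$1"   # maximum number of tries
    local delay="$2"      # seconds to sleep after a failed try
    shift 2
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage sketch (assumes oc is on PATH and KUBECONFIG points at the new cluster;
# the CRD name is an assumption):
#   wait_until 60 5 oc get crd machineconfigs.machineconfiguration.openshift.io
```

Only once the CRD exists can MachineConfig objects be created in the cluster, which is why the injection step would have to sit behind a wait like this.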
> but I don't see any installer code obviously populating it.
> AIUI it's the static pod which we are just removing above this code:
> /etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml
So can we update that code to make these alterations? It's maybe here calling here?
> So we will end up waiting in openshift.service for that - similar to how we wait for the cluster API objects and inject those.
I'm not sure how this comes into this pull request, but I'm pushing to get openshift.service and openshift.sh functionality moved into cluster-bootstrap. See #1147, which I'm going to reroll after cluster-bootstrap picks up openshift/library-go#220.
> So can we update that code to make these alterations?
Mmm...this code needs to generate content to inject into the target cluster. The static pod is running on the bootstrap before the cluster, right?
> but I'm pushing to get openshift.service and openshift.sh functionality moved into cluster-bootstrap.
That makes sense to me...I was surprised at the low-tech nature of openshift.sh.
Why do you believe the bootstrap mcc and long-running mcc are creating 2 different final machineconfigs? /hold
Let's discuss that in openshift/machine-config-operator#367
Just to emphasize this, look at e.g. this recent PR to the installer which passed CI just fine...yet if you drill down into the clusteroperator status you'll notice the MCO is degraded:
And if you look at the MCD logs in
We need to get to the point where we're gating
I fully admit this PR is a hack but I'm trying to make some progress on this as it's blocking my work on OS updates.
/test e2e-aws
While I think this could still make sense...we don't need this right now since we did end up identifying the underlying problem in #1194
Reopening this; just hit the failure case again here: openshift/machine-config-operator#363 (comment). I know this isn't the most beautiful code but it should let us avoid this problem; we will be able to see the config drift.
Open to any PRs that use mc* render to handle this like other operators. No inline.
We have hit this in the field, caused by a cluster-admin not getting all the options quite right. As I understand it, this can alleviate the "your cluster isn't installing" problem automatically in the unfortunate case and carries no negative side-effects in the common case. We should get this into a state where it (or something like it) can merge. /reopen
@deads2k: Failed to re-open PR: state cannot be changed. The install-mcd-bootstrap branch was force-pushed or recreated.
Amazingly, this PR rebases cleanly. However, looking at it now with months more of experience, I think we can clean this up...
Moved this to #2936
I don't fully agree with that. In the event there's a drift, we're installing something which is what the user wants, but then the MCO will reconcile to something completely unintended; the cluster can still function and nobody notices. It should be fatal instead.
reply in #2936 (comment)