bootkube: Inject bootstrap MachineConfigs into cluster #1189

cgwalters · 2019-02-04T21:15:53Z

This is an ugly fix for
openshift/machine-config-operator#367

This is an ugly fix for openshift/machine-config-operator#367

openshift-ci-robot · 2019-02-04T21:16:42Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: flaper87

If they are not already assigned, you can assign the PR to them by writing /assign @flaper87 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wking · 2019-02-04T21:43:12Z

data/data/bootstrap/files/usr/local/bin/bootkube.sh.template

+# Copy the bootstrap MCs to inject into the target cluster
+# Yes this is a brutal hack, need to improve the MCC bootstrap above
+# 9a so we're after 99 - should change the others to 50- or something?
+for x in /etc/mcs/bootstrap/machine-configs/*.yaml; do


Where do these come from in the first place? Looks like /etc/mcs has only been mentioned in 6da1973 (#879), but I don't see any installer code obviously populating it. Maybe it's here? If so, is there a reason you couldn't address this entirely within openshift/machine-config-operator?

My understanding of the current code is here openshift/machine-config-operator#367 (comment)

but I don't see any installer code obviously populating it.

AIUI it's the static pod which we are just removing above this code:
/etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml

If so, is there a reason you couldn't address this entirely within openshift/machine-config-operator?

Maybe - I didn't write this code and am still learning things here, so alternative suggestions appreciated!

As far as I understand things though...that bootstrap pod is what's serving out the Ignition to the masters. We don't have masters online until after echo "etcd cluster up. Killing etcd certificate signer..." (right?)

I believe we won't even have the machineconfigs CRD in the cluster until the MCO comes online. So we will end up waiting in openshift.service for that - similar to how we wait for the cluster API objects and inject those.

but I don't see any installer code obviously populating it.

AIUI it's the static pod which we are just removing above this code:
/etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml

So can we update that code to make these alterations? It's maybe here calling here?

So we will end up waiting in openshift.service for that - similar to how we wait for the cluster API objects and inject those.

I'm not sure how this comes into this pull request, but I'm pushing to get openshift.service and openshift.sh functionality moved into cluster-bootstrap. See #1147, which I'm going to reroll after cluster-bootstrap picks up openshift/library-go#220.

So can we update that code to make these alterations?

Mmm...this code needs to generate content to inject into the target cluster. The static pod is running on the bootstrap before the cluster, right?

but I'm pushing to get openshift.service and openshift.sh functionality moved into cluster-bootstrap.

That makes sense to me...I was surprised at the low-tech nature of openshift.sh.

abhinavdahiya · 2019-02-04T22:21:29Z

This is an ugly fix for
openshift/machine-config-operator#367

Why do you believe the bootstrap mcc and long-running mcc are creating 2 different final machineconfigs?

/hold

cgwalters · 2019-02-04T22:26:04Z

Why do you believe the bootstrap mcc and long-running mcc are creating 2 different final machineconfigs?

Let's discuss that in openshift/machine-config-operator#367
Clearly a lot of people are hitting this.

cgwalters · 2019-02-04T22:32:31Z

Clearly a lot of people are hitting this.

Just to emphasize this, look at e.g. this recent PR to the installer which passed CI just fine...yet if you drill down into the clusteroperator status you'll notice the MCO is degraded:

"master": "pool is degraded because of 1 nodes are reporting degraded status on update. Cannot proceed.",

And if you look at the MCD logs in openshift-machine-config-operator_machine-config-daemon-82kdq_machine-config-daemon.log.gz:

I0202 02:27:31.338962 11501 daemon.go:875] While getting MachineConfig master-abdb3e951a9d508a63687694fe12b0cc, got: machineconfigs.machineconfiguration.openshift.io "master-abdb3e951a9d508a63687694fe12b0cc" not found. Retrying...

cgwalters · 2019-02-04T22:33:59Z

We need to get to the point where we're gating e2e-aws on nodes not being degraded (which is an interesting topic...I'd been adding tests for this to the MCO repo but do we have a way to extend e2e-aws with tests that live in other git repos besides origin)?

I fully admit this PR is a hack but I'm trying to make some progress on this as it's blocking my work on OS updates.

cgwalters · 2019-02-05T00:46:18Z

/test e2e-aws

cgwalters · 2019-02-05T14:27:23Z

/test e2e-aws

openshift-ci-robot · 2019-02-05T15:31:19Z

@cgwalters: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/prow/e2e-openstack	`c8d4ff6`	link	`/test e2e-openstack`
ci/prow/e2e-libvirt	`c8d4ff6`	link	`/test e2e-libvirt`
ci/prow/launch-aws	`c8d4ff6`	link	`/test launch-aws`
ci/prow/e2e-aws	`c8d4ff6`	link	`/test e2e-aws`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

cgwalters · 2019-02-08T22:21:52Z

While I think this could still make sense...we don't need this right now since we did end up identifying the underlying problem in #1194

cgwalters · 2019-02-12T23:02:35Z

Reopening this; just hit the failure case again here: openshift/machine-config-operator#363 (comment)

I know this isn't the most beautiful code but it should let us avoid this problem; we will be able to see the config drift.

abhinavdahiya · 2019-02-12T23:08:14Z

I know this isn't the most beautiful code but it should let us avoid this problem;

Open to any PRs that use mc* render to handle this like other operators. no inline.

deads2k · 2020-01-16T14:36:24Z

We have hit this in the field caused by a cluster-admin not getting all the options quite right. As I understand it, this can alleviate the "your cluster isn't installing" problem automatically in the unfortunate case and carries no negative side-effects in the common case. We should get this into a state where it (or something like it) can merge.

/reopen

openshift-ci-robot · 2020-01-16T14:36:25Z

@deads2k: Failed to re-open PR: state cannot be changed. The install-mcd-bootstrap branch was force-pushed or recreated.

In response to this:

We have hit this in the field caused by a cluster-admin not getting all the options quite right. As I understand it, this can alleviate the "your cluster isn't installing" problem automatically in the unfortunate case and carries no negative side-effects in the common case. We should get this into a state where it (or something like it) can merge.

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cgwalters · 2020-01-16T14:40:45Z

Amazingly, this PR rebases cleanly. However, looking at it now with months more of experience, I think we can clean this up...

cgwalters · 2020-01-16T14:42:54Z

Moved this to #2936

runcom · 2020-01-16T14:54:50Z

As I understand it, this can alleviate the "your cluster isn't installing" problem automatically in the unfortunate case

I don't fully agree with that, in the event there's a drift we're installing something which is what the user wants but then the MCO will reconcile to something completely not intended but the cluster can still function and nobody notices, it should be fatal instaed

deads2k · 2020-01-16T15:04:48Z

As I understand it, this can alleviate the "your cluster isn't installing" problem automatically in the unfortunate case

I don't fully agree with that, in the event there's a drift we're installing something which is what the user wants but then the MCO will reconcile to something completely not intended but the cluster can still function and nobody notices, it should be fatal instaed

reply in #2936 (comment)

bootkube: Inject bootstrap MachineConfigs into cluster

c8d4ff6

This is an ugly fix for openshift/machine-config-operator#367

cgwalters mentioned this pull request Feb 4, 2019

master machineconfig pool reports degraded on new cluster openshift/machine-config-operator#367

Closed

openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 4, 2019

openshift-ci-robot requested review from rajatchopra and tomassedovic February 4, 2019 21:16

wking reviewed Feb 4, 2019

View reviewed changes

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 4, 2019

cgwalters closed this Feb 8, 2019

This was referenced Feb 8, 2019

pkg/operator: lay down ClusterOperator for MCO openshift/machine-config-operator#386

Merged

Inject OSImageURL from CVO into templated MachineConfigs openshift/machine-config-operator#363

Closed

cgwalters mentioned this pull request Feb 25, 2019

Pick image from release payload for crio.conf pause_image openshift/machine-config-operator#471

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bootkube: Inject bootstrap MachineConfigs into cluster #1189

bootkube: Inject bootstrap MachineConfigs into cluster #1189

cgwalters commented Feb 4, 2019

openshift-ci-robot commented Feb 4, 2019

wking Feb 4, 2019

cgwalters Feb 4, 2019

cgwalters Feb 4, 2019

wking Feb 4, 2019

cgwalters Feb 4, 2019

abhinavdahiya commented Feb 4, 2019

cgwalters commented Feb 4, 2019

cgwalters commented Feb 4, 2019 •

edited

Loading

cgwalters commented Feb 4, 2019

cgwalters commented Feb 5, 2019

cgwalters commented Feb 5, 2019

openshift-ci-robot commented Feb 5, 2019

cgwalters commented Feb 8, 2019

cgwalters commented Feb 12, 2019

abhinavdahiya commented Feb 12, 2019

deads2k commented Jan 16, 2020

openshift-ci-robot commented Jan 16, 2020

cgwalters commented Jan 16, 2020

cgwalters commented Jan 16, 2020

runcom commented Jan 16, 2020 •

edited

Loading

deads2k commented Jan 16, 2020

bootkube: Inject bootstrap MachineConfigs into cluster #1189

bootkube: Inject bootstrap MachineConfigs into cluster #1189

Conversation

cgwalters commented Feb 4, 2019

openshift-ci-robot commented Feb 4, 2019

wking Feb 4, 2019

Choose a reason for hiding this comment

cgwalters Feb 4, 2019

Choose a reason for hiding this comment

cgwalters Feb 4, 2019

Choose a reason for hiding this comment

wking Feb 4, 2019

Choose a reason for hiding this comment

cgwalters Feb 4, 2019

Choose a reason for hiding this comment

abhinavdahiya commented Feb 4, 2019

cgwalters commented Feb 4, 2019

cgwalters commented Feb 4, 2019 • edited Loading

cgwalters commented Feb 4, 2019

cgwalters commented Feb 5, 2019

cgwalters commented Feb 5, 2019

openshift-ci-robot commented Feb 5, 2019

cgwalters commented Feb 8, 2019

cgwalters commented Feb 12, 2019

abhinavdahiya commented Feb 12, 2019

deads2k commented Jan 16, 2020

openshift-ci-robot commented Jan 16, 2020

cgwalters commented Jan 16, 2020

cgwalters commented Jan 16, 2020

runcom commented Jan 16, 2020 • edited Loading

deads2k commented Jan 16, 2020

cgwalters commented Feb 4, 2019 •

edited

Loading

runcom commented Jan 16, 2020 •

edited

Loading