-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MCO-289: OCPBUGS-1324: Teach the MCO to use new format image #3317
MCO-289: OCPBUGS-1324: Teach the MCO to use new format image #3317
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks so much for putting this up! It's looking quite plausible to me; some comments below
restartPolicy: Always | ||
securityContext: | ||
runAsNonRoot: true | ||
runAsUser: 65534 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, do we need this one?
@@ -0,0 +1,11 @@ | |||
{{if .BaseOperatingSystemExtensionsContainer }} | |||
mode: 0600 | |||
path: "/etc/yum.repos.d/rhel-coreos-8-extensions.repo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be a bit of a scramble but we could also try to add the long-overdue /run/yum.repos.d
. That'd have different tradeoffs.
@@ -14,7 +14,7 @@ contents: | | |||
Type=oneshot | |||
RemainAfterExit=yes | |||
# Disable existing repos (if any) so that OS extensions would use embedded RPMs only | |||
ExecStartPre=-/usr/bin/sh -c "sed -i 's/enabled=1/enabled=0/' /etc/yum.repos.d/*.repo" | |||
ExecStartPre=-/usr/bin/sh -c "sed -i 's/enabled=1/enabled=0/' /etc/yum.repos.d/*.repo; sed -i 's/enabled=0/enabled=1/' /etc/yum.repos.d/rhel-coreos-8-extensions.repo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm. I wonder if we really still need this thing at all. We shouldn't have any enabled repos by default, no?
Ahhh I see this came in with 552f6d5 for OKD (FCOS) which does have repos.
Hmm. I'm thinking that if they wanted the repos disabled, they should do this process in their builds, and not here.
As is this line is going to actively fight anyone trying to add repos to RHCOS day 0, which I think is a somewhat reasonable thing to do.
Anyways, OK as is - clearly the tech debt here isn't new to this change.
pkg/daemon/rpm-ostree.go
Outdated
@@ -89,6 +89,12 @@ func NewNodeUpdaterClient() NodeUpdaterClient { | |||
// and gathering stderr into a buffer which will be returned in err | |||
// in case of error. | |||
func runRpmOstree(args ...string) error { | |||
// TODO(jkyros): update seems to want to check this too, so we probably just need to make sure it's there before we make an rpm-ostree call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's move this to func (r *RpmOstreeClient) Initialize() error {
@@ -563,7 +563,16 @@ func generateRenderedMachineConfig(pool *mcfgv1.MachineConfigPool, configs []*mc | |||
} | |||
} | |||
|
|||
merged, err := ctrlcommon.MergeMachineConfigs(configs, cconfig.Spec.OSImageURL) | |||
// If our new image formats are populated | |||
// TODO(jkyros): "properly" feature gate this? This is kind of a 'soft' gate based on the presence of extensions container, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK. That is indeed one way to get it tested in CI.
That said it also has the effect that CI and nightlies will behave differently...so I think I'd vote for a const enableNewImage = false
that can be flipped on. Or we just do the more proper feature gating via machineconfig.
pkg/controller/template/render.go
Outdated
@@ -294,7 +294,9 @@ func generateMachineConfigForName(config *RenderConfig, role, name, templateDir, | |||
return nil, fmt.Errorf("error creating MachineConfig from Ignition config: %w", err) | |||
} | |||
// And inject the osimageurl here | |||
mcfg.Spec.OSImageURL = config.OSImageURL | |||
// TODO(jkyros): if we're allowing overrides, this makes it hard to figure out of the user overrode it, or if it's just | |||
// "passthrough" from here since we don't mark "system" vs "user" machineconfigs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Well...presumably we'll revisit this when we add a CRD or something else.
That said really short term we could introduce hotfixOverrideOSImage
into machineconfig or so...did that ever get debated versus allowing htem to set osImageURL
?
It looks like this is getting the new image correctly in e2e-aws:
|
Long term...well, we want to cut over to a build controller, in which case we're always on the pod network. Even a bit longer term, we want to store RPMs as OCI artifacts, in which case we can push them to the registry and we could piggyback on the registry DNS hack! |
@cgwalters I'm not sure I understand why option 2 ("Schedule the extensions container in the host network namespace, use service IP to talk to it") needs hostnet. Things running on the host can reach the pods over the SDN. The reason we need to write to If you are able to create the service and look up the address it is given, then write that into the repository information, you should be fine. |
Oops yes sorry, I got wires crossed there. Edited the comment, thanks! |
oh no, that passed 😄. Alright, well that's promising, we know if we extract the extensions to disk everything seems to work. I'd prefer that the daemon not have to know about the extensions container URL but maybe I'm too picky. I'll save this, and I'm going to go back and try to get it working the service way then. |
pkg/daemon/update.go
Outdated
fipsFile = "/proc/sys/crypto/fips_enabled" | ||
extensionsRepo = "/etc/yum.repos.d/coreos-extensions.repo" | ||
osImageContentBaseDir = "/run/mco-machine-os-content/" | ||
osExtensionsContentBaseDir = "/run/mco-rhel-coreos-extensions/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
osExtensionsContentBaseDir = "/run/mco-rhel-coreos-extensions/" | |
osExtensionsContentBaseDir = "/run/mco-machine-os-extensions/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just mco-extensions
...hopefully we can retire the machine-os-content
nomenclature to make a clear break from the past.
@@ -14,7 +14,7 @@ contents: | | |||
Type=oneshot | |||
RemainAfterExit=yes | |||
# Disable existing repos (if any) so that OS extensions would use embedded RPMs only | |||
ExecStartPre=-/usr/bin/sh -c "sed -i 's/enabled=1/enabled=0/' /etc/yum.repos.d/*.repo" | |||
ExecStartPre=-/usr/bin/sh -c "sed -i 's/enabled=1/enabled=0/' /etc/yum.repos.d/*.repo; sed -i 's/enabled=0/enabled=1/' /etc/yum.repos.d/rhel-coreos-8-extensions.repo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExecStartPre=-/usr/bin/sh -c "sed -i 's/enabled=1/enabled=0/' /etc/yum.repos.d/*.repo; sed -i 's/enabled=0/enabled=1/' /etc/yum.repos.d/rhel-coreos-8-extensions.repo" | |
ExecStartPre=-/usr/bin/sh -c "sed -i 's/enabled=1/enabled=0/' /etc/yum.repos.d/*.repo; sed -i 's/enabled=0/enabled=1/' /etc/yum.repos.d/coreos-extensions.repo" |
Removing this requirement might be a bit tricky. In OKD, we boot an FCOS node, and then rebase that to OKD's layered image. The yum repos already exist in the FCOS that is first provisioned, and the rebase won't overwrite file contents in /etc IIUC, so we have to do it here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we still need to disable the repo files? Now that OKD is using layering, we're no longer installing packages on each node individually right?
IOW we can just delete this line, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right. Looks like it can just be removed then.
/hold |
cc4ef5e
to
01e9a0b
Compare
Added extensions container field to |
@@ -12,7 +12,7 @@ data: | |||
# This (will eventually) replace the below when https://github.com/openshift/enhancements/pull/1032 | |||
# progresses towards the default. | |||
baseOperatingSystemContainer: "placeholder.url.oc.will.replace.this.org/placeholdernamespace:rhel-coreos-8" | |||
#baseOperatingSystemExtensionsContainer: "placeholder.url.oc.will.replace.this.org/placeholdernamespace:rhel-coreos-8-extensions" | |||
baseOperatingSystemExtensionsContainer: "placeholder.url.oc.will.replace.this.org/placeholdernamespace:rhel-coreos-8-extensions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we can renamebaseOperatingSystemContainer
to baseOSExtensionsContainer
and baseOperatingSystemContainer
to baseOSContainer
. It is also consistent with OSImageURL .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as a general rule we should s/container/image/ - because it's relatively standard to have "container" be a running instance.
So perhaps baseOSImage
? I don't otherwise have a strong opinion on that vs e.g. baseOperatingSystemImage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, it's tricky here because "image" can often mean virtual/physical machine image (AMI/ISO) versus "container image", and while it's usually clear from context, our use of the term image blurs those lines. Sometimes we need to spell it out fully by saying "container image" which is wordy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, baseOSImage/baseOSContainerImg/baseOSContainerImage would be more technically correct. Will leave it to John whatever he prefers among them :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with baseOSContainerImage
, it's still shorter than what we had 😄
pkg/daemon/update.go
Outdated
@@ -325,6 +336,40 @@ func ExtractOSImage(imgURL string) (osImageContentDir string, err error) { | |||
return | |||
} | |||
|
|||
// ExtractOSImage extracts OS image content in a temporary directory under /run/machine-os-content/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment need to be updated based on what ExtractOSImage does
pkg/daemon/update.go
Outdated
@@ -238,6 +239,16 @@ func addExtensionsRepo(osImageContentDir string) error { | |||
return nil | |||
} | |||
|
|||
// addExtensionsRepo adds a repo into /etc/yum.repos.d/ which we use later to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/addExtensionsRepo/addLayeredExtensionsRepo/
if _, err = pivotutils.RunExtBackground(cmdRetriesCount, "oc", args...); err != nil { | ||
// Workaround fixes for the environment where oc image extract fails. | ||
// See https://bugzilla.redhat.com/show_bug.cgi?id=1862979 | ||
glog.Infof("Falling back to using podman cp to fetch OS image content") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since extensions image is relatively small image, we can skip using oc and use podmanCopy directly too
/test e2e-agnostic-upgrade |
Since we already have e2e test for various platforms, let's get some better coverage by running on this PR. /test e2e-aws-upgrade-single-node Once, these looks good we can run payload test |
01e9a0b
to
c3cc527
Compare
Oh well that sure was fun. In hindsight I feel kind of silly, but upgrade tests were failing for this PR because:
TL;DR Starting from a fresh cluster everything is okay, but starting from a cluster where the templates have OSImageURL populated, you get stuck there forever. I'll get the rest of this cleanup done today (Friday). |
/test e2e-agnostic-upgrade |
/test e2e-aws-upgrade-single-node |
/test e2e-aws-upgrade-single-node |
c3cc527
to
1e4917b
Compare
@rphillips @cgwalters kube 1.25 PR has been merged openshift/kubernetes#1360 , can we merge this PR now? |
/hold cancel |
Control plane is failing on
This is the same issue as https://issues.redhat.com/browse/OKD-63 - we just don't have the fix on the base FCOS images yet. We'll spin a new rpm-ostree release soon which should get that sorted. Even shorter term, we could add a quick hack here (conditional on FCOS?) to chmod it? Sorry about the regression here, but what made this a bit less obvious to catch is that before this PR landed this code path wasn't the default, and now it is. |
Put up #3358 |
@jkyros: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-1324 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
/jira refresh |
@sinnykumari: Jira Issue OCPBUGS-1324 is in an unrecognized state (MODIFIED) and will not be moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info. Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info. Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
Changes due to MCO adaptation to use new format image. See openshift/machine-config-operator#3317 for more info.
This:
rhel-coreos-8
/rhel-coreos-8
extensions images by defaultex deploy-from-self
functionalityex deploy-from-self
accomplishes its task)(This PR experienced a lot of change as it progressed, so I'm retaining the original description below for posterity, but it is wrong -- we hit some snags using the extensions container as a service and had to adjust our approach)
This: - Makes the operator apply a service and deployment manifest for the extensions container if we have one - Adds a template for the `rhel-coreos-8-extensions.repo` - Allows the daemon to make "rebootless" changes to /etc/yum.repos.d/* so we don't trigger a reboot with our template - Re-enables kernel switching and extensions for the "layering" path - Tries to gate all of this behind the presence of the extensions containerCaveats:
To try this (because we don't have a productized extensions container yet), you can pack your own release: