support a framework for "reliable extensions" to the base OS #401

Closed
dustymabe opened this issue Feb 26, 2020 · 23 comments
Labels: jira (for syncing to jira)

@dustymabe
Member

dustymabe commented Feb 26, 2020

There is a valley of death between the hilltop of content we have included in the OS and the mountain of containerized applications that users run on top of the OS (services, data processing, etc). In this valley of death there are a ton of small OS-level utilities or daemons that are either hard to containerize or not desirable to containerize because of the maintenance burden. We get requests for these all the time. For some of them it makes a lot of sense to include them in the base OS. For some it's clear they don't belong. For some it's really hard to say. What we've identified is a clear need to be able to deliver some content to the host that isn't part of the base OS but also isn't necessarily its own container.

@cgwalters has opened a similar issue in #354.

Some possible solutions, including some that were thrown around during the meeting today:

  • deliver a small yum repo with a curated set of packages alongside the OSTree content, versioned with the OS so we don't hit #400, "package layering: split versions between OSTree base vs yum repo" (fixing #400 generically for all OSTree-based Fedora derivatives could be a solution here)
  • build and deliver an addon OSTree layer that includes commonly requested packages. Enabling the addon layer would be an all-or-nothing operation (you get all addons or none), but would make testing and reliability easier
  • allow users to have alternative roots (closer to what is proposed in #354, "Best practices for delivering container content that executes on the host") that are based on the host OS but allow dnf/yum operations inside and can easily override package versions from the host OS if needed. We'd probably want to be able to systemctl enable stuff inside here or be able to find stuff from here in the $PATH. I'm thinking this would be almost like toolbox, but not actually a container. More like an overlay on top of the OS itself. This needs more brainstorming.

The solution we come up with could be generic and apply to all packages OR curated and apply to a subset of packages.

@dustymabe dustymabe changed the title from "reliable extensions" to "support a framework for "reliable extensions" to the base OS" on Feb 26, 2020
@keithy

keithy commented Feb 26, 2020

lol, that's what I am using nix for (heretical, I know). Good idea though.

@dustymabe
Member Author

Something else that would make a potential solution much more attractive: not requiring a reboot. Either enhancing support for livefs or somehow applying changes using ignition before the system comes up could possibly make this easier to achieve.

@cgwalters
Member

cgwalters commented Feb 28, 2020

If we do something like this for FCOS, for OpenShift4/RHCOS it would likely need to manifest in a similar way to machine-os-content and kernel-rt - rather than offering an rpm-md repository we ship extra RPMs inside a container image. The rationale is we don't want to break the "lifecycle binding" that we have with the machine-os-content today - we promote container release images, users can do disconnected installs solely by mirroring container images, etc.

@jlebon
Member

jlebon commented Feb 28, 2020

Cool, this looks really nice; we should work towards something like it.

> Something else that would make a potential solution much more attractive: not requiring a reboot. Either enhancing support for livefs or somehow applying changes using ignition before the system comes up could possibly make this easier to achieve.

I agree that we should include in this discussion what we want the UX for this to be, since it may guide implementation.

For example, I think we should strive to have it configurable via FCC/Ignition so that it remains canonical. E.g. we could have an FCC sugar like:

extensions:
- usbguard

Then, how this is actually implemented can be changed in the future. E.g. we could start off with just doing it in the real root and rebooting, then work towards doing it from the initrd on first boot.
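
A minimal sketch of how such an `extensions:` list might be resolved to packages, assuming the curated variant of this idea. The names `CATALOG` and `resolve_extensions`, and the catalog contents, are hypothetical and only for illustration; nothing here reflects an actual FCCT or rpm-ostree implementation.

```python
# Hypothetical curated catalog: extension name -> RPM packages it pulls in.
# The entries are illustrative assumptions, not a real FCOS list.
CATALOG = {
    "usbguard": ["usbguard"],
}


def resolve_extensions(requested, catalog=CATALOG):
    """Return the sorted, de-duplicated packages for the requested extensions.

    Raises ValueError if any requested name is not in the curated catalog.
    """
    unknown = sorted(set(requested) - set(catalog))
    if unknown:
        raise ValueError("unsupported extensions: " + ", ".join(unknown))
    packages = set()
    for name in requested:
        packages.update(catalog[name])
    return sorted(packages)
```

Keeping the resolution step separate from how the packages get applied is what would let the implementation change later (real root plus reboot now, initrd on first boot later) without changing the config.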

@cgwalters
Member

> we could have an FCC sugar like:

Right, and a similar thing in MachineConfig.

@jlebon
Member

jlebon commented Feb 28, 2020

> If we do something like this for FCOS, for OpenShift4/RHCOS it would likely need to manifest in a similar way to machine-os-content and kernel-rt - rather than offering an rpm-md repository we ship extra RPMs inside a container image.

Yeah, makes sense. Maybe with an rpm-md repo too baked in there so we can just point rpm-ostree at it? That way, e.g. changing deps is mostly a transparent thing, and we're not hardcoding them in golang. (Hmm, a --repofrompath=-type switch would help).
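
To make the "point rpm-ostree at it" idea concrete, here is a rough sketch that renders a yum-style `.repo` stanza whose `baseurl` is a local directory, such as one extracted from the container image. The function name `render_repo_file`, the repo id, and the path used in the usage note are all assumptions for illustration; how the image content would actually be unpacked is not covered.

```python
import configparser
import io


def render_repo_file(repo_id, path):
    """Render a .repo stanza whose baseurl points at a local rpm-md directory.

    `path` must be an absolute path to a directory containing repodata/.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    parser = configparser.ConfigParser()
    parser[repo_id] = {
        "name": repo_id,
        "baseurl": "file://" + path,
        "enabled": "1",
    }
    buf = io.StringIO()
    # Write key=value without spaces, the usual style for .repo files.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()
```

For example, `render_repo_file("os-extensions", "/run/os-extensions")` (both values hypothetical) yields a stanza that could be dropped into `/etc/yum.repos.d/`, so dependency resolution stays with the rpm-md metadata instead of being hardcoded.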

@cgwalters
Member

> Something else that would make a potential solution much more attractive: not requiring a reboot.

Yeah, though for OpenShift we always apply OS updates anyways before any workloads land, so a reboot is required. Even for FCOS, I think there will be a large enough set of people who want to change kernel arguments that we should be thinking of this more as an optimization for the "no kernel args" case.

@ashcrow
Member

ashcrow commented Apr 1, 2020

Is this still being discussed?

@jlebon
Member

jlebon commented Apr 1, 2020

I think we have consensus on the "side yum repo" approach. We just need to discuss implementation details now.

@dustymabe
Member Author

@jlebon I think "side yum repo" was a contingency if we couldn't get a better answer for the rest of Fedora's public yum repos being locked with our updates. Let me see if I can find the notes from that discussion.

@cgwalters
Member

For OpenShift 4 though it seems like a no-brainer to me to extend machine-os-content for now with extra stuff, following the kernel-rt precedent. The MachineConfig fragment would look the same as the FCCT:

extensions:
  - usbguard

But having stuff in machine-os-content versus e.g. machine-os-extensions should be an implementation detail.

@LorbusChris
Contributor

That will help with extending OKD's FCOS-based machine-os-content as well, right?
(see https://github.com/openshift/release/blob/8aa0554b979434e68157329f5b7940d3fe979f53/ci-operator/jobs/openshift/release/openshift-release-release-4.5-periodics.yaml#L600-L611 for how we currently extend FCOS to build it - not very pretty)

@cgwalters
Member

> That will help with extending OKD's FCOS-based machine-os-content as well, right?

The short answer is yes, although that still leaves open the crio problem. And...if we go down this route a bit farther...one could imagine that we make RHCOS actually be just RHEL and we ship crio.rpm in a separate machine-openshift-content or so and ship a systemd unit that pkg layers it on firstboot...

@cgwalters
Member

coreos/rpm-ostree#2055

@keithy

keithy commented Apr 13, 2020

Just wanted to put in a word for systemd portable containers again. (not sure if SELinux issues have been fixed yet)

@ashcrow
Member

ashcrow commented Apr 21, 2020

> coreos/rpm-ostree#2055

PTAL at the proposal above.

cgwalters added a commit to cgwalters/enhancements that referenced this issue May 11, 2020
This enhancement proposes a MachineConfig fragment like:
```
extensions:
  - usbguard
```

This is the OpenShift version of the [Fedora CoreOS extension system tracker](coreos/fedora-coreos-tracker#401).

That will add additional software onto the host,
but this software will still be versioned with the host
(included as part of the OpenShift release payload) and
upgraded with the cluster.
cgwalters added a commit to cgwalters/enhancements that referenced this issue May 15, 2020
@jlebon
Member

jlebon commented May 26, 2020

> E.g. we could start off with just doing it in the real root and rebooting, then work towards doing it from the initrd on first boot.

Esp. for live PXE/ISO there's no way but to apply the changes live.

cgwalters added a commit to cgwalters/enhancements that referenced this issue May 27, 2020
@dustymabe
Member Author

We had a meeting recently with releng and infra to discuss possible solutions here for all editions of Fedora that use RPM-OSTree and suffer from package layering problems similar to those described in #400. I just sent an email to the IoT mailing list and copied some members of the Silverblue team as well (Silverblue doesn't have a mailing list).


Hello Silverblue and IoT teams. The FCOS team got together with Fedora releng last week to discuss the
issue regarding package layering that periodically plagues us (https://github.com/coreos/fedora-coreos-tracker/issues/400).
The solution we believe will help all OSTree based editions involves creating an archive repo where
any package that has made it to the Fedora stable repositories can be accessed at a later time. In
general, we think this should solve the problem because we should be able to install packages that
won't require updating the base layer.

Goals:

- help solve the same problem for Fedora CoreOS, Fedora Silverblue, and Fedora IoT
- don't add to mirror network requirements
    - i.e., store/host the content somewhere else. AWS is a candidate here.
- keep traditional systems behavior the same
    - don't enable archive repo(s) by default on non-ostree distributions

Since it can take a long time to create repos for large package sets we may end up creating more than
one repo that gets updated at different cadences. For example:

- One that gets updated weekly
    - all packages obsoleted before X date
    - large package set, so we run it once a week
- One that gets updated nightly
    - all packages obsoleted after X date
    - small package set, so we can run it nightly
    
This is still a work in progress and the design may take a few turns as we work out the details and/or
find new information.

Having this new repository will help us in Fedora CoreOS as we have a stable stream that lags behind
Fedora stable repos. It should also help Fedora Silverblue when they move to a release cadence that
doesn't match the bodhi updates repos. I'm not sure how much of a problem this currently is in Fedora
IoT, but I imagine Fedora IoT has similar problems.

Thoughts?

Dusty Mabe
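
The two-cadence split described in the email can be sketched as a simple partition on obsoletion date. This is only an illustration under assumptions: the function name `split_by_cutoff` and the input shape are hypothetical, and the email does not say which repo a package obsoleted exactly on date X belongs to, so this sketch arbitrarily puts it in the nightly set.

```python
from datetime import date


def split_by_cutoff(obsoleted_on, cutoff):
    """Split packages into (weekly, nightly) lists by obsoletion date.

    obsoleted_on: mapping of package build name -> date it was obsoleted.
    cutoff: the "X date" from the email.
    Builds obsoleted before the cutoff go to the large weekly repo;
    the rest (including the cutoff day itself, an assumption) go to
    the small nightly repo.
    """
    weekly = sorted(p for p, d in obsoleted_on.items() if d < cutoff)
    nightly = sorted(p for p, d in obsoleted_on.items() if d >= cutoff)
    return weekly, nightly
```

Because older builds never change once they are obsoleted, the expensive weekly run only has to absorb whatever aged past the cutoff, while the nightly run stays small.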

@dustymabe
Member Author

I've created a proof-of-concept archive repo and verified that it solves the "package availability" problem, but we need one other small piece that tells the transaction solver (the brains that answer the "what rpms solve this request?" question) to leave the base layer packages alone and not update them. We're going to track that work over in the current work in progress PR.
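
As a toy illustration of "leave the base layer packages alone": given several builds of a requested package (as an archive repo would provide), pick the newest one whose pinned dependencies match what the base layer already ships. This is not how rpm-ostree or its solver actually works; `pick_compatible` and its data shapes are hypothetical, and real RPM dependencies are far richer than exact version pins.

```python
def pick_compatible(candidates, base):
    """Pick the newest candidate whose pinned requirements match the base.

    candidates: list of (version_tuple, requires), where requires maps a
        dependency name to the exact version that build needs.
    base: mapping of package name -> version shipped in the base layer.
    Returns the chosen version tuple, or None if no build fits.
    """
    newest_first = sorted(candidates, key=lambda c: c[0], reverse=True)
    for version, requires in newest_first:
        # A dependency absent from the base can be layered freely;
        # one present in the base must match exactly so it is not replaced.
        if all(base.get(dep, ver) == ver for dep, ver in requires.items()):
            return version
    return None
```

With only the latest repos available, the newest build may demand a newer base package and the request fails; with older builds still reachable, an earlier build that fits the base can be chosen instead.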

There is one more problem that also needs to be solved, which is conflicts in the NFS packages that make it so you can't layer things that depend on NFS. Tracked in #572.

@dustymabe
Member Author

For the archive repo POC I have, I'm planning to meet with the Fedora Releng/Infra team next week to see what the next steps are.

@dustymabe
Member Author

The meeting with the releng/infra team went well. Proposal for next steps is over in https://pagure.io/releng/issue/9717

@dustymabe
Member Author

OK, so from the original description we ended up doing the 2nd half (the part that's mentioned in the parentheses) of the first option.

We fixed #400 generically by adding an "archive yum repo" to the set of yum repositories on our system and enhancing rpm-ostree to be able to find a set of packages that solves the request without replacing base layer packages.

coreos/fedora-coreos-config#673

We can now close this ticket out as more or less a duplicate of #400 since we solved that problem generically, but it's worth noting the user experience here isn't quite up to par with the design goals for FCOS. We want to be able to do this layering seamlessly without the user having to come up with their own systemd units. This seamless integration bit will be tracked by #681.
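
For context, the kind of hand-written systemd unit a user needs today might look roughly like what this sketch renders. It is an assumption-laden illustration, not an official recipe: `render_layering_unit` is a hypothetical helper, and the unit layout (stamp file, explicit reboot) is just one plausible way to run `rpm-ostree install` once on first boot.

```python
def render_layering_unit(packages):
    """Render a oneshot unit that layers packages once, then reboots."""
    if not packages:
        raise ValueError("at least one package is required")
    pkgs = " ".join(packages)
    lines = [
        "[Unit]",
        "Description=Layer packages with rpm-ostree",
        "Wants=network-online.target",
        "After=network-online.target",
        # The stamp file keeps the unit from running again after the reboot.
        "ConditionPathExists=!/var/lib/%N.stamp",
        "",
        "[Service]",
        "Type=oneshot",
        "RemainAfterExit=yes",
        "ExecStart=/usr/bin/rpm-ostree install --idempotent " + pkgs,
        "ExecStart=/bin/touch /var/lib/%N.stamp",
        "ExecStart=/bin/systemctl --no-block reboot",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"
```

Having to generate or maintain this by hand for every package is exactly the rough edge that the seamless integration work is meant to remove.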

@cgwalters
Member

xref coreos/enhancements#7
