rhcos: add rhcos-inject enhancement #492
@@ -0,0 +1,198 @@
---
title: rhcos-inject
authors:
- "@crawford"
reviewers:
- TBD
approvers:
- TBD
creation-date: 2020-10-01
last-updated: 2020-10-01
status: provisional
---

# rhcos-inject

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary
Throughout its existence, OpenShift 4 has placed a heavy emphasis on installation flows that make use of installer-provisioned infrastructure. This has largely been successful for predictable environments, such as AWS, Azure, or GCP, but it has (unsurprisingly) made deployments into less predictable environments more difficult. These less predictable environments introduce a lot of variability into areas like network configuration, disk layout, and the life cycle of the machines, and that makes it difficult or impossible for OpenShift to start from a foundation of shared assumptions. To bridge this gap, admins need a way to inject a certain amount of customization before OpenShift installation can begin.

## Motivation

Admins need a way to inject customization into their RHCOS nodes before they are provisioned by the cluster. In most cases, this configuration is required in order for the provisioning process to complete, so the normal facilities (e.g. Machine Config Operator) are not yet available. Today, this is a very manual process involving an admin interactively providing that configuration at the point of running the RHCOS installer. This might work in some environments, but many others don't have easy interactive access due to technical or policy constraints.
### Goals

- Make it easy for an admin to add network configuration to each of the RHCOS nodes added both before and after OpenShift installation.
- Allow an admin to add custom services and miscellaneous configuration to their RHCOS nodes.

### Non-Goals

- Inventory management
- Bring-your-own-RHEL customization

Review comment: I think we should call out that currently this doesn't solve the problem for IPI baremetal deployments (since that doesn't currently use the RHCOS installer iso), that will be addressed via #467
## Proposal

Add a new, optional step to the installation process that allows an admin to inject network configuration and any other customization to the RHCOS installer image (and maybe other RHCOS images in the future) before invocation. This customization is performed by a new `rhcos-inject` utility that takes as input an RHCOS installer image (`rhcos-<version>-installer.<architecture>.iso`), network configuration, and a set of Ignition Configs and generates a new RHCOS installer image with all of the configuration and customization included. This new installer image can then be booted in an unsupervised manner and will complete the RHCOS installation along with any customization.

Review comment: Would it be adding network-related config to the ignition and embedding the result in the ISO? Or would it change the ISO contents in a different way?

Review comment: It sounds as though this tool does exactly the following:

Is that the idea? If so, it'd be good to spell out why this should be a new tool rather than a subcommand of coreos-installer. If we view this as sugar over

Also worth noting that some of the customization functionality would be useful for PXE installs, so we should also be able to produce the live Ignition config without embedding it into an ISO.

Nit:

Review comment: I wasn't aware that the new coreos-installer had so much functionality. It sounds like a lot of the heavy-lifting has already been done and that these new features should just be folded in. That's great news!

Regarding the original question, I had assumed that the network config would happen for two contexts: the pre-pivot Ignition environment (initramfs) and the post-pivot system. This would allow for image-based deployments (e.g. QCOWs or VHDs) to fetch their Ignition Configs from the Machine Config Server running in the cluster. If we were only to configure the networking in the post-pivot system (e.g. by using Ignition to write the configs), the machine wouldn't be able to fetch its config from the MCS and would have to be provided the fully-rendered config up front. This is only easy to do if the live ISO is performing the installation though.

Review comment: What is your thinking with regards to making a new

Initially, `rhcos-inject` will primarily focus on network configuration but as the needs of customers evolve, this can be expanded to include other common customization. As always, an escape hatch is needed for unanticipated cases and this will be in the form of raw Ignition Configs. If there is a need for additional customization beyond network configuration, the admin can include that in an Ignition Config and inject that alongside everything else.

Review comment: The raw config would be applied via

Review comment: Note that when a user provides ignition config via openshift-install, it ends up in the pointer ignition config, which means we still need networking to download the rendered config from the MCS (which ignition does before applying any config so there's potentially a chicken/egg problem wrt network config). That's why I'm proposing #467 - it means that in the IPI baremetal case we can automate providing the entire config via a data URL at runtime, and avoid the need for any networking when ignition runs.

Review comment: EDIT: after discussions, I understand why we want to go the MCO way

@hardys I think with this enhancement, network configuration is going to be a target of this rhcos-inject step which makes #467 not needed afaict - the need for the ignition config in this paragraph is for other scenarios but networking. The tl;dr would be that your image has the network customization needed already so anything that isn't network and can be done via MachineConfigs will just work as you're able to fetch from the MCS at that point

Review comment: @runcom No, currently baremetal IPI doesn't use the installer iso, and there's nothing in the architecture that allows for automated customization of the target OS image, so #467 is still needed.

As an interim step we have https://access.redhat.com/solutions/5460671 which documents a similar iso customization to this enhancement (but the target OS image, not the installer iso), but that's not a scalable or well integrated solution IMO - instead we want to just provide the config via ignition at the point of deployment (which #467 will enable)

An example of its invocation can be seen here:

```console
$ rhcos-inject \
    --input=rhcos-installer.x86_64.iso \
    --output=control-0.iso \
    --openshift-config=master.ign \
    --bond=bond0:em1,em2:mode=active-backup \
    --ip=10.10.10.2::10.10.10.254:255.255.255.0:control0.example.com:bond0:none
```

With a little scripting, a number of custom RHCOS installers can be quickly created:

```zsh
for i in {00..11}
do
    rhcos-inject \
        --input=rhcos-installer.x86_64.iso \
        --output=worker-${i}.iso \
        --openshift-config=worker.ign \
        --ip=10.10.10.$((i+10))::10.10.10.254:255.255.255.0:worker${i}.example.com:enp1s0:none
done
```
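The loop above relies on zsh semantics: `{00..11}` yields zero-padded indices, and `$((i+10))` reads `08` and `09` as decimal. In bash the same arithmetic fails, because a leading zero marks an octal literal. A minimal POSIX `sh` sketch that keeps the numeric index and the padded name separate might look like the following. It only prints the commands (a dry run); `print_worker_commands` is a hypothetical helper, and `rhcos-inject` and its flags are the ones proposed in this document, not a shipped tool.

```shell
#!/bin/sh
# Dry-run sketch: print one rhcos-inject command per worker instead of running it.
# print_worker_commands is a hypothetical helper; rhcos-inject and its flags
# come from this proposal and may change before implementation.

print_worker_commands() {
  count=$1
  i=0
  while [ "$i" -lt "$count" ]; do
    # Pad only the name (00, 01, ...). The arithmetic below uses the plain
    # integer, so 08 and 09 are never parsed as octal literals.
    idx=$(printf '%02d' "$i")
    printf 'rhcos-inject --input=rhcos-installer.x86_64.iso --output=worker-%s.iso --openshift-config=worker.ign --ip=10.10.10.%d::10.10.10.254:255.255.255.0:worker%s.example.com:enp1s0:none\n' \
      "$idx" "$((i + 10))" "$idx"
    i=$((i + 1))
  done
}

# Same twelve workers as the loop above: worker-00 through worker-11.
print_worker_commands 12
```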
Review comment: This is basically a workaround for the fact that MCO can't do per-machine configs ref openshift/machine-config-operator#1720 right?

This won't work for IPI baremetal as it's currently implemented since we don't have a way to customize the ignition files per host, we'd have to implement some equivalent customization method in the deploy workflow.

Also I wonder what do we do on day-2 - to enable the configuration to be managed we'd need to also provide equivalent config via MachineConfig, but there's currently no way to do that per-machine - what happens for example if some syntax change is required on upgrade to the nic configs that result from this process?

This network configuration applies to the post-pivot RHCOS installation environment and is copied to both the pre- and post-pivot installed environment by default. In the event that exotic configuration is required, this copying of configuration can be disabled and a service can be used instead (TODO(crawford) figure out systemd service ordering):

Review comment: The installation environment doesn't pivot.

```console
$ cat rhcp.ign
{
  "ignition": { "version": "3.0.0"},
  "systemd": {
    "units": [{
      "name": "random-host-configuration-protocol.service",
      "enabled": true,
      "contents": "[Service]\nType=oneshot\nExecStart=/usr/bin/env nmcli ...\n\n[Install]\nWantedBy=pre-install.target"
    }]
  }
}

$ rhcos-inject \
    --input=rhcos-installer.x86_64.iso \
    --output=worker.iso \
    --openshift-config=worker.ign \
    --installer-config=rhcp.ign \
    --persist-networking=false
```

This custom service runs in the context of the RHCOS installer and is responsible for writing the network configuration for the installed system (e.g. based on an external lookup by MAC address). This escape hatch can be used in environments which employ an in-house host configuration procedure.

Review comment: We should probably provide some sugar for generating the install hook boilerplate.
### User Stories

#### Static Network Configuration

An RHCOS node is deployed into an environment without DHCP and needs to be explicitly configured with an IP address/mask, gateway, and hostname:

```console
$ rhcos-inject \
    --input=rhcos-installer.x86_64.iso \
    --output=control-0.iso \
    --openshift-config=master.ign \
    --ip=10.10.10.2::10.10.10.254:255.255.255.0:control0.example.com:enp1s0:none
```

This static IP configuration is used for both the RHCOS installation and the post-installation system.
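The `--ip` value in these examples appears to follow the kernel-argument layout `<address>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>`, with the peer field left empty. As a sketch under that assumption, a small POSIX `sh` helper can assemble the value from named pieces so the colons do not have to be counted by hand. `compose_ip_arg` is hypothetical and not part of the proposal.

```shell
#!/bin/sh
# compose_ip_arg is a hypothetical helper, not part of rhcos-inject.
# It assumes the field order address:peer:gateway:netmask:hostname:interface:autoconf
# and always leaves the peer field empty with autoconf set to "none".

compose_ip_arg() {
  if [ "$#" -ne 5 ]; then
    echo "usage: compose_ip_arg ADDRESS GATEWAY NETMASK HOSTNAME INTERFACE" >&2
    return 1
  fi
  printf '%s\n' "--ip=${1}::${2}:${3}:${4}:${5}:none"
}

# Reproduces the flag used in the static configuration example.
compose_ip_arg 10.10.10.2 10.10.10.254 255.255.255.0 control0.example.com enp1s0
```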
#### Dedicated Provisioning Network

An RHCOS node is deployed into an environment which makes use of a dedicated provisioning network which is only used during installation:

```console
$ rhcos-inject \
    --input=rhcos-installer.x86_64.iso \
    --output=control-0.iso \
    --openshift-config=master.ign \
    --ip=10.10.10.2::10.10.10.254:255.255.255.0:control0.example.com:enp1s0:none \
    --ip=:::::enp1s1:none \
    --persist-networking=false
```

This static IP configuration is used during the installation, but once the machine reboots into the running system, it uses a different configuration. This is likely to be paired with the following use case.
#### Custom Dynamic Host Configuration

An RHCOS node is deployed into an environment which uses an in-house IPAM implementation in lieu of DHCP:

```console
$ cat network.ign
{
  "ignition": { "version": "3.0.0"},
  "systemd": {
    "units": [{
      "name": "configure-networking.service",
      "enabled": true,
      "contents": "[Service]\nType=oneshot\nExecStart=/usr/bin/env nmcli ...\n\n[Install]\nWantedBy=pre-install.target"
    }]
  }
}

$ rhcos-inject \
    --input=rhcos-installer.x86_64.iso \
    --output=my-rhcos-installer.x86_64.iso \
    --openshift-config=master.ign \
    --installer-config=network.ign
```

This service can contain just about any piece of logic needed in order to statically configure the node based on a dynamic assignment. For example, a customer may use this mechanism to configure a link-local address, request an IP from a provisioning system, reconfigure the network interfaces, and then phone-home to acknowledge a successful configuration.
### Implementation Details/Notes/Constraints

As with all software, it's important to consider coupling and cohesion when looking at this solution and its alternatives. Defining clear API boundaries and building upon layers of abstraction are some of the most effective techniques for avoiding pitfalls. This solution chooses to make a distinction between the networking required for an individual node to operate and for the cluster itself; respectively, the machine network and the pod network. A functioning machine network is considered a prerequisite to installation, as are power, cooling, and many others. The pod network, on the other hand, is something created and managed by OpenShift. Working backward from this assumption, it's clear that the problem of pre-installation network configuration should not be solved by the cluster.

Review comment: Network configuration isn't just an installation time problem - there may be changes over time which impact the configuration created by this workflow - how do we envisage customers dealing with that?

It seems like the answer is probably ssh/scp or ansible - and that's quite inconsistent with the user experience for everything else (including secondary nics which aren't needed during initial deployment).

When thinking about where this functionality should live, `openshift-install` may seem like an obvious choice. This is a poor fit, however. `openshift-install` is only used during installation and destruction of the cluster, whereas this functionality would also be necessary post-installation (e.g. during a scaling event). Additionally, there are a number of existing and future components which would benefit from this functionality, but may not want to carry the full weight of `openshift-install` (368 MiB at the time of writing). Even further, `openshift-install` needs to continue to support macOS, but it wouldn't be feasible to do the necessary Linux file system operations from that environment. It's going to be most flexible to implement this new functionality in a stand-alone, Linux-only utility. Future improvements may include running this utility in the cluster so that an admin can perform customizations from the comfort of the browser and then simply download the result.

Review comment: How would

It hasn't been explicitly mentioned yet, but implicit in this proposal is a slight rework to the RHCOS installer. In order for a user to effectively be able to leverage custom systemd services, there will need to be a simple and well-defined ordering of targets. There will also likely be some changes necessary so that operations which depend on one another can communicate success and failure and so that the overall installation can be easily monitored.
### Risks and Mitigations

Since this is a new component entirely, there is very little risk to the existing installation procedures. The biggest risk appears to be the escape hatch; the concern being that it will be heavily abused to solve any machine customization challenges, including ones that should be tackled by the Machine Config Operator.
## Design Details

### Open Questions

#### Flag Names

I hate the classic `ip=:::::::::::::` syntax but I didn't want to redesign that in this initial pass. We should rethink the specific representation of these options before implementation. This will also allow for a more expressive invocation that can be used for things like Dot1Q, VLAN-tagging, teaming, etc.

Review comment: I think it would be very nice, since we have an enhancement doing it for day 2 networking, that it was configurable with the NodeNetworkConfigurationPolicy syntax (either from a cli parameter or from stdin):

Review comment: We're already using kernel-argument syntax in the
#### Predictable (vs Consistent) Interface Names

There are going to be a lot of environments where a machine only has the one NIC and the admin just wants to configure it. Rather than requiring that they know the exact name of the interface, it would be helpful if there were a more flexible specifier that we could use.
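One possible shape for such a specifier is "the only physical NIC on the machine". As a rough sketch, assuming the Linux sysfs layout in which physical interfaces under `/sys/class/net` carry a `device` link while `lo`, bonds, and bridges do not, the lookup could be done as follows. `sole_interface` is a hypothetical helper; how `rhcos-inject` would expose this is not defined by the proposal.

```shell
#!/bin/sh
# sole_interface is a hypothetical helper illustrating a "just use the one NIC"
# specifier. It assumes the Linux sysfs layout; the optional argument overrides
# the directory that is scanned (useful for testing).

sole_interface() {
  net_dir=${1:-/sys/class/net}
  found=""
  n=0
  for path in "$net_dir"/*; do
    # Physical NICs expose a "device" link; lo, bonds, and bridges do not.
    [ -e "$path/device" ] || continue
    found=${path##*/}
    n=$((n + 1))
  done
  if [ "$n" -eq 1 ]; then
    printf '%s\n' "$found"
  else
    echo "expected exactly one physical interface, found $n" >&2
    return 1
  fi
}

# Example use inside an installer hook:
#   iface=$(sole_interface) || exit 1
```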
#### Service Orchestration

What systemd targets are needed to guide the ordering of services that are injected into the installation environment? I presume we'll want something to ensure services run before installation begins and after that completes.

Review comment: There are documented sequencing directives for both cases.

#### Raw Ignition Configs

Support for injecting raw Ignition Configs is very much an escape hatch - no human should be spending any significant amount of time reading or writing Ignition configs. If users find utility in this mechanism (as I suspect they will), they are immediately going to be met with frustration when simple syntax errors prevent nodes from installing correctly. We should consider jumping directly to a more user-friendly approach of integrating a config transpiler (e.g. https://github.com/coreos/container-linux-config-transpiler) so that admins have a better experience. The obvious downside to this approach is that it will be easier to make customizations using this mechanism versus the preferred paradigm of Machine Configs.

Review comment: Hmm, it might actually make more sense for all of the sugar described here to be implemented directly in FCCT. As described,
## Implementation History

TBD

## Drawbacks

- This approach taints the pristine RHCOS assets that come from our build system. One of the early goals of OpenShift 4 was to avoid the use of "golden images" and to push customers to make use of declarative configuration instead. This is a notable departure from that stance and opens the door to misuse and obfuscation of the installation environment.

Review comment: That ship has already sailed. However, coreos-installer provides

## Alternatives

- https://github.com/openshift/enhancements/pull/399
- https://github.com/openshift/enhancements/pull/467

Review comment: This isn't an alternative to 467, it won't work for IPI baremetal in its current form.

Review comment: I don't think it's an alternative to 399 either - that's about declarative network configuration via the MCO, but this is about a non-declarative way to inject install-time configuration, explicitly not managed by the MCO?

Review comment: s/require/required/

Review comment: Nit but I'd say "In some cases" as IME it's a fairly small subset of early-network (and perhaps disk) configuration that we keep running into that can't easily be handled via MachineConfig/Ignition.

Also, IMO the issue isn't that users don't have easy interactive access - as the owner of some hardware I probably do have that access, the issue is that solution isn't something I can automate, so it's not workable in anything other than very small-scale PoC type situations.