-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automated Policy-based Disk Encryption on Boot Images #15
Automated Policy-based Disk Encryption on Boot Images #15
Conversation
/assign @childsb |
I started writing some code around the ignition+ostree redeployment bit: coreos/ignition-dracut#107 |
/unassign @chancez |
As noted in the Ignition Dracut PR, Ignition does not support LUKS creation. |
Let me try to rephrase/capture my concerns. I'd like to see this proposal define a very high level way (e.g. via The reason is: this proposal creates yet another substantial difference between FCOS and RHCOS. Further, it creates that divergence as a short-term fix for one point in larger problem space around root filesystem configuration. This proposal is also implicitly saying that root filesystem encryption is e.g. more important than switching RHCOS to Ignition spec 3 and adding LUKS support to Ignition - that may be the case, but we should understand that. (For example, we want to support LUKS on RAID, LVM etc.) I'd like to ensure that whatever we do here can be transparently upgraded to full Ignition 3 support later. Now, this gets into my concern around exactly what admins can configure. My 2c on this is that we start from the very simplest: by default, the root filesystem is encrypted with a key that is not accessible after boot. If a TPM is available, we use that. If a TPM is not available...well, we can only easily do hacks - or we do something like have an in-cluster Tang server, i.e. we do NBDE by default. We need to clearly call out the disaster recovery implications of this - we need to make clear, there's no "recovery key" or anything like that. (Or, alternatively we require the admin specify a recovery key?) We know we want to support external-to-the cluster Tang servers - that's an easy case. But again, let's define what admins write in an |
Unpacking @cgwalters concerns here (which I greatly appreciate!)
That's fair. I spend the time defining the OS level perspective and I wanted to get the base idea out for review first.
FCOS is for general clusters and container workloads, while RHCOS is single-purpose in running Openshift. Alignment is highly desirable; divergence should be tolerable when it enables key features for that single-use.
Calling this a "short-term" does not do the proposal justice. To our users, encrypted data-at-risk is very germane and it can make the difference between adoption. Users are asking for this now and its an adoption blocker. For example, without encryption, FIPS 140-2 compliance cannot happen. Simply put, our users do not care about Ignition or Ignition specs unless they have to; that's a developer concern. The users care about encryption. What this proposal does is to us developers give a PATH and TIME at the cost of short-term divergence in on-disk format. This proposal solves the problem in a universal way, and gives developers time to ensure that we do the Ignition work well.
One of the merits of this idea is that it does not require Ignition or MCO support today; both would be channels for delivering the Cleivs config. I have spoken to the Ignition maintainers about the migration path and did so specifically before submitting this enhancement. Once we have LUKS support in Ignition, we can use a default Ignition configuration that replicates the behavior. I would argue that since the change to a LUKS container is for the boot image, this gives a path forward since it does not have a surface in Ignition.
By key, I would assume you mean the LUKS passphrase? The proposal state the key is not accessible after boot without doing per-node spelunking. The random passphrase is provisioned only for the purpose of re-encrypting the LUKS container. After the Clevis binding happens, the key-slot with the random passphrase is destroyed, and the only way to access it is through the token written by Clevis.
In-cluster Tang was considered but rejected due to the complexity of the idea. How do you deal with a rolling reboot or a power-outage? By "hack" are you referencing the proposed plain-text pin for Clevis? I would agree its less than perfect -- and the proposal calls out it specifically -- but it solves the situation where a cluster is bootstrapped and then an admin on day N wishes to move to a NBDE model.
Under the "Risk and Mitigations," the proposal illustrates how recovery would look if it was booted into the initramfs:
I work on that tomorrow |
MachineConfigs embed Ignition - so Ignition is very much user facing. We want to dig ourselves out of the pit we're in now where searching for Ignition stuff lands on coreos.com and one gets confused about spec 2 vs 3 and whether it's talking about Container Linux or... etc. I think we definitely want administrators to write Ignition (or MachineConfigs) and finish the vision of supporting re-laying out the rootfs. Let's be sure we're on the same page on that! This document seems to more describe changing Ignition as "out of scope". Agreed to change the wording there to make it clear that e.g. in the future, RHCOS will switch back to unifying with FCOS and not come as a null cipher LUKS by default? |
BTW, another reason we want the full Ignition re-partitioning support is that today one can provide an Ignition config which creates |
FCOS being the upstream of RHCOS is super important, but it can't be the reason we don't move forward adding encryption at rest support for OCP.
100%
I'm not sure we all are on that same page. Allowing (and in some cases encouraging) MC editing was not originally what we wanted admins to do. However, it became a necessity. We also now have I'm a bit confused on how writing MCs are strongly linked to re-laying out rootfs. Is this meaning re-formatting? Encryption/re-encrption? etc..
True. People can do that, but they can do a lot of things with the awesome tool set Linux and OCP provide. Do you see |
For sure there's been debate on this. But ultimately we shipped OpenShift 4.1, and the way to configure the host is fcct as you know only applies to FCOS. However I see MachineConfigs as also increasingly playing the role of FCCT in offering "high level sugar" around Ignition for RHCOS/OpenShift.
Well, this hasn't been implemented or designed fully, but the way I was thinking of this is that an administrator should be able to create a |
Approved from my POV 👍 |
In prow terms, There's other edits/improvements to make here, but the broad outline LGTM. We probably need to be sure architects are good with this as well before trying to revise a lot of the details. |
/assign @smarterclayton @imcleod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an excellent starting point for the data encryption story. 👍
Since there are explicit requirements on the Installer for supporting this, let's get @abhinavdahiya and @wking aware of this, too. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ashcrow, cgwalters, darkmuggle, miabbott The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Part of: openshift/enhancements#15 We added FIPS to the MCO a while ago: openshift/machine-config-operator#889 However, during some discussion it became clear that the main use case for FIPS is "day 1" - it doesn't make sense to turn it on "day 2" because the standard requires that e.g. long-term key material was created with FIPS enabled. Further, it's unlikely that admins will want to turn it *off* if they ever had it on. This is a good candidate for an install config.
Today for IaaS clouds, we default to an instance type, which then in turn usually provides a default size. RHCOS resizes on boot to that size, distinct from its "default" 16G size. However, libvirt installs were inheriting our default size. We'd like to shrink it because we plan to land encryption: openshift/enhancements#15 And the less data we need to encrypt, the better. (In the future I'd like to make this configurable with a variable, but let's just prepare for the encryption work now)
Part of: openshift/enhancements#15 We added FIPS to the MCO a while ago: openshift/machine-config-operator#889 However, during some discussion it became clear that the main use case for FIPS is "day 1" - it doesn't make sense to turn it on "day 2" because the standard requires that e.g. long-term key material was created with FIPS enabled. Further, it's unlikely that admins will want to turn it *off* if they ever had it on. This is a good candidate for an install config.
Part of: openshift/enhancements#15 We added FIPS to the MCO a while ago: openshift/machine-config-operator#889 However, during some discussion it became clear that the main use case for FIPS is "day 1" - it doesn't make sense to turn it on "day 2" because the standard requires that e.g. long-term key material was created with FIPS enabled. Further, it's unlikely that admins will want to turn it *off* if they ever had it on. This is a good candidate for an install config.
Starting with OpenShift 4.3, CoreOS boot images will be encryption-ready. On first boot either administator provided or automated policy will be used for the root disk encryption. Bootstrapping and node provisioning MUST fail when the policy cannot be applied unless the administrator has opted out.