-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for TPM- and Tang-based disk encryption #83
Conversation
Looks good! Can be merged when the tests pass. This replaces #65 |
In flatcar/mantle#493 I see that we explicitly want to support an encrypted rootfs. If that's the case we should make sure that the rootfs gets unlocked and mounted before |
RequiresMountsFor=/sysusr/usr/ | ||
# dracut-initqueue will wait for /dev/disk/by-label/ROOT, so we must decrypt the root disk before dracut-initqueue. | ||
Before=dracut-initqueue.service | ||
Wants=systemd-networkd.service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clevis itself doesn't really require networking, or? Only when used with tang. Having networking in every initrd boot slows things down and it also may not work with a static network setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a flag file or cmdline option to opt-into clevis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add ConditionKernelCommandLine=|flatcar.clevis_initrd_unlock=1
to the clevis-unlock.service
unit file and require the user to ensure that this command-line option is set if needed, by either setting it via Ignition kernelArguments
or in their PXE boot config. I think this should work, but, from the user perspective, it'd be nicer if configuring disk encryption in Ignition just worked, without needing to remember to change some other bit of configuration somewhere else.
Another solution that doesn't require the user to set boot args would be to change clevis-unlock.service
as follows: 1) remove Require=systemd-networkd
, 2) do not let it run the unlocker script directly, but instead let it run a script that first checks whether there are any disks to decrypt (e.g., via lsblk); if there are no disks to decrypt, let it terminate, but if there are disks to decrypt, let it ensure systemd-networkd is started and then call the unlocker script. This is a little more complicated for us than the kernel command line option but maybe better from the user perspective.
My hunch is that the second solution is preferable (if it's possible to get this to work). What do you think---shall I give it a shot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The disk can also be unlocked directly by systemd, e.g., when the systemd kernel cmdline is set up for that. In this case we don't need network and don't want clevis to also try to unlock at the same time I guess. I'm also interested in an automated way but for now I would stick to the kernel cmdline parameter (there is an ignition section to configure kernel args).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say a commandline that requests networking is ok, but we should support clevis without networking too (only tpm).
Look at the clevis repo, there are examples there that iterate over luks devices and check whether any of them have a tang pin. We can have a generator do that and then pull in systemd-networkd as required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, so we want to be able to 1) not run clevis in the initramfs at all, 2) run clevis without network (tpm only), 3) run clevis with network. I implemented this as follows
rd.clevis_unlock=network
runs clevis with network dependenciesrd.clevis_unlock=no_network
runs clevis without network dependencies (TPM only)- If neither is set, then clevis doesn't run at all.
Happy to change the name of the command line parameter if you'd prefer different names.
@jepio Good idea! For now, I used command line parameters to control in the generator whether systemd-networkd is pulled in, but a generator that looks for Tang-encrypted disks would work as well. Slight complication: generators are run before sysusr-usr.mount
and we currently overwrite the clevis dependencies with symlinks in the initramfs (maybe to keep the initramfs image small, @pothos ?). If we want to use a generator that reads the LUKS headers, we probably need to have some of the actual clevis files rather than symlinks in the initramfs image.
After adding the necessary dependencies, changing the generator to something like this should do the job:
#!/bin/bash
set -e
UNIT_DIR="$1"
function exists_tang_encrypted_disk() {
local _d
for _d in $(lsblk -o PATH,FSTYPE,RM \
| awk '$2 == "crypto_LUKS" && $3 == "0" { print $1 }');
do
if bindings="$(clevis-luks-list -d "${_d}" 2>/dev/null)" \
&& [[ "${bindings}" == *tang* ]]; then
return 0
fi
done
return 1
}
function add_clevis_network_dependencies() {
mkdir -p "${UNIT_DIR}/clevis_unlock.service.d"
cat >${UNIT_DIR}/clevis_unlock.service.d/networkd_dependency.conf <<EOF
# Automatically generated by tang-network-generator
[Unit]
Wants=systemd-networkd.service systemd-resolved.service network-online.target
After=systemd-networkd.service systemd-resolved.service network-online.target
EOF
}
if exists_tang_encrypted_disk ; then
add_clevis_network_dependencies
fi
Since the user now has to explicitly enable clevis unlock anyways, having the user specify whether they need network or not seems not so bad from a UX perspective, and maybe it's not worth the extra complexity to detect this automatically at this point. I don't have a strong opinion here and would both be happy to stick with the non-automated, current solution, or to change it to the automated one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could also be a helper script in ExecStartPre
that starts networking when needed (systemctl start --quiet systemd-networkd systemd-resolved
).
I see it as an optimization for a follow-up and we can focus on getting it merged as is, first.
/usr/bin/tpm2_pcrread \ | ||
/usr/bin/tpm2_unseal \ | ||
/usr/lib/systemd-reply-password \ | ||
/usr/local/libexec/clevis-luks-askpass \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the note from the upstream module: https://github.com/coreos/ignition/blob/main/dracut/30ignition/module-setup.sh, more precisely https://github.com/coreos/ignition/blob/main/dracut/30ignition/module-setup.sh#L49. Can we reuse the upstream clevis module for this purpose in order to not diverge from the upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could add a comment here that points to the upstream clevis module to have a chance of finding divergence more easily .
If we import the clevis module we would still add the same contents we have here, I'm both ok with having it imported and patched or with having things here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we import the clevis module
To clarify, the clevis
module for dracut from app-crypt/clevis is included in the initramfs. One way to see this is that the module 40clevis-unlock
currently lists it as a dependency here. Unfortunately, the clevis
module will not unlock disks in Flatcar's initramfs though as explained in this comment. That is why we have our own 40clevis-unlock
dracut module in addition the clevis
module. The lines of code you commented on replace files installed by the clevis
module, such as /usr/bin/tpm2_unseal
, with symlinks to the same files under /sysusr/
. That is, we can remove the symlinking logic here and things will work just fine. My understanding was that replacing files with symlinks was meant to avoid blowing up the initramfs image unnecessarily, given that the files we need are in /sysusr/
anyways. But I started working on this PR after the symlinking had been introduced and maybe I am misunderstanding and the symlinking was originally introduced under the assumption that the clevis
module would not be installed, in which case the symlinks would be essential for the unlocking to work.
Some options on how to proceed:
- keep both
clevis
and40clevis-unlock
as they currently are, and add a comment noting that we diverge from upstream30ignition
here - keep both
clevis
and40clevis-unlock
as they currently are, but do not overwrite the real files with symlinks, instead using the files from theclevis
module; this would keep us us closer to upstream30ignition
(if I understood correctly, this was @ader1990's suggestion) - do not install the
clevis
module and instead make sure that all files that40clevis-unlock
depends on are present in/sysusr/usr
and symlinked - do not install
40clevis-unlock
and instead extend the Flatcar patch ofapp-crypt/clevis
so that theclevis
module manages to unlock disks on its own by registering a dracut hook
I do not have particularly strong opinions on how to proceed, so let me know which option you prefer. I am not sure how straightforward option 4 would be, and once we use a dracut hook rather than our own systemd unit for unlocking, we might not be able to switch off initramfs unlocking using the kernel cmdline (without further patching of app-crypt/clevis).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 1 sounds ok, or? We should keep the symlinks for reducing the initrd size impact of this addition. About the systemd-cryptsetup integration: I think that can also be triggered through a luks.uuid=
cmdline parameter instead of /etc/crypttab
- maybe it would work to write this out to grub.cfg
through a helper? Jeremi also noted that https://www.freedesktop.org/software/systemd/man/latest/systemd-gpt-auto-generator.html exists and could work. Anyway, I think the current way is ok even if we iterate on it. Once we document things we ideally shouldn't change them in a backwards-incompatible way anymore, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re symlinks: Agreed. I will add a comment explaining that we diverge from 30ignition
by symlinking to save space. I will also check that we really symlink everything we can.
Re how to unlock in initramfs: I like the idea of using rd.luks.uuid/name
or rd.auto
to trigger disk unlock through systemd-cryptsetup in the initramfs. I tried setting these cmdline parameter instead of rd.clevis_unlock
on an image built on the current version in this PR, and the initramfs unlock worked fine.
Suggestion:
- Remove the custom
rd.clevis_unlock
parameter - Remove the custom initramfs systemd unit to unlock disks
- Keep the generator that enables systemd-networkd, but rename the parameter to something more general, such as
rd.network=1
, and make the generator add a dependency from dracut-initqueue (or some other unit that runs early enough) to systemd-networkd, systemd-resolved, etc. - For now, do not automatically add the command line parameters for initramfs unlocking to grub.cfg. Instead, require the appropriate parameters to be set in the Ignition config by the user.
I'm pretty sure I can push those changes until tomorrow end of day, so it shouldn't delay the merge too much. I think this approach is better than the current approach because
- we stick to standard
rd.*
parameters rather than introducing a Flatcar-specificrd.clevis_unlock
parameter, - we get rid of the custom systemd unit in our initramfs, so less code for us to maintain,
- the current solution also requires the user to set command line arguments to enable initramfs unlock, so nothing is lost through the proposed switch to
rd.*
params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With an encrypted rootfs it didn't work, and I still had to pass
luks.uuid=
or it would hang.
The problem seems to be that the gpt auto generator only runs as "late" generator.
Edit: Maybe this is the problem:
This generator will only look for the root partition on the same physical disk where the EFI System Partition (ESP) is located. Note that support from the boot loader is required: the EFI variable LoaderDevicePartUUID
of the 4a67b082-0a4c-41cf-b6c7-440b29bb8c4f vendor UUID is used to determine from which partition, and hence the disk, from which the system was booted. If the boot loader does not set this variable, this generator
will not be able to detect the root partition. See the Boot Loader Interface[3] for detail
But it looks like it might work once we have this bli module https://wiki.archlinux.org/title/GRUB#LoaderDevicePartUUID
But root=
must also not be specified on the cmdline.
… so I guess we could write a custom generator that runs the systemd gpt auto generator with the right environment in a mount namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it instead of adding a hack to run the gpt auto generator we could have a custom generator that that looks up the luks UDID based on the by-partlabel/ROOT
entry and then runs the systemd-cryptsetup-generator
with a kernel cmdline bind-mounted in a mount namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed commits that implement the switch to using the systemd-cryptsetup unlocking in the initramfs and rd.*
arguments, as described here.
From the user's perspective, here's how it works:
- for non-root disks, the user simply adds this to their Butane config:
luks:
- name: dataencrypted
device: "/dev/disk/by-partlabel/data"
clevis:
tpm2: true
- for root disks, the user must additionally set the UUID of the LUKS device and add a kernel parameter to trigger unlock in the initramfs, and, if network-bound encryption is used, a kernel parameter to start systemd-networkd in the initramfs:
kernel_arguments:
should_exist:
- rd.luks.name=12345678-9abc-def0-1234-56789abcdef0=rootencrypted
- rd.networkd=1
luks:
- name: rootencrypted
device: "/dev/disk/by-partlabel/root"
uuid: 12345678-9abc-def0-1234-56789abcdef0
clevis:
tang:
- url: "http://tang.mycompany.com"
thumbprint: "Hk1VN2eKhzaVqWhXtXxXIGF5LRZt4cBWWb07I1-a0NX"
See also the changes to the automated tests in the mantle repository that I just pushed here. These tests pass with the new version of the bootengine.
From an implementation perspective, the only bootengine code we now add is the generator that implements adding a dependency on systemd-networkd
etc. if rd.networkd=1
is set. I deleted the custom unlock service.
Personally, I feel better about this solution than about the previous one; I think this solution is user-friendly and adds less complexity on the Flatcar side, by reusing systemd-cryptsetup.
From my perspective, the only thing left to do (other than getting the initramfs image size under control---see the thread below) is to consider removing the need for the user to set kernel_arguments
in their Butane for root disk encryption. I think it'd probably be okay to merge as is and then later remove the need for kernel_arguments as an optimization, but I'd also be happy to push another commit that adds this optimization before we merge. Let me know what you think.
Regarding how to optimize, what you suggest would be a way to solve the problem, I think:
we could have a custom generator that that looks up the luks UDID based on the by-partlabel/ROOT entry and then runs the systemd-cryptsetup-generator with a kernel cmdline bind-mounted in a mount namespace
I'm not sure how big of a problem that is, but note that this solution would not work for Butane configs that do not label their root partition at all or label it something other than ROOT
. As far as I know, it is no requirement to use this label for the root partition. Of course, we could document that users need to use this label for unlocking in the initramfs to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, that looks good.
The partition label already is ROOT and won't change, when the partition is reformatted with luks, or? In any case it's easy to require using a partition label and it's nicer than having to provide the uuid upfront. Anyway, I think this is a follow-up improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The partition label already is ROOT and won't change, when the partition is reformatted with luks, or?
I was thinking about the case where the root partition is moved to a new partition on a disk other than the OS disk, e.g. as described here in the documentation. In the case where the root partition on the OS disk is used, you are right.
In any case, I 100% agree with you that we could just require the user to use this partition label.
[Unit] | ||
Wants=systemd-networkd.service systemd-resolved.service network-online.target | ||
After=systemd-networkd.service systemd-resolved.service network-online.target | ||
EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also use this generator to avoid having to mask systemd-cryptsetup@rootencrypted.service
. Because generators should only write to the UNIT_DIR I guess the easiest "masking" would be to generate a drop-in like
ExecStart=
ExecStart=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we mask systemd-cryptsetup@rootencrypted.service
in the generator, using a hard-coded name systemd-cryptsetup@rootencrypted.service
, then once a user changes their Butane from
luks:
- name: rootencrypted
device: "/dev/disk/by-partlabel/root"
...
to
luks:
- name: cryptoroot
device: "/dev/disk/by-partlabel/root"
...
this will stop working because systemd-cryptsetup@cryptoroot
isn't masked. I worry that having the masking depend on a magic luks.name
will be confusing to users. Personally, I would prefer to have a Butane config that explicitly masks the relevant cryptsetup unit than a Butane config in which there is a hidden dependency between the value of luks.name
and whether the relevant cryptsetup unit is masked.
We could try to detect the name of the dm-crypt device that contains the filesystem labeled ROOT in the generator. I can't think of a robust way to do this though. The Ignition config contains this information, but it is deleted after the first boot and so won't be available to the generator. Otherwise, only the filesystem metadata contains label information, but we would have to decrypt the dm-crypt devices in the generator to access it.
You might know a good way to achieve the masking independently of the luks.name
---if so, let me know and I'm happy to implement it. Otherwise, I'd vote for leaving it as is and requiring explicit masking. Happy to hear other opinions though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the info, if I understand the relation to the other topic, the best way would be to make clevis work with the systemd-cryptsetup generated units? We could try that by adding a luks.uuid=
cmdline arg and not passing the custom kernel arg we defined for the unlock service added here, or?
Anyway, I'm fine with the current approach - I'll make a final test to see if we can merge it as is. Improvements can be done later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pothos @simoncampion please check the discussion related to the initrd size before merging it. Merging this PR might increase the size of the initrd over the 50% threshold. flatcar/Flatcar#1381
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that if luks.uuid
is present on the cmdline, systemd will only try to unlock those partitions whose UUIDs are specified. If the user (or some automation we built) sets rd.luks.uuid
and luks.uuid
appropriately, there should be no need to mask units. (Unfortunately, just setting rd.auto=1
doesn't cut it though, if I see correctly; systemd will unlock in initramfs and still try to unlock again in late user-space and fail.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid we cannot merge this because it will increase the initramfs size to over 50% of the 128MB boot partition. 3850.1.0 with the changed bootengine
has >50% use on /boot
:
emergency@flatcarvm ~ $ df
Filesystem 1K-blocks Used Available Use% Mounted on
...
/dev/vda1 129039 64602 64437 51% /boot
...
I'll rebase bootengine
and scripts
on the main branches and build and check again, but I wouldn't be surprised if we're still above 50%.
Is it worth the effort to try to reduce the initramfs impact of this PR to stay under 50% or should we hold off the merge until there is a more general solution to the 64MB issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ongoing effort to reduce the size under 50%. This is our best bet for the moment: flatcar/scripts#1734.
What is the increase of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ongoing effort to reduce the size under 50%. This is our best bet for the moment: flatcar/scripts#1734.
What is the increase of this change?
To make a good check, you need to rebase the flatcar/scripts PR, as alot has happened in the meantime and the /boot partition size had ~300KB of runway a few days ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with the reduction in flatcar/scripts#1734 we should be able to both merge this PR and the systemd 255 update, and then we need to look for further reduction opportunities again.
Funny, I wanted to check if systemd-cryptenroll works equally but then found out that we don't have the TPM2 build flag enabled. |
Hello, This PR merge is currently blocking systemd upgrade PR, as the HEAD of the bootengine cannot be used with flatcar/scripts (it will fail to build because initrd binaries are not there yet). Also, note that this PR introduces an increase of around 2MB to the compressed initrd, leaving around 2MB of runway left. If the PR flatcar/scripts#1560 gets merged, the systemd upgrade flatcar/scripts#1679 needs to be re-evalued and properly retested with this new feature, which will take some time. @sayanchowdhury @t-lo I would suggest to postpone the systemd upgrade for the next release if this feature is prefered to be merged first, to have enough time to re-test the changes. |
Are you sure this breaks anything? It's only installing wrappers as far as I can see. And where does the 2MB size increase come from? |
The flatcar/scripts master will not find these files anymore: https://github.com/flatcar/scripts/blob/main/sdk_container/src/third_party/coreos-overlay/sys-kernel/bootengine/bootengine-9999.ebuild#L44 See: https://github.com/flatcar/bootengine/pull/83/files#diff-362bfe2c7282fafca415455746991b5b7313c8c8c914639c554e7f88d94fceed I have seen these value using the amd64 image from https://github.com/flatcar/scripts/actions/runs/8266361594: 62047KB used, 66993 KB free for the /boot partition. ~2MB size increase versus flatcar/scripts#1734 |
The size increase in the initrd image comes from the accompanying PR in the scripts repo. It installs the package @ader1990 correctly points out that merging this PR without the accompanying PR in the To proceed, we need to decide how we feel about the initrd increase and the relative priority of disk encryption and the systemd upgrade. I don't feel qualified to make these decisions, and I am open to all options, incl. reverting the changes in this PR and re-evaluating initrd size after the systemd upgrade is merged. |
we can proceed with maybe both at the same time, if the initrd increase is acceptable (I am neutral on this part). I created a branch with both PRs included, to have a more comprehensive test result https://github.com/flatcar/scripts/tree/ader1990/upgrade-to-systemd-255-tpm-tang. Currently, running this locally to see if there are any hiccups. |
Merging both branches together is normally easy when a maintainer can click merge, this time we had a night in between though ;) |
Add support for TPM- and Tang-based disk encryption
This PR adds support for TPM- and Tang-based disk encryption by Ignition. It also adds support for decrypting encrypted ROOT partitions in the initramfs with a dracut module. It includes work by @krishjainx towards supporting these features, plus a few further changes that I needed to make for the features to fully work. I left changes by @krishjainx largely untouched where no modifications were needed to make it work, but I'm happy to address comments about any of the changes in this PR.
How to use
See the accompanying PR in the
scripts
repository.Testing done
I tested these changes in combination with the changes in
scripts
using the new tests in this PR.changelog/
directory (user-facing change, bug fix, security fix, update)/boot
and/usr
size, packages, list files for any missing binaries, kernel modules, config files, kernel modules, etc.