systemd initrd decryption prompt times out after 90 seconds #250003

lopsided98 · 2023-08-18T18:19:17Z

Describe the bug

With systemd initrd and an encrypted rootfs, the system enters emergency mode if the decryption password is not entered within 90 seconds. This occurs because systemd device units time out by default after 90 seconds. Additionally, ZFS on LUKS (I haven't tested native ZFS encryption) times out after 60 seconds because of a hardcoded timeout in zfs-import-${pool}.service.

To work around the first issue, I added x-systemd.device-timeout=0 to the root filesystem options. To fix ZFS, I added the decrypted rootfs device unit as a dependency of zfs-import-root.service, so the 60 second timeout doesn't start until the device is decrypted. I also set JobTimeoutSec=infinity on the device unit (see here).
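
For reference, the first workaround could be expressed in a NixOS configuration roughly as follows. This is a minimal sketch, not the reporter's actual configuration: the device path and file system type are hypothetical placeholders, and only the `x-systemd.device-timeout=0` option is taken from the description above.

```nix
{
  # Hypothetical root file system entry; device and fsType are placeholders.
  fileSystems."/" = {
    device = "/dev/disk/by-uuid/<uuid>";
    fsType = "ext4";
    # Workaround from the report: disable the device timeout for this mount,
    # so waiting at the passphrase prompt does not fail the boot.
    options = [ "x-systemd.device-timeout=0" ];
  };
}
```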

I don't see an obvious way to turn these workarounds into something that can be automatically configured in nixpkgs, but I think we should find some solution, because the current behavior is unexpected and annoying.

Notify maintainers

@ElvishJerricco


surfaceflinger · 2023-08-24T16:07:21Z

~~Can confirm that it's broken with zfs native encryption too~~
Not anymore. I often wake up, turn on my PC, and leave it at the ZFS decryption prompt for about an hour before getting back to it.

ElvishJerricco · 2023-09-19T01:43:58Z

So this is a tricky problem. For non-ZFS, systemd is supposed to handle this. I've simulated the systemd-cryptsetup-generator and the systemd-fstab-generator and look at this:

$ cat etc/fstab
/dev/mapper/virt /foo ext4 defaults 0 0

$ cat etc/crypttab
virt LABEL=phys

$ unshare -U -r -m bash -c 'mount --bind /nix/store ./nix/store; mount --bind /proc ./proc; chroot . /run/current-system/systemd/lib/systemd/system-generators/systemd-fstab-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/generator.late'

$ unshare -U -r -m bash -c 'mount --bind /nix/store ./nix/store; mount --bind /proc ./proc; chroot . /run/current-system/systemd/lib/systemd/system-generators/systemd-cryptsetup-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/generator.late'

$ find run/systemd/
run/systemd
run/systemd/generator
run/systemd/generator/cryptsetup.target.requires
run/systemd/generator/cryptsetup.target.requires/systemd-cryptsetup@virt.service
run/systemd/generator/dev-mapper-virt.device.d
run/systemd/generator/dev-mapper-virt.device.d/40-device-timeout.conf
run/systemd/generator/systemd-cryptsetup@virt.service
run/systemd/generator/local-fs.target.requires
run/systemd/generator/local-fs.target.requires/foo.mount
run/systemd/generator/foo.mount
run/systemd/generator/local-fs.target.wants
run/systemd/generator/local-fs.target.wants/systemd-remount-fs.service
run/systemd/generator/dev-mapper-virt.device.requires
run/systemd/generator/dev-mapper-virt.device.requires/systemd-cryptsetup@virt.service
run/systemd/generator.late
run/systemd/generator.early

$ cat run/systemd/generator/dev-mapper-virt.device.d/40-device-timeout.conf
# Automatically generated by systemd-cryptsetup-generator

[Unit]
JobTimeoutSec=infinity

So what's happening here is that the file system foo.mount won't be started until dev-mapper-virt.device appears, but because of dev-mapper-virt.device.d/40-device-timeout.conf, that device will never time out. The timeout for a missing physical device still works correctly, because systemd-cryptsetup@virt.service requires and is ordered after dev-disk-by\x2dlabel-phys.device, so the timeout on dev-disk-by\x2dlabel-phys.device will cause cascading failures in the event that the physical device fails to show up.

Now, for those having this problem in the non-ZFS case: This means you should set your file system device to /dev/mapper/foo, not /dev/disk/by-whatever/whatever. This will ensure that it only times out when the actual physical device fails to appear, not when you fail to enter the passphrase in time. By depending on anything other than /dev/mapper/foo, you're failing to get the timeout override that makes this all work.
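
In NixOS terms, the advice above might look something like the following. This is only a sketch under stated assumptions: the mapper name `cryptroot`, the by-uuid placeholder, and the `ext4` file system type are all hypothetical and do not come from this thread.

```nix
{
  # Hypothetical LUKS device; "cryptroot" and the UUID placeholder are assumptions.
  # The physical partition may still be located by UUID here.
  boot.initrd.luks.devices.cryptroot.device = "/dev/disk/by-uuid/<luks-partition-uuid>";

  fileSystems."/" = {
    # Reference the decrypted device by its mapper name, so the mount depends on
    # dev-mapper-cryptroot.device, which receives the generated
    # JobTimeoutSec=infinity drop-in.
    device = "/dev/mapper/cryptroot";
    fsType = "ext4";
  };
}
```

With this layout, a timeout should only occur if the physical partition never appears, not while the passphrase prompt is waiting.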

As for ZFS, the problem is analogous. The import service does not depend on the device that it needs to import, so it starts too early and assumes the device has failed to appear. In an ideal world, ZFS would have udev rules that make a device indicative of the pool name appear only once all of the pool's drives are available, so that we could order the import service after said device. For now, the best alternative is probably to order the import service after cryptsetup.target, but this is not without caveats. For instance, what about users who have crypttab devices stored on ZFS zvols? The import service actually needs to come before cryptsetup.target in that case. Not sure what exactly to do here.
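
As an illustration only, ordering the import service after cryptsetup.target could be sketched like this. It assumes a pool named `root` (so the unit is zfs-import-root.service, as in the original report), and it is not suitable for setups with LUKS devices on zvols, for the reason given above.

```nix
{
  # Assumes a pool named "root"; adjust the unit name for other pools.
  boot.initrd.systemd.services."zfs-import-root" = {
    # Do not start the import (and its hardcoded timeout) until all
    # crypttab devices have been unlocked.
    # Caveat: wrong if a LUKS device is stored on a zvol in this pool.
    after = [ "cryptsetup.target" ];
  };
}
```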

systemd-cryptsetup-generator automatically applies an infinite timeout to crypto devices, but only if they are referenced as /dev/mapper/<name>. See: NixOS/nixpkgs#250003 (comment)

lopsided98 · 2023-09-19T23:44:25Z

Thank you for looking into this; I can confirm that this fixes the non-ZFS case. From a quick look at the code, it seems like it would be relatively simple for systemd to apply the drop-in to the by-uuid path as well, but then I guess you could still break it by using one of the by-id paths or /dev/dm-0.


SuperSandro2000 · 2023-12-16T04:24:02Z

I've also implemented https://github.com/lopsided98/nixos-config/blob/master/machines/HP-Z420/default.nix#L121-L134, which I think we should be able to generate automatically. If it cannot be solved generically, it should at least be available behind an option, which could be enabled by default.

We should also treat this issue with some priority, because I have ended up in emergency mode several times when unlocking failed, even after the normal system had already started.

ElvishJerricco · 2023-12-18T18:48:28Z

@SuperSandro2000 Don't create custom .device units like that. Just order the zfs service against the mapper names, as I described above, instead of /dev/disk/by-uuid names. The LUKS logic is already taking care of finding disks by UUID if that's what you care about.

There isn't an option that could be turned on by default, because the disks required differ from system to system. The only common thing that could be done by default is ordering after cryptsetup.target, but as I said before, this breaks other setups that have LUKS devices on the zpool.

SuperSandro2000 · 2023-12-18T22:01:22Z

You mean like this?

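boot.initrd.systemd.services."zfs-import-root" = let
  # Escaped systemd unit names of the decrypted mapper devices backing the pool
  # ("\\x2d" is the escaped "-" in the mapper name).
  zfsPools = [
    "dev-mapper-machine\\x2deins.device"
    "dev-mapper-machine\\x2dzwei.device"
  ];
in {
  # Pull in the decrypted devices and start the import only after they appear.
  wants = zfsPools;
  after = zfsPools;
};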

ElvishJerricco · 2023-12-18T23:20:32Z

Yep. That way you don't need the custom timeout in the units = ... stuff.
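
To avoid writing the `\x2d` escapes by hand, the unit names could be derived from the mapper paths. The following is a sketch that assumes the `utils.escapeSystemdPath` helper from the NixOS module arguments; the mapper names are the ones from the example above, and `mapperDevices` and `deviceUnits` are names made up for illustration.

```nix
{ utils, ... }:
let
  # Mapper paths of the decrypted devices backing the pool.
  mapperDevices = [
    "/dev/mapper/machine-eins"
    "/dev/mapper/machine-zwei"
  ];
  # Turn each path into its escaped systemd device unit name,
  # e.g. "/dev/mapper/machine-eins" -> "dev-mapper-machine\x2deins.device".
  deviceUnits = map (path: "${utils.escapeSystemdPath path}.device") mapperDevices;
in
{
  boot.initrd.systemd.services."zfs-import-root" = {
    wants = deviceUnits;
    after = deviceUnits;
  };
}
```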

preisi · 2023-12-27T20:59:38Z

> Now, for those having this problem in the non-ZFS case: This means you should set your file system device to /dev/mapper/foo, not /dev/disk/by-whatever/whatever. This will ensure that it only times out when the actual physical device fails to appear, not when you fail to enter the passphrase in time. By depending on anything other than /dev/mapper/foo, you're failing to get the timeout override that makes this all work.

Thank you very much for looking into this issue. However, even with your recommended fix, the password prompt times out after 90 seconds and the system drops into emergency mode. Mounting the /boot partition seems to be the problem, as it is the only remaining file system device still specified as /dev/disk/by-uuid/.... (It also fails to mount when NixOS tries to continue booting after decryption once the timeout has passed.)
Then again, /boot is not encrypted in my case. My setup is rather simple: an unencrypted vfat boot partition plus LVM-over-LUKS for the remainder, with those file systems specified via /dev/mapper/....

Do you have any pointers/ideas on how to further debug this or even a solution at hand?

ElvishJerricco · 2024-03-28T01:42:06Z

@preisi Sorry for taking a while to get back to you.

> It seems that mounting the /boot partition is the problem

The /boot partition is not something stage 1 handles. If that's causing delays, it's during stage 2, and it's a separate issue.

nixos-discourse · 2024-05-25T02:06:34Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/unlocking-multiple-luks-devices-with-same-passphrase/45856/4


devurandom · 2024-10-17T22:20:56Z

Will this make it into 24.11?

ElvishJerricco · 2024-10-17T23:43:45Z

@devurandom What do you mean by "this"? systemd initrd is already available, it's just not enabled by default. We don't have a solution for this issue yet; though I think what we're likely to do is just disable the timeouts altogether by default.

devurandom · 2024-10-18T08:33:22Z

I had read #344920 (comment) and indeed meant "disable the timeouts altogether by default" when I wrote "this". Thanks!

lopsided98 added the 0.kind: bug Something is broken label Aug 18, 2023

Lykos153 added a commit to Lykos153/nixos that referenced this issue Nov 6, 2023

system leilasus: fix systemd timeout for luks

a49e037

See NixOS/nixpkgs#250003

Lykos153 added a commit to Lykos153/nixos that referenced this issue Nov 6, 2023

system leilasus: fix systemd timeout for luks

47d8263

See NixOS/nixpkgs#250003

ElvishJerricco mentioned this issue Apr 2, 2024

boot.initrd.systemd.enable breaks the boot.initrd.network.ssh.authorizedKeys #294032

Closed

ElvishJerricco mentioned this issue Apr 15, 2024

luksroot: Use keyctl over ramfs #273591

Open

JohnRTitor added this to systemd in Stage 1 Jun 21, 2024

JohnRTitor moved this to To Do in systemd in Stage 1 Jun 21, 2024

bartoszwjn added a commit to bartoszwjn/config that referenced this issue Aug 10, 2024

Fix systemd unit timeouts during stage 1

105c6d8

NixOS/nixpkgs#250003 (comment)

ElvishJerricco mentioned this issue Sep 28, 2024

NixOS 24.11 - Feature Freeze & Release Blockers #344920

Closed

40 tasks
