Race condition in dracut-zfs (90zfs/zfs-load-key.sh) for keylocation on USB dongle #12065

Closed
sxc731 opened this issue May 17, 2021 · 5 comments
Labels
Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang)

sxc731 commented May 17, 2021

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 21.04 |
| Linux Kernel | 5.11.0-17-generic |
| Architecture | amd64 |
| ZFS Version | 2.0.2-1ubuntu5 |
| SPL Version | 2.0.2-1ubuntu5 |

Describe the problem you're observing

After upgrading Ubuntu 20.10 to 21.04 (OpenZFS 0.8.x to 2.0.x), a server consistently fails to boot with the following displayed on the console:

[  OK  ]  Finished dracut pre-mount hook.
[    5.8665487] dracut-pre-mount[835]: Key load error: Failed to open key material file
[FAILED] Failed to mount /sysroot
See 'systemctl status sysroot.mount' for details
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
...
Entering emergency mode.  Exit the shell to continue.

This server loads its decryption key for its ZFS root filesystem from a removable USB dongle (keylocation=file:///dev/disk/by-partlabel/xxxxxx).

Note that this is a pure root-on-ZFS deployment (with /boot collocated with the rest of the root filesystem), booted from ZFSBootMenu rather than GRUB. I also use dracut rather than initramfs-tools (the latter is more common on Ubuntu deployments) for ease of ZFSBootMenu management. I don't think any of this has a bearing on the issue, but I include it to avoid any confusion.

Describe how to reproduce the problem

Attempt to boot a system with keylocation on a USB dongle as shown above (following this method). This used to work fine with the previous Ubuntu release, but that may just have been luck: there were no major changes in 90zfs/zfs-load-key.sh, and the issue appears to be a race condition, as described under Investigations.

Investigations

Typically, just exiting the emergency shell a few seconds after the failure (typing CTRL-D at the prompt) successfully boots the system. I instrumented 90zfs/zfs-load-key.sh with a simple ls /dev/disk/by-partlabel/ just before zfs load-key "${ENCRYPTIONROOT}"; the ls fails with No such file or directory, which suggests a race condition with udev: the by-partlabel links have not been created yet when the script runs.

Workaround

Wrap zfs load-key "${ENCRYPTIONROOT}" in a retry loop, similar to what is done for the interactive password prompt case, e.g.:

        # if key is stored in a file, do not prompt
        if ! [ "${KEYLOCATION}" = "prompt" ]; then
            # retry a few times: the key device may not have been set up by udev yet
            for attempt in 1 2 3 4 5; do
                echo "Attempting: zfs load-key ${ENCRYPTIONROOT} for '${BOOTFS}' (attempt# ${attempt})..."
                zfs load-key "${ENCRYPTIONROOT}" && break
                sleep 0.5s
            done
        else

This successfully boots my system (headlessly) on the second attempt (i.e. after a 0.5 s delay).

Other notes

Shouldn't this script exit with a non-zero code when zfs load-key fails (whatever the reason)?
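To make the exit-code question concrete, here is a minimal sketch of a retry helper that reports failure to its caller once all attempts are exhausted. The name retry and its argument order are hypothetical, not part of 90zfs/zfs-load-key.sh, and whether a dracut hook should actually abort on failure is a design decision for the maintainers.

```shell
#!/bin/sh
# Hypothetical helper (not from zfs-load-key.sh):
#   retry TRIES INTERVAL COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping INTERVAL between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry() {
    tries=$1
    interval=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        "$@" && return 0
        # no point in sleeping after the final attempt
        [ "$attempt" -lt "$tries" ] && sleep "$interval"
        attempt=$((attempt + 1))
    done
    return 1
}

# Possible use inside the script (assumes sleep accepts fractional seconds):
#   retry 5 0.5 zfs load-key "${ENCRYPTIONROOT}" || exit 1
```

With this shape the caller decides what a failure means, instead of the loop silently falling through after the last attempt.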

sxc731 commented May 17, 2021

PS: I'm happy to contribute a pull request with the above workaround if people think it's acceptable.

nabijaczleweli commented May 23, 2021

Does udevadm settle create the device link for you? I.e. if you do ls /dev/disk/by-partlabel/; udevadm settle; ls /dev/disk/by-partlabel/ instead of your single ls, do you get "No such file or directory", then the listing?
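The check being asked for can be wrapped in a small function for temporary instrumentation. This is only a sketch: ls_around is a hypothetical name, and the output labels are made up for readability.

```shell
#!/bin/sh
# Hypothetical diagnostic helper:
#   ls_around DIR COMMAND [ARGS...]
# Lists DIR, runs COMMAND, then lists DIR again, so the effect of COMMAND
# on the directory contents is visible in the boot log.
ls_around() {
    dir=$1
    shift
    echo "before: $(ls "$dir" 2>&1)"
    "$@"
    echo "after: $(ls "$dir" 2>&1)"
}

# Possible use just before zfs load-key:
#   ls_around /dev/disk/by-partlabel/ udevadm settle
```

If the "before" line shows No such file or directory and the "after" line shows the partition label, settling is enough; if both show the error, the device has not been enumerated yet and a wait is needed.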

nabijaczleweli added a commit to nabijaczleweli/zfs that referenced this issue May 23, 2021
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#12065

nabijaczleweli commented

Oh, actually, can you try the fourth patch from #12108 (also referenced above here)? It's mostly a logical extension of your proposed update.

sxc731 commented May 24, 2021

Patch tested OK, thank you very much! To answer your earlier question, it looks like udevadm settle alone isn't sufficient: I'm still seeing Waiting for key ${KEYFILE} for ${ENCRYPTIONROOT}... (I switched that message to warn to confirm).

And one suggestion: perhaps sleep 1 could be reduced to sleep 0.5s - and the loop extended correspondingly if deemed necessary?

nabijaczleweli added a commit to nabijaczleweli/zfs that referenced this issue May 24, 2021
Also reduce password retries to 3 to match i-t

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#12065

nabijaczleweli commented May 24, 2021

Great news, thanks for testing! Just settling being insufficient is odd, but not necessarily unexpected, it'd seem from other reports.

Sure, updated the PR to split the 10 seconds into half-second intervals.
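For readers who want to see the shape of such a wait, here is a minimal sketch of polling for the key file in half-second steps with a 10-second budget. It is not the code from the PR: wait_for_keyfile and its defaults are hypothetical, and fractional sleep is assumed to be supported (it is in GNU coreutils, and in busybox when built with that option).

```shell
#!/bin/sh
# Hypothetical helper (not the PR's implementation):
#   wait_for_keyfile PATH [TRIES] [INTERVAL]
# Polls until PATH exists. Defaults: 20 tries x 0.5 s = 10 s in total.
# Returns 0 if PATH exists, non-zero if it never appeared.
wait_for_keyfile() {
    keyfile=$1
    tries=${2:-20}
    interval=${3:-0.5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        [ -e "$keyfile" ] && return 0
        sleep "$interval"
        i=$((i + 1))
    done
    # one last look after the final sleep
    [ -e "$keyfile" ]
}

# Possible use, assuming KEYLOCATION has the form file:///path/to/key:
#   KEYFILE="${KEYLOCATION#file://}"
#   wait_for_keyfile "${KEYFILE}" || echo "key file ${KEYFILE} never appeared" >&2
```

Shorter intervals keep the total budget the same while cutting the boot delay in the common case where the device shows up within the first second.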

rkitover pushed a commit to rkitover/zfs that referenced this issue Jun 2, 2021
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#12065
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Feb 10, 2022
Also reduce password retries to 3 to match i-t

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#12065
Closes openzfs#12108