New 90overlayfs module does not set up the overlay in three cases where it happened before (breaks Fedora/RHEL installer image boot) #2232

Open
AdamWill opened this issue Feb 23, 2023 · 9 comments · Fixed by #2233
Labels
bug Our bugs dmsquash-live Issues related to the dmsquash-live module regression

Comments

@AdamWill
Contributor

Describe the bug
In dracut 058/059, with commit 8caaad4 and follow-up 40dd5c9, overlayfs setup was moved from being done in-line at the end of dmsquash-live-root.sh to being part of a separate module. There were four codepaths in which overlayfs setup was reached in dmsquash-live-root.sh, but with the move to a module, only one of those paths now results in overlayfs setup happening. On the other three paths, it no longer happens, and boot fails.

Distribution used
Fedora 38 and Rawhide.

Dracut version
058 and 059.

Init system
systemd.

To Reproduce
Boot a Fedora installer image built with dracut 058/059 without the offending commits reverted. They were reverted in https://koji.fedoraproject.org/koji/buildinfo?buildID=2157455 and we caught this before it reached a nightly compose, so you would have to build your own image using the older dracut-059-1 build; the images built by openQA (which caught this bug) have been garbage-collected now. Affected images get stuck in a loop, showing the error "mount: /sysroot: special device LiveOS_rootfs does not exist."

Expected behavior
The image should boot normally.

Additional context
I did a detailed root-cause analysis of the problem in the downstream bug. Basically, if you look at dmsquash-live-root.sh, the overlayfs setup previously happened near the end of the file, any time the variable $overlayfs was a non-zero-length string. There were four paths on which that happened: one where it was set to "yes" (when rd.live.overlay.overlayfs is on the cmdline), and three where it was set to "required" (two inside the block that starts if [ -z "$setup" -a -n "$devspec" -a -n "$pathspec" -a -n "$overlay" ]; then, and one in the block that starts if [ -e /run/initramfs/live/${live_dir}/${squash_image} ]; then).
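
To make the old behaviour concrete, here is a rough model of it. This is only a sketch and not the real dmsquash-live-root.sh code: the function name and its two arguments are hypothetical stand-ins for "the cmdline parameter was given" and "one of the three image-driven paths set the variable".

```shell
#!/bin/sh
# Simplified model of the old in-line behaviour described above.
# old_overlayfs_decision and its arguments are hypothetical, not dracut names.

# $1: "yes" if rd.live.overlay.overlayfs was on the kernel command line, else ""
# $2: "required" if one of the three image-driven paths fired, else ""
old_overlayfs_decision() {
    overlayfs="$1"
    # An image-driven path sets the variable to "required".
    if [ -n "$2" ]; then
        overlayfs="$2"
    fi
    # The old script only tested for a non-zero-length string.
    if [ -n "$overlayfs" ]; then
        echo "setup"
    else
        echo "skip"
    fi
}

old_overlayfs_decision "yes" ""       # cmdline path
old_overlayfs_decision "" "required"  # any of the three image-driven paths
old_overlayfs_decision "" ""          # nothing asked for overlayfs
```

In this model both the cmdline path and the image-driven paths end in setup, which is the behaviour the installer images were relying on.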

When the overlayfs setup was moved to a module, only the first of these paths was respected: the new module only actually does the overlayfs setup if the cmdline parameter is set. In all other cases it exits without doing anything. So on those three other paths, the overlayfs is not set up and boot will likely fail. Fedora's installer images go down the squashfs path; I don't know in what situations the other two paths are hit.
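
Modelled the same rough way, the module's decision looks something like the sketch below. Again this is an illustration under assumptions, not the module's actual source: the function name is hypothetical, and the second argument exists only to show that the value set by dmsquash-live-root.sh no longer has any effect.

```shell
#!/bin/sh
# Simplified model of the new module's decision as described above.
# new_module_decision and its arguments are hypothetical, not dracut names.

# $1: "yes" if rd.live.overlay.overlayfs was on the kernel command line, else ""
# $2: value set by an image-driven path ("required" or ""); deliberately ignored
new_module_decision() {
    if [ "$1" = "yes" ]; then
        echo "setup"
    else
        # Exits without setting anything up, even when $2 is "required".
        echo "skip"
    fi
}

new_module_decision "yes" ""       # cmdline path still works
new_module_decision "" "required"  # image-driven paths are now skipped
```

The second call is the regression: the image asked for overlayfs, nothing set it up, and the later mount of the root filesystem has nothing to mount.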

For now in Fedora we have reverted the relevant commits. @lnykryn suggested having dmsquash-live-root.sh do echo "rd.live.overlay.overlayfs=1" > /etc/cmdline.d/dracut-need-overlay.conf on the affected paths, which...I guess could work, but seems very hacky. I was kinda assuming there must be a better way to do this, some canonical way for such a script to signal to a module that its action is required?
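
For reference, the suggested workaround would amount to roughly the following. This is a sketch only: the helper name is hypothetical, and CMDLINE_D points at a scratch directory here so it can be run outside an initramfs (in the real script it would be /etc/cmdline.d).

```shell
#!/bin/sh
# Sketch of the cmdline.d workaround suggested above.
# request_overlayfs and CMDLINE_D are hypothetical names for illustration.
CMDLINE_D="${CMDLINE_D:-$(mktemp -d)}"

request_overlayfs() {
    mkdir -p "$CMDLINE_D"
    # Drop in a config fragment that later scripts read as if it were
    # part of the kernel command line.
    echo "rd.live.overlay.overlayfs=1" > "$CMDLINE_D/dracut-need-overlay.conf"
}

# Would be called on each of the three "required" paths.
request_overlayfs
cat "$CMDLINE_D/dracut-need-overlay.conf"
```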

@AdamWill
Contributor Author

CC @LaszloGombos @aafeijoo-suse

@LaszloGombos
Collaborator

LaszloGombos commented Feb 23, 2023

Thanks @AdamWill. I plan to take a look ASAP.

CC @FGrose @Conan-Kudo

@LaszloGombos
Collaborator

LaszloGombos commented Feb 23, 2023

@AdamWill

I understand this is a regression and a backwards-incompatible change, but I expect the installer already sets a lot of command line arguments.

One way forward would be for the installer to set "rd.live.overlay.overlayfs" if it intends to use the overlayfs module.

On a second look, it seems the installer already sets it in live.py under certain conditions. Why only under certain conditions?

From https://man7.org/linux/man-pages/man7/dracut.cmdline.7.html

rd.live.overlay.overlayfs=1 - Enables the use of the OverlayFS kernel module

@AdamWill
Contributor Author

It could do that, sure, but as you say, this was a regression, not backwards compatible, and was not communicated as such in the release notes, so it broke our images unexpectedly and I had to spend a day trawling through the code to figure out why.

If you are for some reason heavily opposed to making this work the way it used to we would probably change our images to do that, but then who knows what other person/project was relying on the affected codepaths and will have to go through the same process? It doesn't seem like it should be impossible to make it work like it always has, so, why not do it?

@LaszloGombos
Collaborator

LaszloGombos commented Feb 23, 2023

@AdamWill

> If you are for some reason heavily opposed

No I am not opposed.

Mainly, I do not want to go against the documentation. The doc does seem to require this argument, even if that was not the case before. Here is the commit from 2017 where this documentation and this option were added.

I just want to discuss and consider how much work we want to do for a case that is not in the documentation.

Also, I want to future-proof the Fedora/RHEL installer even if we address the compatibility issue. You might agree that it's a good idea not to rely on a feature that is used a bit differently than the documentation seems to require.

FGrose added a commit to FGrose/dracut that referenced this issue Feb 27, 2023
Override a missing or unneeded rd.live.overlay.overlayfs parameter.

Fixes dracutdevs#2232 ~regression on OverlayFS setup~.
Replaces dracutdevs#2233 'restore compatibility...'.
Follow-up to dracutdevs#1934 add overlayfs module.

Override a missing or an unneeded rd.live.overlay.overlayfs parameter.

Do this by employing a shell xor mask ($xor_overlayfs) on the
overlayfs variable.  This method is also used with
dmsquash-generator.sh to adjust behaviour on systemctl daemon-reload.

The /etc/cmdline.d/*.conf method prepends all values to those in
/proc/cmdline and thus cannot override an unneeded or an erroneous
kernel command line parameter.

This method allows boot configuration based on image content as well
as configuration file and command line content.
FGrose added a commit to FGrose/dracut that referenced this issue Feb 27, 2023
Override a missing or an unneeded rd.live.overlay.overlayfs parameter.

Fixes dracutdevs#2232 (a regression on OverlayFS setup).
Replaces dracutdevs#2233 'restore compatibility...'.
Follow-up to dracutdevs#1934 add new module overlayfs.

Do this by employing a shell xor mask ($xor_overlayfs) on the
overlayfs variable.  This method is also used with
dmsquash-generator.sh to adjust behaviour on systemctl daemon-reload.

The /etc/cmdline.d/*.conf method prepends all values to those in
/proc/cmdline and thus cannot override an unneeded or an erroneous
kernel command line parameter.

This method allows boot configuration based on image content as well
as configuration file and command line content.
@FGrose
Contributor

FGrose commented Feb 27, 2023

See dracut.cmdline.7.asc at line 1219 https://github.com/dracutdevs/dracut/blob/master/man/dracut.cmdline.7.asc#L1219

If a persistent overlay is detected at the standard LiveOS path, the overlay &
overlay type detected, whether OverlayFS or Device-mapper, will be used.

The implications of image content driven configuration may be some of the confusion.

Since ea28824 from PR #107, at least, the system boot can be configured by the image content in addition to configuration file and kernel command line content.

@LaszloGombos
Collaborator

LaszloGombos commented Feb 28, 2023

> The implications of image content driven configuration may be some of the confusion.

I find it very confusing that the documentation seems to suggest that if an overlay image is already present and it is not of the Device-mapper type, but rd.live.overlay.overlayfs is not set, then OverlayFS would be used anyway.

This means that dracut is already auto discovering what is the best overlay to use and yet the documentation seems to suggest that rd.live.overlay.overlayfs=1 has to be set to enable OverlayFS.

I think command line should have preference over image content driven configuration when there is a conflict between the two.
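
As a rough illustration of that precedence rule (a sketch only, not a proposed patch): the function name and argument encoding below are hypothetical, and it assumes that an explicit rd.live.overlay.overlayfs=0 means "use Device-mapper".

```shell
#!/bin/sh
# Sketch: an explicit cmdline value wins, image detection is the fallback.
# resolve_overlay_type and its argument encoding are hypothetical.

# $1: cmdline value of rd.live.overlay.overlayfs: "1", "0", or "" if absent
# $2: overlay type detected from image content: "overlayfs", "dm", or ""
resolve_overlay_type() {
    case "$1" in
        1) echo "overlayfs" ;;       # explicit request on the cmdline
        0) echo "dm" ;;              # assumption: 0 selects Device-mapper
        *) echo "${2:-none}" ;;      # no cmdline value: trust the image
    esac
}

resolve_overlay_type "0" "overlayfs"  # conflict: the cmdline wins
resolve_overlay_type "" "overlayfs"   # no conflict: image content decides
```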

@LaszloGombos LaszloGombos added the dmsquash-live Issues related to the dmsquash-live module label Feb 28, 2023
@aafeijoo-suse
Member

> I think command line should have preference over image content driven configuration when there is a conflict between the two.

I agree with that.

@LaszloGombos
Collaborator

Reopen to discuss updating documentation to reflect the latest upstream code.
