Infinite loop "The ZFS modules are not loaded" after upgrade to Fedora 33 #11128

Closed
Alexandero89 opened this issue Oct 29, 2020 · 8 comments
Labels
Component: Packaging custom packages

Comments

@Alexandero89

Alexandero89 commented Oct 29, 2020

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Fedora |
| Distribution Version | 33 |
| Linux Kernel | 5.8.16 |
| Architecture | x86_64 |
| ZFS Version | 0.8.5-1 |
| SPL Version | 0.8.5-1 |

Describe the problem you're observing

With Fedora 32 and kernel 5.8.16-200.fc32.x86_64 everything works fine.
I then started the upgrade to Fedora 33; everything seemed to work fine and the system rebooted.
Now it has the new Fedora 33 kernel 5.8.16-300.fc33.x86_64 and it doesn't boot anymore.

After a log message containing something like "*** BOOTFS***" (the logs scrolled too fast to read everything), it gets into an infinite loop printing these messages:

[    1.844579] dracut-pre-mount[322]: The ZFS modules are not loaded.
[    1.845640] dracut-pre-mount[322]: Try running '/sbin/modprobe zfs' as root to load them.
[    1.844579] dracut-pre-mount[322]: The ZFS modules are not loaded.
[    1.845640] dracut-pre-mount[322]: Try running '/sbin/modprobe zfs' as root to load them.

and so on

I thought it was a kernel bug, so I booted the older 5.8.16-200.fc32.x86_64 kernel, but hit the same problem.
I rebuilt the initramfs and also checked with lsinitrd that zfs is inside.

So I rolled back my zfs pool to its state from before the update, but still got this error message.
It only started booting again after I rolled back the /boot and /boot/efi partitions to their state from before the update.

Describe how to reproduce the problem

Install Fedora 32 with root on ZFS and upgrade to Fedora 33.

If you can't reproduce this error, I could start the update again and hopefully (?) run into the same error.
Please just tell me which logs/files you would like to have in that case.

@Alexandero89 added the "Status: Triage Needed" and "Type: Defect" labels Oct 29, 2020
@Alexandero89
Author

I also found bug #10854, which seems to show much the same error log.

@behlendorf added the "Component: Packaging" label and removed the "Status: Triage Needed" and "Type: Defect" labels Nov 3, 2020
@gregory-lee-bartholomew
Contributor

FWIW, I think a workaround for this issue is to add rd.driver.pre=zfs to the list of kernel parameters. I saw a similar error some time ago on an old server and that parameter "fixed" the issue. Also, it might not be needed on a newer kernel. I upgraded my PC to Fedora 33 just a couple of days ago and I did not have to add that parameter. My PC is running kernel 5.8.17-300.fc33.x86_64.
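
As a rough sketch of the workaround above, one could first check whether the parameter is already on the kernel command line and only add it if it is missing. The helper name `has_karg` is hypothetical, and the `grubby` call in the usage comment assumes a standard Fedora boot setup; treat it as an illustration rather than a verified fix for this issue.

```shell
#!/bin/sh
# has_karg PARAM CMDLINE
# Succeeds if PARAM appears as a whole, space-separated word in CMDLINE.
# (Hypothetical helper for illustration; not part of dracut or ZFS.)
has_karg() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system (assumes /proc/cmdline is readable and that
# grubby manages the boot entries, as it does on a default Fedora install):
#   has_karg rd.driver.pre=zfs "$(cat /proc/cmdline)" || \
#       grubby --update-kernel=ALL --args="rd.driver.pre=zfs"
```

Matching on whole words avoids treating a longer argument that merely starts with the same text as a hit.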

@yougotborked

yougotborked commented Nov 17, 2020

I'm seeing the same thing, but on Fedora 32 after a routine dnf upgrade. I only see the infinite log message loop on 5.8.16. On 5.9.8 I don't see the messages, but the issue is still there; it just logs an endless stream of dots instead.
I tried the suggested workaround rd.driver.pre=zfs, and it did not seem to work.

When I run modprobe zfs in the dracut environment, it works just fine.

I'm able to continue the boot sequence easily by just

zpool import -f root -o altroot=/root
exit

@gregory-lee-bartholomew
Contributor

gregory-lee-bartholomew commented Nov 17, 2020

I'm able to continue the boot sequence easily by just

zpool import -f root -o altroot=/root
exit

If you have to add the -f flag, it may be a different issue. It may be that the hostid in the initramfs is inconsistent with the hostid of the root file system, or at least with the one of the system that last imported the pool. You may need to export the pool once to sync things back up again, and you may need to rebuild the initramfs to incorporate the new hostid. Store a fixed id in /etc/hostid (e.g. echo -n "bork" > /etc/hostid) and then regenerate all your initramfses to be sure that the hostid stays consistent on the system. Note that /etc/hostid should be exactly 4 bytes in size, but otherwise any random data is fine.

The above is just a guess as to what the problem could be and a suggestion as to how you might go about fixing it.
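
To make the 4-byte requirement concrete, here is a minimal sketch that checks the size of a hostid file. The function name `hostid_file_ok` is made up for this example. It also shows why the suggestion above uses `echo -n`: a trailing newline would make the file 5 bytes long.

```shell
#!/bin/sh
# hostid_file_ok FILE
# Succeeds if FILE exists and is exactly 4 bytes long.
# (Hypothetical helper for illustration only.)
hostid_file_ok() {
    [ -f "$1" ] || return 1
    # Arithmetic expansion strips any padding that wc may print.
    size=$(($(wc -c < "$1")))
    [ "$size" -eq 4 ]
}

# Usage:
#   hostid_file_ok /etc/hostid || echo "/etc/hostid is missing or not 4 bytes" >&2
```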

@yougotborked

I just ran a test, and I did not need to use -f. The pool imported and mounted fine to /root, and after exiting the dracut shell the boot continued successfully.

@gregory-lee-bartholomew
Contributor

I just ran into this problem while applying updates to an old server. The problem appears to be caused by a dkms feature called weak modules. I was able to resolve the problem by doing the following:

  1. Boot the system using the previous kernel.

  2. Run the following script as root to remove the weak modules.

for i in spl icp zavl zfs znvpair zunicode zcommon zlua; do
    find /lib/modules -name $i.ko.xz | /usr/sbin/weak-modules --remove-modules
done

  3. Determine which version of the weak module has been packaged in the faulty initramfs.

# lsinitrd /boot/aa981bf51778461287d3d409502f2e29/5.7.17-200.fc32.x86_64/initrd | grep zfs.ko
-rw-r--r--   1 root     root       776040 May 29 13:35 usr/lib/modules/5.7.16-200.fc32.x86_64/extra/zfs.ko.xz

Note the mismatch between the kernel versions in the above lines.

  4. Use the dkms command to remove the two zfs modules corresponding to the two kernel versions listed in the previous step.

# dkms remove -m zfs -v 0.8.5 -k 5.7.16-200.fc32.x86_64
# dkms remove -m zfs -v 0.8.5 -k 5.7.17-200.fc32.x86_64

Note that you can run rpm -q zfs to determine the zfs version to use in the dkms commands.

  5. Reinstall the zfs module for the kernel version(s) that you need.

# dkms install -m zfs -v 0.8.5 -k 5.7.17-200.fc32.x86_64

  6. Regenerate the problematic initramfs.

# dracut -f /boot/aa981bf51778461287d3d409502f2e29/5.7.17-200.fc32.x86_64/initrd 5.7.17-200.fc32.x86_64

  7. Verify that the version of the zfs module is correct for the given initramfs.

# lsinitrd /boot/aa981bf51778461287d3d409502f2e29/5.7.17-200.fc32.x86_64/initrd | grep zfs.ko
-rw-r--r--   1 root     root       773488 May 29 13:35 usr/lib/modules/5.7.17-200.fc32.x86_64/extra/zfs.ko.xz

  8. Reboot and everything should work properly now.

There may be a better method and not all the above steps may be necessary. This is just what I found worked when I ran into this problem just now.
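
The mismatch check in the steps above can be sketched as a small filter over lsinitrd output. The function names `initrd_module_kver` and `initrd_zfs_matches` are hypothetical, and the parsing assumes listing lines shaped like the ones shown in this comment.

```shell
#!/bin/sh
# initrd_module_kver
# Reads an lsinitrd listing on stdin and prints the kernel version
# directory of the first zfs.ko* entry it finds.
# (Hypothetical helper; assumes paths like usr/lib/modules/<kver>/.../zfs.ko.xz)
initrd_module_kver() {
    sed -n 's|.*usr/lib/modules/\([^/]*\)/.*/zfs\.ko.*|\1|p' | head -n 1
}

# initrd_zfs_matches KVER
# Succeeds if the zfs module in the listing on stdin sits under KVER.
initrd_zfs_matches() {
    [ "$(initrd_module_kver)" = "$1" ]
}

# Usage, with the initramfs path and kernel version from the steps above:
#   lsinitrd /boot/aa981bf51778461287d3d409502f2e29/5.7.17-200.fc32.x86_64/initrd \
#       | initrd_zfs_matches 5.7.17-200.fc32.x86_64 \
#       || echo "initramfs carries a zfs module for another kernel" >&2
```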

@yougotborked

what I found worked when I ran into this problem just now.

Yes, these steps generally worked for me too. On one system it took a couple of tries, though, for some reason.
I have run into the problem on three separate fc32 systems after a routine "dnf upgrade" from the 5.8.16 to the 5.9.10 kernel.

My ZFS roots were set up following the csparks method at https://www.csparks.com/BootFedoraZFS/index.md. Not sure if that matters at all.

@gregory-lee-bartholomew
Contributor

Doing a quick search just now, I've found other references to this weak module problem. One of the earlier references I found is #9891. I also found the following passage in the dkms man page, which appears to indicate how this feature can be disabled:

NO_WEAK_MODULES=
The NO_WEAK_MODULES parameter prevents dkms from creating a symlink into the weak-updates directory, which is the default on Red Hat derivatives. The weak modules facility was designed to eliminate the need to rebuild kernel modules when kernel upgrades occur and relies on the symbols within the kABI.

Fedora does not guarantee a stable kABI so it should be disabled in the specific module override by setting it to "yes". For example, for an Nvidia DKMS module you would set the following in /etc/dkms/nvidia.conf:

NO_WEAK_MODULES="yes"
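
By analogy with the Nvidia example, a minimal sketch for adding the setting to a per-module override file could look like this. The function name `ensure_no_weak_modules` is hypothetical, and the path /etc/dkms/zfs.conf in the usage comment is an assumption derived from the nvidia.conf example, not something confirmed in this thread.

```shell
#!/bin/sh
# ensure_no_weak_modules FILE
# Appends NO_WEAK_MODULES="yes" to FILE unless FILE already contains a
# line starting with NO_WEAK_MODULES= (such a line is left untouched,
# whatever its value). Creates FILE if it does not exist.
# (Hypothetical helper for illustration only.)
ensure_no_weak_modules() {
    if ! grep -q '^NO_WEAK_MODULES=' "$1" 2>/dev/null; then
        printf 'NO_WEAK_MODULES="yes"\n' >> "$1"
    fi
}

# Usage (path is an assumption, mirroring /etc/dkms/nvidia.conf):
#   ensure_no_weak_modules /etc/dkms/zfs.conf
```

Running it twice leaves a single line in the file, so it is safe to call from a provisioning script.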

behlendorf pushed a commit that referenced this issue Dec 23, 2020
Fedora does not guarantee a stable kABI, so weak modules should be
disabled. See the dkms man page for a more detailed explanation of the
weak module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335