Ensure RDMA service loads modules in initrd #1481
Ensure that the rdma-load-modules@.service is started as part of the initrd.

Fixes: 39fa824 ("redhat: add udev/systemd/etc infrastructure bits")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Ensure that the rdma-load-modules@.service is started as part of the initrd.

Fixes: 7752410 ("suse: fix dracut support")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
rdma-load-modules@.service runs in the initrd. However, it gets terminated when initrd-cleanup.service isolates to initrd-switch-root.target. The termination can occur in the middle of IPoIB initialization, leading to a failure to load netdevices.

Include 'Before=initrd.target' to ensure that the services are not killed when initrd-cleanup.service isolates to initrd-switch-root.target.

Kernel log:

    workqueue: Failed to create a rescuer kthread for wq "ipoib_wq": -EINTR
    Cleaning Up and Shutting Down Daemons
    ib0: failed to allocate device WQ
    mlx5_0: failed to initialize device: ib0 port 1 (ret = -12)
    mlx5_0: couldn't register ipoib port 1; error -12
    workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
    ibp6s0f1, 1: ipoib_intf_alloc failed -12
    workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
    ibp6s0f2, 1: ipoib_intf_alloc failed -12
    workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
    ibp6s0f3, 1: ipoib_intf_alloc failed -12
    Stopped Load RDMA modules …/rdma/modules/infiniband.conf
    Stopped Load RDMA modules …m /etc/rdma/modules/rdma.conf

Fixes: 2f4fb9f ("Common infrastructure for auto loading rdma modules")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
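
For illustration only, the ordering change described in this commit message could be sketched as a systemd drop-in. This is not the actual patch: the drop-in path and file name are hypothetical, and only the `Before=initrd.target` line is taken from the commit message.

```ini
# Hypothetical drop-in (path and file name are assumptions, not from the PR):
#   /etc/systemd/system/rdma-load-modules@.service.d/initrd-order.conf
[Unit]
# Order every rdma-load-modules@ instance before initrd.target so that
# module loading finishes before initrd-cleanup.service isolates to
# initrd-switch-root.target.
Before=initrd.target
```

Note that `Before=` only sets ordering; it does not pull the unit into the boot transaction. The unit still has to be wanted or required by something in the initrd, and a drop-in like this would also have to be included in the initramfs image to have any effect there.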
Can I ask why you are running this service in the initrd?
We run this service in the initrd to load RDMA modules early in the boot process. This prevents issues such as failed NFS mounts over RDMA due to missing modules.
I talked to Chuck and he told me that NFS module autoloading works fine; he even checked it for me. So why is it failing in the initrd?
The rdma service is terminated when initrd-cleanup.service isolates to initrd-switch-root.target, because initrd.target lacks a dependency on the rdma service.
Does this actually work out of the box? I feel like the SUSE/RH spec files (at least) are missing a call to dracut somewhere.
I don't think so; we saw this failure in our regressions, where we run dracut anyway.
Fixed an issue where the RDMA service was killed during the switch to root. Added a wants symlink for initrd.target in the dracut arrangement and Before=initrd.target to the systemd service, to ensure the service runs and completes during the initrd.