New dracut 95nbft module #2620

Draft: wants to merge 3 commits into master

@tbzatek (Contributor) commented Dec 13, 2024

Cc: @mwilck
-- Please check the included README.md first

This started as an experiment based on an idea from our in-house dracut and systemd developers. I kinda like the result, even though there are many loose ends (read on). There's an unpublished internal plan to slim dracut down in the mid term, and while I can't speak to the details, one major argument was something like "dracut was born before systemd" and "systemd nowadays can take over many tasks dracut is doing". Add embedded, automotive and otherwise resource-restricted environments, and the push for minimization grows further.

This experiment was about creating specially crafted systemd units, injected at the right place with as little dracut involvement as possible. I admit all this work may smack a little of NIH syndrome; still, I hope it becomes useful.

There are some loose ends here:

  • This is all built around NetworkManager, in the hope of sparking interest among other distributors in adding support for more network management frameworks.
  • Need to find a way to play nicely with the existing 95nvmf dracut module. Haven't approached dracut upstream yet.
  • Interface renaming is the thing. It was originally planned to be part of the NetworkManager NBFT patch, but the NM folks wanted to keep their initramfs clean and minimal (for a good reason). So this landed as an extra nbft plugin command, at least for now. We can move it somewhere else, as the only entry point is the systemd service here. The command-line arguments and help strings may also need some tweaks. It was strongly suggested to use systemd link files instead of udev rules for interface renaming. However, during my experiments with VLANs this approach appeared to have some drawbacks, so this is still undecided and we may opt for udev rules instead. TBD.
  • I have proposed a slightly different nbft interface naming, geared towards a predictable naming scheme.
  • Nothing is set in stone yet - the design, file and function naming can change.
  • As NetworkManager is a new consumer of the libnvme nbft API, we need to be more careful with future API additions and changes.
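To make the link-file vs. udev-rule choice above concrete, here is a minimal Python sketch that renders both forms for a single interface. The helper names, the MAC address and the choice of match keys are illustrative assumptions, not what the nbft plugin actually emits.

```python
def render_link_file(mac: str, name: str) -> str:
    """Render a systemd.link(5) file that renames the NIC with this MAC."""
    # [Match] selects the device, [Link] sets the new interface name.
    return (
        "[Match]\n"
        f"MACAddress={mac}\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


def render_udev_rule(mac: str, name: str) -> str:
    """Render an equivalent single-line udev rule using NAME=."""
    # Double braces produce the literal ATTR{address} key in the f-string.
    return (
        'SUBSYSTEM=="net", ACTION=="add", '
        f'ATTR{{address}}=="{mac}", NAME="{name}"\n'
    )


if __name__ == "__main__":
    # Hypothetical values; a real generator would take them from the NBFT HFI.
    print(render_link_file("52:54:00:12:34:56", "nbft0h1"), end="")
    print(render_udev_rule("52:54:00:12:34:56", "nbft0h1"), end="")
```

Both variants match purely on the MAC address. VLAN devices usually inherit their parent's MAC address, so a real implementation would likely need additional match keys once VLANs are involved.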

Needs #2614 (already merged).
This work also answers #2179 and includes the files we've kept downstream so far.

And some more wild ideas:

  • There is growing interest in splitting the libnvme NBFT parser into a separate, independent library (just like the -mi one) that would be small and suitable for initramfs, or for static linking. One suggestion was to have native udev support for nbft interface naming within builtin-net_id.c; however, the parser dependencies were a problem. The ACPI NBFT table is not a fixed-size struct and is not trivial to parse for anyone wanting to reimplement the parser just to extract the HFI descriptors.
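As a rough illustration of why a standalone parser is not a one-liner, the Python sketch below only validates the generic 36-byte ACPI table header (length and checksum) that every ACPI table, NBFT included, starts with. Everything NBFT-specific (its own header fields, the descriptor lists, the heap) is deliberately left undecoded; the function name is hypothetical.

```python
import struct

# Generic 36-byte ACPI System Description Table header (little-endian):
# Signature, Length, Revision, Checksum, OEMID, OEM Table ID,
# OEM Revision, Creator ID, Creator Revision.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")


def parse_acpi_table(blob: bytes) -> dict:
    """Validate the generic ACPI header and return it plus the raw body."""
    if len(blob) < ACPI_HEADER.size:
        raise ValueError("blob is shorter than an ACPI header")
    (signature, length, revision, _checksum, oem_id, _oem_table_id,
     _oem_revision, _creator_id, _creator_revision) = ACPI_HEADER.unpack_from(blob)
    # The table size is only known from the header, not from a fixed struct.
    if length < ACPI_HEADER.size or length > len(blob):
        raise ValueError(f"bad table length {length}")
    # All bytes of a valid ACPI table sum to zero modulo 256.
    if sum(blob[:length]) % 256 != 0:
        raise ValueError("checksum mismatch")
    return {
        "signature": signature.decode("ascii", errors="replace"),
        "length": length,
        "revision": revision,
        "oem_id": oem_id.decode("ascii", errors="replace").rstrip(),
        # NBFT-specific structures live in this variable-length body;
        # decoding them is not attempted in this sketch.
        "body": bytes(blob[ACPI_HEADER.size:length]),
    }
```

On Linux the raw table can typically be read (as root) from `/sys/firmware/acpi/tables/NBFT` and passed to `parse_acpi_table()`; the real work of extracting HFI descriptors would start from the returned `body`.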

Intended to create udev network link files (see systemd.link(5))
from the ACPI NBFT tables present in the system; typically called
during the early boot process.

A semi-predictable interface naming scheme is proposed to accurately
identify the HFI that each interface is set up for. The syntax is
nbftXhY, where X is the NBFT table index (taken from the filename;
defaults to 0) and Y is the HFI record index.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
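The nbftXhY scheme from the commit message above can be sketched in a few lines of Python. The rule that X comes from the trailing digits of the table filename (`NBFT` gives 0, `NBFT1` gives 1) is an assumption for illustration, as are the helper names; the 15-character limit is the usual Linux `IFNAMSIZ` constraint.

```python
import os
import re

IFNAMSIZ = 16  # Linux limit, including the terminating NUL byte


def nbft_table_index(path: str) -> int:
    """Hypothetical mapping of a table filename to X: 'NBFT' -> 0, 'NBFT1' -> 1."""
    m = re.fullmatch(r"NBFT(\d*)", os.path.basename(path))
    if m is None:
        raise ValueError(f"not an NBFT table file: {path!r}")
    return int(m.group(1)) if m.group(1) else 0


def nbft_ifname(table_index: int, hfi_index: int) -> str:
    """Build the proposed nbftXhY interface name."""
    if table_index < 0 or hfi_index < 0:
        raise ValueError("indexes must be non-negative")
    name = f"nbft{table_index}h{hfi_index}"
    # Interface names longer than IFNAMSIZ - 1 characters are rejected by the kernel.
    if len(name) > IFNAMSIZ - 1:
        raise ValueError(f"interface name too long: {name}")
    return name
```

For example, HFI record 1 of the table at `/sys/firmware/acpi/tables/NBFT` would map to `nbft_ifname(nbft_table_index("/sys/firmware/acpi/tables/NBFT"), 1)`, i.e. `nbft0h1`.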

This is a minimal dracut module providing Boot from NVMe/TCP
functionality. It consists of two simple systemd units: one sets up
networking and the other performs the actual connections afterwards.

Requires the NetworkManager nm-initrd-generator NBFT support.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>

In case the `nvme connect-all --nbft` call fails, respawn the service
and try again after 10 seconds. This relies on nvme-cli returning a
non-zero status if any SSNS record that is not marked as
'unavailable' fails to connect.

The respawn cycle is broken by stopping the unit once the rootfs
is mounted and the system switches root.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
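The retry behaviour described in the last commit can be modelled in Python as below. This is only a model: the commit achieves it by respawning a systemd service (presumably via `Restart=`/`RestartSec=`, which is an assumption), and the `SsnsResult` type, the helper names and the `root_switched` callback are hypothetical stand-ins for the real exit status and unit stop.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class SsnsResult:
    """Hypothetical per-record outcome of one connect attempt."""
    unavailable: bool  # SSNS record is marked 'unavailable' in the NBFT
    connected: bool    # whether the connection succeeded


def connect_all_status(results: Iterable[SsnsResult]) -> int:
    """Exit status the service relies on: non-zero only if a record that
    is not marked 'unavailable' failed to connect."""
    failed = any(not r.unavailable and not r.connected for r in results)
    return 1 if failed else 0


def run_connect_all() -> int:
    """Run the real command and return its exit status."""
    return subprocess.run(["nvme", "connect-all", "--nbft"]).returncode


def connect_with_respawn(
    root_switched: Callable[[], bool],
    run: Callable[[], int] = run_connect_all,
    sleep: Callable[[float], None] = time.sleep,
    delay: float = 10.0,
) -> Tuple[bool, int]:
    """Retry `run` every `delay` seconds until it succeeds or the root is
    switched (which stops the unit). Returns (succeeded, attempts)."""
    attempts = 0
    while not root_switched():
        attempts += 1
        if run() == 0:
            return True, attempts
        sleep(delay)
    return False, attempts
```

Injecting `run`, `sleep` and `root_switched` keeps the loop testable without an NVMe target; in the actual design, systemd owns the timing and the stop condition.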

@tbzatek force-pushed the nbft-boot-refactor-1 branch from 5d23b62 to 3fdb4b9 on December 13, 2024 16:34

@mwilck (Contributor) commented Dec 13, 2024

Very interesting. The timing is kind of unfortunate for me. I'll not be able to do an in-depth review until the beginning of next year.

Some quick questions and remarks. I've only glanced at this quickly, so I hope I'm not getting it all wrong.

  • Why do you submit this here rather than for dracut itself?
  • Am I assuming correctly that this module can't coexist with the existing dracut 95nvmf module?
  • Given that this is nvme-cli upstream, I find it questionable to include code that's solely focused on NetworkManager. The current dracut module is agnostic of the network management tool, and I would prefer an implementation-agnostic approach here in nvme-cli, too. Simply putting the responsibility for that on "other distributions" isn't enough. Perhaps you could think of some sort of API that can be served by the nm nbft plugin, but also by other plugins. Note that wicked already has an nbft extension, which is a simple shell script that prints interface configuration in an XML format that wicked understands.
  • I'm not too happy about the new interface naming scheme, given that we already introduced nbft$N. I'm not sure how important it is to have predictability. As you said yourself in the README, NBFT variables can change between reboots, by design. What matters is that the NVMe targets are found, and that it's obvious that the related interfaces have NBFT (aka firmware) origin, not necessarily which HFI they correspond to. There should at least be an option to fall back to the previous convention.
  • About "interface renaming is the thing" – was anything wrong with the way this was done in the existing dracut module?
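The fallback suggested above could be as small as a switch between the two conventions. In this Python sketch the previous `nbft$N` convention is assumed to be a plain running counter over the HFIs in enumeration order; the function name and the `legacy` flag are hypothetical.

```python
from typing import Iterable, List, Tuple


def nbft_names(hfis: Iterable[Tuple[int, int]], legacy: bool = False) -> List[str]:
    """Name interfaces for (table_index, hfi_index) pairs in enumeration order.

    legacy=False: proposed nbftXhY scheme from this PR.
    legacy=True:  assumed previous nbft$N scheme, a plain running counter.
    """
    names = []
    for n, (table_index, hfi_index) in enumerate(hfis):
        if legacy:
            names.append(f"nbft{n}")
        else:
            names.append(f"nbft{table_index}h{hfi_index}")
    return names
```

With two HFIs in table 0, the default yields `nbft0h1` and `nbft0h2`, while `legacy=True` yields `nbft0` and `nbft1`.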

@mwilck (Contributor) commented Dec 13, 2024

In my mind, the current "API" of nvme-cli wrt NBFT is the JSON format. I am aware that that's suboptimal if you want to build a minimal initramfs, as it requires either a JSON library or something like jq, which is rather big. Still, we should discuss carefully whether we want to replace this with something else.
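To illustrate the trade-off, here is a minimal Python consumer of NBFT data in JSON form that extracts the HFI-to-MAC mapping a renaming tool would need. The sample document and its key names (`filename`, `hfi`, `index`, `mac_addr`) are assumptions and should be checked against the real output of `nvme nbft show` with JSON output selected; even this tiny consumer depends on a JSON parser, which is exactly the initramfs concern raised above.

```python
import json
from typing import List, Tuple

# Hypothetical sample; key names are assumptions, not a documented schema.
SAMPLE = """
[
  {"filename": "/sys/firmware/acpi/tables/NBFT",
   "hfi": [{"index": 1, "mac_addr": "52:54:00:12:34:56"}]}
]
"""


def hfi_macs(json_text: str) -> List[Tuple[str, int, str]]:
    """Return (table filename, HFI index, MAC address) triples."""
    out = []
    for table in json.loads(json_text):
        # Tables without HFI records simply contribute nothing.
        for hfi in table.get("hfi", []):
            out.append((table["filename"], hfi["index"], hfi["mac_addr"]))
    return out
```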

Labels
None yet
Projects
None yet

2 participants