Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NBFT parser SSNS flags #2215

Merged
merged 6 commits into from
Mar 19, 2024
Merged

NBFT parser SSNS flags #2215

merged 6 commits into from
Mar 19, 2024

Conversation

tbzatek
Copy link
Contributor

@tbzatek tbzatek commented Feb 14, 2024

This adds support for the newly introduced unavailable SSNS flag. Certain firmware (UEFI) implementations may include SSNS records in the ACPI NBFT table that failed to connect and optionally provide a detailed reason (TP-8029, not subject to this pull request).

In userspace I thought it might be beneficial to do another connection attempt, with expectation of failure. Such failure should not be reported nor should fail the whole connection process (nvme connect-all --nbft). Failure messages would still be reported in verbose mode.

As currently designed, three connection attempts are made: the pre-OS/UEFI phase, dracut and another try via the nvmf-connect-nbft.service systemd unit. Between these attempts networking may change, new routes added, tunnels established, etc.

Needs linux-nvme/libnvme#781 and linux-nvme/libnvme#782

@tbzatek
Copy link
Contributor Author

tbzatek commented Feb 14, 2024

Note that connection failures are still being reported by the kernel, resulting in some confusing clutter on the console:

dracut

@tbzatek tbzatek force-pushed the nbft-flags-1 branch 2 times, most recently from e896130 to 056cd4b Compare February 19, 2024 15:34
@tbzatek tbzatek force-pushed the nbft-flags-1 branch 3 times, most recently from 75b8410 to 432758b Compare March 8, 2024 17:10
@tbzatek tbzatek marked this pull request as ready for review March 8, 2024 17:15
@tbzatek tbzatek changed the title [WIP] NBFT parser SSNS flags NBFT parser SSNS flags Mar 8, 2024
@tbzatek
Copy link
Contributor Author

tbzatek commented Mar 8, 2024

Ready for review. The second commit is slightly controversial since it creates a dummy nvme_root_t but I find it useful for exposing debugging messages from the libnvme NBFT parser. Although we may ask customers for a binary copy of the ACPI NBFT table and analyze it privately, it may be used as a quick verification in the field.

Cc: @mwilck

Copy link
Contributor

@mwilck mwilck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with one minor nit

nbft.c Outdated

/*
* In case this SSNS is marked as 'unavailable',
* make another connection attempt.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean "do not make another connection attempt"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this is for discussion, for now all my changes mimic the existing behaviour, i.e. connecting to anything defined. With TP8029 in place we can decide in which circumstances a connection attempt should be or not be made.

Another question outlined in #2219 is about making two-phase connection attempts where unavailable records will only get attempted to be reconnected sometime after switchroot, e.g. via a systemd unit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the SSNS record is unavailable that's because the primary boot phase set it that way. I think unavailable records should be retried by nvme-cli.

Yes, we want to make another connection attempt.

Copy link
Contributor

@mwilck mwilck Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbzatek, I am not suggesting changes to your code.

I was just referring to the comment: "make another connection attempt", but will execute a continue and will not make another attempt if the unavailable flag is set, unless I'm misreading the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry 😮, right, the comment was slightly confusing, reworded.

Copy link
Contributor

@johnmeneghini johnmeneghini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit can be squashed into the previous commit.

@igaw
Copy link
Collaborator

igaw commented Mar 19, 2024

just a rebase to include the libnvme changes.

@igaw
Copy link
Collaborator

igaw commented Mar 19, 2024

@tbzatek The compilers are not happy. It seems these versions are not able to track dependencies correctly. Maybe initialized the to a default value?

tbzatek added 6 commits March 19, 2024 15:22
New 'unavailable' and 'discovered' flags.
JSON only for the moment.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
Useful for debugging the libnvme NBFT parser.
Specify '-vv' for debugging messages.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
nvme-cli logging offers multiple levels now.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
In case a SSNS is marked as 'unavailable' by the firmware,
let's give it another try but silence any errors.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
For unavailable SSNSs, pause logging to make connection
attempts truly silent.

Kernel errors are still being logged, e.g.
  nvme nvme8: failed to connect socket: -110

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
In case of complex NBFT tables, multiple messages printed out
start to get confusing. Better to print SSNS indexes for easier
identification in further debugging.

Signed-off-by: Tomas Bzatek <tbzatek@redhat.com>
@tbzatek
Copy link
Contributor Author

tbzatek commented Mar 19, 2024

@tbzatek The compilers are not happy. It seems these versions are not able to track dependencies correctly. Maybe initialized the to a default value?

Hmm, this is not the first time important compiler warnings were not reported on my side 😠 --> #2253

Anyway, fixed.

@igaw igaw merged commit 89372aa into linux-nvme:master Mar 19, 2024
16 checks passed
@igaw
Copy link
Collaborator

igaw commented Mar 19, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants