Nvmeof_Attempt* variables are only effective on warm boot (not on cold boot) #11

LennySzubowicz · 2023-03-16T15:52:45Z

The Nvmeof_Attemptconfig and NvmeofGlobalData variables created by NvmeOfCli.efi are persistent EFI boot-time variables. However, the NVMe-oF/TCP driver stack appears to act on them only after a warm reset, not on a cold boot of the EFI system.

The persistent Nvmeof_Attempt variables should be effective on both warm and cold boots.

LennySzubowicz · 2023-03-17T03:28:04Z

To clarify, by "cold boot" I mean starting a new qemu process on the host OS to create a VM running OVMF with a previously set up OVMF vars file.

In contrast, by "warm boot" I mean invoking the "reset" command from the EFI shell with no other qualifiers. From EFI's point of view, that might be a "cold reset." The ovmf firmware needs to reboot. But it's doing so in the context of the same qemu process that it was running in before.

Steps to reproduce were:

1. Boot the target VM and run start-tcp-target.sh
2. Boot the host VM, use the EFI Boot Device Selection (BDS) menu to boot to the EFI shell, and allow it to run the built startup.nsh, which runs nvmeofcli.efi with a crafted config file
3. Rename the startup.nsh file so it won't get invoked again
4. Using dmpstore -b -all nvme*, observe the presence of the expected EFI persistent (NV+BS) NVMe-oF attempt variables
5. Run the EFI shell reset command
6. On the reboot, use EFI BDS to boot to the EFI shell again
7. Observe that the nvmeof namespace device is now present along with fs1:
8. dmpstore -b -all nvme* shows the NVMe-oF attempt variables are still there
9. You could now exit from the EFI shell back to EFI BDS and boot Fedora from the boot entry for the nvmeof device

The above is all good and demonstrates the working case.

10. Quit out of the host VM, i.e. shut it down
11. Restart the host VM and use BDS to boot the EFI shell
12. Observe the continued presence of the expected NVMe-oF attempt variables
13. But the nvmeof device is not present

This demonstrates the problem. If one now goes back to step 5, then the nvmeof boot of Fedora works. But this reset step should not be necessary if the nvmeof attempt variables were previously defined and are still present.

LennySzubowicz · 2023-03-17T03:56:47Z

The bootlog from: -debugcon file:bootlog -global isa-debugcon.iobase=0x402

Attempt variables are already defined from a prior boot.
First a cold boot, stopping in EFI shell. Then reset (evidence of that at around line 2690). The boot after reset got all the way into grub before I interrupted it with ESC and exited to the EFI shell:

bootlog-cold-and-reset.txt

The output of devices, drivers after cold boot to EFI shell:

devices-cold.txt
drivers-cold.txt

The output of devices, drivers, and map after reset to EFI shell:

devices-reset.txt
drivers-reset.txt
map-reset.txt

Douglas-Farley · 2023-03-20T13:57:25Z

> The persistent Nvmeof_Attempt variables should be effective on both warm and cold boots.

We previously attempted to fix this by OR-ing EFI_VARIABLE_NON_VOLATILE into the variable attributes (| EFI_VARIABLE_NON_VOLATILE), but I suspect that didn't quite cover this case.
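For context on the attribute flag mentioned above, here is a minimal sketch that decodes a UEFI variable attribute mask into the short names dmpstore prints (NV, BS, RT). The bit values are the ones defined in the UEFI specification; the helper names describe_attributes and is_persistent are hypothetical and are not part of NvmeOfCli.efi or the driver.

```python
# UEFI variable attribute bits, as defined in the UEFI specification.
EFI_VARIABLE_NON_VOLATILE = 0x00000001
EFI_VARIABLE_BOOTSERVICE_ACCESS = 0x00000002
EFI_VARIABLE_RUNTIME_ACCESS = 0x00000004

# Short names in the order used by the "(NV+BS)" notation in this thread.
_SHORT_NAMES = [
    (EFI_VARIABLE_NON_VOLATILE, "NV"),
    (EFI_VARIABLE_BOOTSERVICE_ACCESS, "BS"),
    (EFI_VARIABLE_RUNTIME_ACCESS, "RT"),
]


def describe_attributes(attributes: int) -> str:
    """Hypothetical helper: render an attribute mask as e.g. 'NV+BS'."""
    names = [name for bit, name in _SHORT_NAMES if attributes & bit]
    return "+".join(names) if names else "none"


def is_persistent(attributes: int) -> bool:
    """A variable survives power-off only if the NV bit is set."""
    return bool(attributes & EFI_VARIABLE_NON_VOLATILE)


# Boot-service access with the non-volatile bit OR-ed in.
attempt_attrs = EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_NON_VOLATILE
print(describe_attributes(attempt_attrs))
print(is_persistent(attempt_attrs))
```

Since dmpstore reports the attempt variables as NV+BS and still present after the cold boot, the attribute mask on its own does not seem to explain the behaviour reported here.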

Douglas-Farley · 2023-03-22T12:39:20Z

@Ajay-Khadolia / @swamy-kadaba - Have you observed this by chance?

amit-jain9 · 2023-03-24T11:15:22Z

> The bootlog from: -debugcon file:bootlog -global isa-debugcon.iobase=0x402
>
> Attempt variables are already defined from a prior boot. First a cold boot, stopping in EFI shell. Then reset (evidence of that at around line 2690). The boot after reset got all the way into grub before I interrupted it with ESC and exited to the EFI shell:
>
> bootlog-cold-and-reset.txt
>
> The output of devices, drivers after cold boot to EFI shell:
>
> devices-cold.txt drivers-cold.txt
>
> The output of devices, drivers, and map after reset to EFI shell:
>
> devices-reset.txt drivers-reset.txt map-reset.txt

From the attached logs for the cold boot, it looks like the NVMe-oF driver does attempt a connection to the target. The socket connection appears to be aborted due to a network transmit failure. The relevant log lines are below.

Lines 1929 to 1934 (i.e., before the reset at line 2690):

Probe/Connect NQN: nqn.2014-08.org.nvmexpress:uuid:0c468c4d-a385-47e0-8299-6e95051277db
NVMeOFLog:892:spdk_nvme_probe:NVMe target address: 192.168.101.20
NVMeOFLog:1530:spdk_nvme_probe_async:trid trtype 3
NVMeOFLog:807:nvme_probe_internal:trid trstring TCP
NVMeOFLog:148:nvme_transport_ctrlr_scan:trid trstring TCP
Attaching to 192.168.101.20

Lines 2007 to 2014:

TcpTxCallback: Tx error reported: No mapping
NVMeOFLog:228:edk_sock_connect:TcpIoConnect error: 21
NVMeOFLog:1750:nvme_tcp_ctrlr_connect_qpair:sock connection error of tqpair=7E0CA018 with addr=192.168.101.20, port=4420
NVMeOFLog:1863:nvme_tcp_ctrlr_construct:failed to connect admin qpair
NVMeOFLog:674:nvme_ctrlr_probe:Failed to construct NVMe controller for SSD: 192.168.101.20
NVMeOFLog:818:nvme_probe_internal:NVMe ctrlr scan failed
NVMeOFLog:896:spdk_nvme_probe:Create probe context failed
spdk_nvme_probe() failed for 192.168.101.20

Does this happen always when we try to do a cold boot?
We run QEMU on an Ubuntu machine to boot to the UEFI shell and are unable to reproduce this across resets or multiple invocations of QEMU.
We have not tested this using a host VM to boot to the UEFI shell directly; instead we run QEMU and then boot to the UEFI shell.
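To make this kind of log triage repeatable, the sketch below scans a debugcon log for driver lines that look like failures. It assumes the NVMeOFLog:<number>:<function>:<message> layout seen in the excerpts above, and it assumes that the number printed after "TcpIoConnect error:" is the low-order EFI_STATUS error code in decimal (21 would then be EFI_ABORTED per the UEFI specification). The function names find_failures and decode_status are hypothetical.

```python
import re

# Subset of low-order EFI_STATUS error codes from the UEFI specification.
EFI_ERROR_NAMES = {
    14: "EFI_NOT_FOUND",
    17: "EFI_NO_MAPPING",
    18: "EFI_TIMEOUT",
    21: "EFI_ABORTED",
}

# Assumed layout, inferred from the excerpts: NVMeOFLog:<number>:<function>:<message>
LOG_LINE = re.compile(r"^NVMeOFLog:(\d+):(\w+):(.*)$")
FAILURE_HINTS = ("error", "failed")


def find_failures(lines):
    """Return (log_line_number, function, message) for failure-looking driver lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LOG_LINE.match(line.strip())
        if match and any(hint in match.group(3).lower() for hint in FAILURE_HINTS):
            hits.append((number, match.group(2), match.group(3).strip()))
    return hits


def decode_status(message):
    """Map 'error: <n>' to an EFI status name; None if no code is present."""
    found = re.search(r"error:\s*(\d+)", message)
    if not found:
        return None
    code = int(found.group(1))
    return EFI_ERROR_NAMES.get(code, f"unknown ({code})")


# A few lines copied from the excerpts above.
sample = [
    "NVMeOFLog:892:spdk_nvme_probe:NVMe target address: 192.168.101.20",
    "TcpTxCallback: Tx error reported: No mapping",
    "NVMeOFLog:228:edk_sock_connect:TcpIoConnect error: 21",
    "NVMeOFLog:1863:nvme_tcp_ctrlr_construct:failed to connect admin qpair",
]

for number, function, message in find_failures(sample):
    print(number, function, message, decode_status(message))
```

Running the same scan over the cold-boot and post-reset portions of a log separately would show whether the connect failure appears only on the cold boot.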

Douglas-Farley · 2023-03-24T11:19:36Z

hi @amit-jain9

> Does this happen always when we try to do a cold boot?

Yes. During the Timberland call yesterday it was reported that both the Red Hat and SUSE POCs from this repo were experiencing this, roughly with the pattern:

cold boot to startup.nsh -> set config -> verify connections -> power off -> cold boot -> doesn't create connections -> warm reset -> reads vars and creates connections

Ajay-Khadolia · 2023-04-11T10:57:16Z

We run QEMU on an Ubuntu machine using the command below:

sudo qemu-system-x86_64 --bios bios/OVMF.fd -m 8G -netdev tap,id=mynet0,ifname=tap1,script=no --device virtio-net-pci,netdev=mynet0,id=tap1,mac=52:54:00:12:34:56,romfile=empty.rom -drive file=file.qcow2 -cpu host -debugcon file:debug.log -global isa-debugcon.iobase=0x402 -enable-kvm

We have tested the following scenarios:

QEMU -> Machine -> options -> reset
Killed the QEMU session and restarted it
reset command from the QEMU command line
Restarted the Ubuntu VM

Attaching the logs for reference.
We are unable to reproduce the issue using these scenarios. Please suggest what else to try.

Douglas-Farley added the "bug" label (Something isn't working) on Mar 20, 2023
