# Change controller loss timeout for NBFT connections? #7
@johnmeneghini @LennySzubowicz @Douglas-Farley @rahlvers @igaw for information.
It's not just boot, correct? If the rootfs is on that connection and another connection to that NID hasn't been established, the rootfs can hang post-boot too, right? Conversely, this is only a problem for all-paths-down too, correct?
The NBFT can't even discriminate between connections that are necessary and those that aren't. At best, some might administratively have the non-bootable flag set.
Today this seems like the most reasonable solution in my mind. A more complex approach might be:
This is complicated, though, because it would require pre-processing all the SSNS records and grouping them. In the future we might need a policy like "infinite timeout for the named NID of an attempt", or something similarly convoluted; but attempt data is an EDK UI reference construct, not exactly an NBFT-ism. We could propose some longer-term code that only assigns the non-bootable flag to namespaces that are not directly referenced (assuming an attempt defines a namespace NID only). cc: @charles-rose
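For illustration, the longer-term idea above (flagging namespaces that no attempt references directly) could be sketched roughly as follows. The record layout, field names, and the `attempt_nids` input are hypothetical stand-ins, not the actual NBFT or EDK data structures:

```python
from dataclasses import dataclass

@dataclass
class SsnsRecord:
    # Illustrative stand-in for an NBFT SSNS (subsystem namespace) record.
    subsystem_nqn: str
    nid: str                  # namespace identifier
    non_bootable: bool = False

def mark_non_bootable(records, attempt_nids):
    """Set the non-bootable flag on every SSNS record whose namespace
    is not directly referenced by a pre-OS boot attempt (assuming an
    attempt references a namespace by NID only)."""
    for rec in records:
        rec.non_bootable = rec.nid not in attempt_nids
    return records

records = [
    SsnsRecord("nqn.2014-08.org.example:boot", "nid-1"),
    SsnsRecord("nqn.2014-08.org.example:data", "nid-2"),
]
marked = mark_non_bootable(records, attempt_nids={"nid-1"})
print([(r.nid, r.non_bootable) for r in marked])
# → [('nid-1', False), ('nid-2', True)]
```

The grouping step would additionally have to bucket records by subsystem NQN before deciding, which is exactly the pre-processing complexity mentioned above.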
Yes, of course. Sorry for not stating that more clearly.
Right, I was considering a single-path setup here. For multipath, you want to set
I think that initially it'd be ok to assume that all NBFT-defined connections are necessary. But the initramfs may also contain a `discovery.conf` with additional connections.
Indeed. If we want to go down that route, I think it should be done in
For completeness, in the SUSE bug there was also the idea to add a field for the timeout in the NBFT itself. But IMHO we don't want to do that.
Unfortunately, there is a lot of complexity involved with all these different timeouts. As far as I understood the question here, it's about the `ctrl_loss_tmo`.
True, but what else can we do if the controller is necessary for accessing the root FS? In the multipath case it's suboptimal if there are other healthy paths, but it is out of scope for us here in the Timberland project to discuss generic timeout issues for NVMe multipath.
Obviously, I wanted to say
The error case handling depends on the requirements. Is the system expected to retry forever? Without a timeout, the system will behave the same as with an NFS-mounted root filesystem. If this is the expectation, I don't know of any other way to achieve it than disabling the `ctrl_loss_tmo`.
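As a concrete (illustrative) command for disabling the timeout: passing `-1` asks the kernel to retry forever. Whether `--nbft` and `--ctrl-loss-tmo` can be combined depends on the installed nvme-cli version, so treat this as a sketch rather than a guaranteed invocation:

```shell
# Illustrative: retry forever (ctrl_loss_tmo = -1) for NBFT connections.
# Requires an nvme-cli version that supports both flags on connect-all.
nvme connect-all --nbft --ctrl-loss-tmo=-1
```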
To avoid confusion (/me had been confused): For NVMe over TCP (and RDMA),
Right. And I think that's what most users would expect. If a timeout strikes, the root FS will see IO errors and go read-only, and depending on system settings, the system may halt, reboot, or even panic. It's worth discussing whether the behavior with a timeout is actually more reasonable than the system simply stalling and becoming unresponsive. That is another question that goes beyond the scope of the Timberland group.

But it makes me realize that we should not hard-code an infinite timeout. Some users will actually prefer something different, even without multipath. My current approach for SUSE was to use the existing

Users who want a finite timeout even for controllers that support the root FS could add their own udev rules for changing the timeout after the system has come up. I think that's sufficient flexibility.
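A user-supplied rule of that kind might look like the following sketch. The file name, the match keys, and the assumption that `ctrl_loss_tmo` is exposed as a writable sysfs attribute on fabrics controllers are all illustrative:

```
# /etc/udev/rules.d/90-nvme-ctrl-loss-tmo.rules  (illustrative name)
# Give newly appearing NVMe/TCP controllers a finite 300 s loss timeout.
ACTION=="add", SUBSYSTEM=="nvme", ATTR{transport}=="tcp", \
  ATTR{ctrl_loss_tmo}="300"
```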
Please look at BTG TPAR 8029 and raise any objections there, because it is actually proposing exactly that.
This is certainly what comes to mind for me when I think of a single-path system failure, or an all-paths-down failure, on the root fs.
I agree here. I view our role in Timberland, and from the BTG side, as helping the ecosystem align on a common, standard way to converge the administrative domain from pre-OS to OS interop. I think the handling of namespaces attached as part of that is something we can help make a better experience via standardization, but choosing "the right default policy for app xyz" is out of our hands here.
@Douglas-Farley: I re-read the paragraph about the "non-bootable entry flag" in the NVMe boot spec. The wording in the spec says that if this flag is
The typical pre-OS driver doesn't know much about a namespace, only whether it contains an ESP. We could add logic via UEFI HII for administrative flags that in turn influence the NBFT, but we don't have a lot of options yet. Certainly a space for NVMe.org to add features, if we can clearly define them and their utility.
## Problem[^1]
If an NVMe connection to a subsystem is unavailable during boot for more than 10 minutes, the kernel will cease its connect attempts after the default timeout of 600s (`ctrl_loss_tmo`). This will cause booting to fail, even if the subsystem comes online later.

Should we enforce an infinite timeout for NBFT-specified subsystem connections? If yes, how should it be done? `nvme connect-all` supports the `--ctrl-loss-tmo` option, which can be used to override the setting manually. We could change our dracut code to use this option by default[^2]. dracut could use the standard `rd.timeout` option[^3] to make sure that the NVMe timeout and the user-defined timeout are consistent.

Where `connect-all --nbft --ctrl-loss-tmo=$X` is not available, should an infinite timeout be applied when plain `connect-all --nbft` is invoked? Or should we rather require that the user explicitly uses `--ctrl-loss-tmo`?
## Summary of discussion (WIP)[^4]
- Use `rd.timeout` for this. Do we need a separate parameter?
- This doesn't cover connections from `discovery.conf`, but that's a minor issue, because we'll normally just call `connect-all --nbft` from dracut.
- The `Non-bootable Entry flag` could be leveraged to determine whether a controller is "required". To be clarified whether that would work, and how. Probably the flag can only be used from inside `nvme connect-nbft`; else the dracut logic would become exceedingly complex.
- `--ctrl-loss-tmo` will apply to both IO controllers and discovery controllers. For the initial connect attempt, it makes sense to use the same value, but at runtime, an infinite timeout for a discovery controller seems weird. Minor issue.

## Footnotes
[^1]: This problem was originally reported to SUSE by Dell in private bug 1211080.
[^2]: The `connect-nbft` command did not support this option.
[^3]: `rd.timeout` defaults to infinity.
[^4]: I will try to maintain a summary here, lest people must read the entire thread.