Abort on restart after maximum resets (Linux w/ mismatched restart request/restart type) #2127
Comments
Response from cfs-community mailing list:
I was able to confirm this one fairly simply (no code change required!):
The first-order problem here is that CFE was started in processor reset mode, which did NOT match the boot record, which requested a power-on reset. This is simply because it was launched with …
This seems like a questionable design decision, somewhat like the age-old idiom of trying the exact same thing again but expecting a different result. The PSP did not adhere to the requested restart type, so why would a different result be anticipated by calling …?
My recommendation is that this attempt to restart again should be removed. The fact is that CFE is running, just with the wrong reset type. That is arguably better (and more recoverable by an operator) than a system that has gotten into a boot loop and fails to start at all. Instead, event reporting should simply note that the system started with the wrong reset type.
If we end up in a processor reset loop, perhaps due to some third-party "fault protection" type application, I would like to see something make that POR attempt. Perhaps this decision should be deferred to that same third-party application, but it was my understanding that CFE_PLATFORM_ES_MAX_PROCESSOR_RESETS was intended to manage this particular functionality.
The PR/PO reset logic is really outside the scope of CFE; it is handled by whatever scripts/tools provide the system integration. In the case of "pc-linux", if the CFS is started at boot, this would rely on the wrapper/init script (e.g. a systemd unit) doing the right thing: if the system is systemd-based, do a full system restart for a "poweron" reset, or just restart the CFE service for a "processor" reset. This should then pass the right option to the …
However, when just running on a desktop/command line, none of that actually happens. The PSP will just do the steps according to the passed-in …
To summarize though, YES, there should be something in a wrapper/startup script that ensures CFE gets started with …, but if that isn't there or isn't working, having the possibility of a reboot loop doesn't seem wise.
Works for me.
Describe the bug
After exceeding the maximum number of unplanned resets allowed by CFE_PLATFORM_ES_MAX_PROCESSOR_RESETS, the system attempts to perform a POR instead of a PROCESSOR reset. Unfortunately, this orderly reset fails due to an apparent deadlock, and the system eventually times out and calls Abort.
Note that this does not occur when using CFE_ES_ResetCFE, only with CFE_PSP_Restart(CFE_PSP_RST_TYPE_PROCESSOR).
To Reproduce
Steps to reproduce the behavior:
1. Modify any app to call CFE_PSP_Restart(CFE_PSP_RST_TYPE_PROCESSOR) on command.
2. Issue the command to trigger the restart, then re-spawn the executable.
3. Repeat the restart until the system falls back to a POR.
Expected behavior
Expect a clean POR restart without the 10-second timeout and abort.
Code snips
System observed on:
Intel i7-10870H
64 GB RAM
Linux -------- 5.18.10-76051810-generic #202207071639
165725231021.10~7d5e891 SMP PREEMPT_DYNAMIC Fri J x86_64 x86_64 x86_64 GNU/Linux
Latest cFS distribution as of July 28, 2022, with sample_app modified to call CFE_PSP_Restart.
Additional context
Stack Trace from running threads at the time of the abort
Reporter Info
Lorn Miller
Red Canyon Engineering & Software