STM32L4: HardFault in Idle Thread Possibly Caused by #7813 #8343

Closed
mattbrown015 opened this issue Oct 8, 2018 · 13 comments

@mattbrown015
Contributor

mattbrown015 commented Oct 8, 2018

Description

I tried to update to the latest master this morning and was immediately greeted by a HardFault. I haven't had any of these recently so I'm almost sure it is because of an mbed-os change.

At present I believe it is STM32L4: Fix sleep implementation #7813 that is causing the problem.

Here's the actual fault report although I'm not sure it helps. The thread is the idle thread.

++ MbedOS Fault Handler ++

FaultType: HardFault

Context:
R0   : 4D11B537
R1   : 40000000
R2   : 00000004
R3   : 00000000
R4   : 00004000
R5   : 40021000
R6   : 10000000
R7   : 00000000
R8   : 00000000
R9   : 00000000
R10  : 00000000
R11  : 00000000
R12  : 007F7F7F
SP   : 20004BF8
LR   : 08014D2B
PC   : 08014D2E
xPSR : 01000000
PSP  : 20004BD8
MSP  : 2000FFC0
CPUID: 410FC241
HFSR : 40000000
MMFSR: 00000000
BFSR : 00000082
UFSR : 00000000
DFSR : 00000000
AFSR : 00000000
BFAR : 4D11B56F
Mode : Thread
Priv : Privileged
Stack: PSP

-- MbedOS Fault Handler --

++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x8010AEB
Error Value: 0x8014D2E
Current Thread: Id: 0x2000473C Entry: 0x8010CED StackSize: 0x200 StackMem: 0x20004A80 SP: 0x2000FF58
For more info, visit: https://armmbed.github.io/mbedos-error/?error=0x80FF013D
-- MbedOS Error Info --
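
The status registers in the dump above can be decoded by hand. The following is a minimal sketch, assuming the ARMv7-M bit layout for HFSR and BFSR; the helper names `is_precise_bus_fault` and `fault_offset` are hypothetical and not part of mbed-os.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the ARMv7-M fault status registers (Cortex-M4). */
#define HFSR_FORCED     (1UL << 30) /* configurable fault escalated to HardFault */
#define BFSR_PRECISERR  (1U << 1)   /* precise data bus error */
#define BFSR_BFARVALID  (1U << 7)   /* BFAR holds the faulting address */

/* Hypothetical helper: true when the dump describes an escalated, precise
 * bus fault whose faulting address was captured in BFAR. */
static bool is_precise_bus_fault(uint32_t hfsr, uint8_t bfsr)
{
    return (hfsr & HFSR_FORCED) &&
           (bfsr & BFSR_PRECISERR) &&
           (bfsr & BFSR_BFARVALID);
}

/* Hypothetical helper: distance of the faulting address from a register
 * that may have been used as the base pointer. */
static uint32_t fault_offset(uint32_t bfar, uint32_t base_reg)
{
    return bfar - base_reg;
}
```

With the values above (HFSR 0x40000000, BFSR 0x82) the first helper returns true, and BFAR (0x4D11B56F) minus R0 (0x4D11B537) is 0x38. That suggests, but does not prove, that the faulting access used R0 as a base pointer with a small offset.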

Of course the hard fault is not completely repeatable: it doesn't always happen, and it can happen at different points in the application. However, I believe that if it is going to happen, the app doesn't run for long first.

I created a branch from #6f338f8 and didn't see the hard fault after a few attempts. I appreciate this is not proof that this version is OK.

I also created a branch from #232543a and I saw the hard fault after a few resets. I agree this change looks good but I'm almost certain something is going wrong!

We're not doing anything with the run mode in our app. I.e. we should only be in the sleep and deepsleep modes selected by mbed-os.

I'm not sure if it is relevant but I'm using the LPTICKER in RTC mode; I might try switching to LPTIM to see if that makes the hard fault go away.

I've tried a few other things like increasing the idle thread stack size but didn't find anything that helped.

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug

@mattbrown015
Contributor Author

Sometimes it just locks up. I.e. no hard fault but I think it's spinning in a loop.

If I prevent attempting to select low-power sleep mode by changing HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI) to HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI) my problems seem to go away.

This doesn't make much sense to me as I don't understand why low-power mode, PWR->CR1 & PWR_CR1_LPR, is ever selected.

It's not because I'm using the LPTICKER in RTC mode, I tried changing to LPTIM and continued to get problems.

@mattbrown015
Contributor Author

I missed some important pieces of information...

Since 27th Sept we've been using #37654e5 without any problems.

We're building with ARM GCC 7.3.1.

@LMESTM
Contributor

LMESTM commented Oct 9, 2018

@mattbrown015 out of curiosity - aren't you using CPU HEAP statistics as well ?

@mattbrown015
Contributor Author

Hi @LMESTM,

I'm not sure exactly what you mean because the range, type and implementation of the various statistics has changed since we first started. In other words, I think I enabled the stats that were available when we started circa 5.9.1.

mbed_app.json looks like this:

       ,"MBED_HEAP_STATS_ENABLED=1"
       ,"MBED_STACK_STATS_ENABLED=1"
       ,"MBED_TICKLESS"
       ,"MBED_TRACE_MAX_LEVEL=TRACE_LEVEL_DEBUG"

if that answers your question.

@LMESTM
Contributor

LMESTM commented Oct 9, 2018

So I'm not sure at all, but you may try out the fix from #8013 , at least for your L4 target and see if that helps at all ?

@jeromecoutant
Collaborator

Hi
For me there is a bug in the #7813 fix...

it should be:

if (lowPowerMode) {
-    HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
+    HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI);
} else {
-    HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI);
+    HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
}

What do you think, @MSiglreithmaierRB?

@MSiglreithmaierRB
Contributor

@jeromecoutant but then we are hitting again that mentioned code path which caused the issue in our code (spin-loop for exiting LPR mode). I'll try to get access to a device again and try to reproduce the hardfault in our application.

@MSiglreithmaierRB
Contributor

I tried running it several times now in our application using the patch in the PR (using mbed 5.8 atm) but I'm not able to reproduce it here.

@mattbrown015
Contributor Author

I've been doing something else for a couple of days but now I'm back on this!

I tried again with the latest master i.e. #6d7b655 which has moved on a bit but the issue is still present.

There's definitely something wrong. I had a few goes and the app always hung until I applied this patch (i.e. what @jeromecoutant said):

diff --git a/targets/TARGET_STM/sleep.c b/targets/TARGET_STM/sleep.c
index 0d71fcc93..99dc8a621 100644
--- a/targets/TARGET_STM/sleep.c
+++ b/targets/TARGET_STM/sleep.c
@@ -163,9 +163,9 @@ void hal_sleep(void)
     // 	LPR: When this bit is set, the regulator is switched from main mode (MR) to low-power mode (LPR).
     int lowPowerMode = PWR->CR1 & PWR_CR1_LPR;
     if (lowPowerMode) {
-        HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
-    } else {
         HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI);
+    } else {
+        HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
     }
 #else
     HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
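
The intent of the patch above is to enter sleep with whichever regulator mode is already selected. A host-side sketch of that decision follows. The `SKETCH_` constants are stand-ins for the STM32L4 HAL definitions and their values are assumptions; `sleep_regulator_for` is a hypothetical helper, not a HAL function.

```c
#include <stdint.h>

/* Stand-ins for the STM32L4 HAL definitions so the sketch builds on a host.
 * The values are assumptions; check the device headers before relying on them. */
#define SKETCH_PWR_CR1_LPR               (1UL << 14)
#define SKETCH_PWR_MAINREGULATOR_ON      0x00000000UL
#define SKETCH_PWR_LOWPOWERREGULATOR_ON  SKETCH_PWR_CR1_LPR

/* Hypothetical helper: choose the regulator argument for sleep entry from a
 * snapshot of PWR->CR1, keeping the regulator mode that is already active. */
static uint32_t sleep_regulator_for(uint32_t pwr_cr1)
{
    if (pwr_cr1 & SKETCH_PWR_CR1_LPR) {
        return SKETCH_PWR_LOWPOWERREGULATOR_ON; /* LPR bit already set */
    }
    return SKETCH_PWR_MAINREGULATOR_ON;         /* main regulator in use */
}
```

On target, the result would be passed as the first argument of `HAL_PWR_EnterSLEEPMode`. With the branches the other way round, an application that never sets LPR always requests the low-power regulator, which matches the behaviour reported in this thread.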

I'm trying to think what could be different about what I'm doing in comparison to what @MSiglreithmaierRB is doing...

I am using tickless but this change is in hal_sleep so for some reason the app must be sleeping but with deepsleep locked when the app gets hung.
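
The reasoning here is that the idle path only reaches hal_sleep while something holds a deep-sleep lock. A simplified host-side model of that choice is sketched below. It is not the mbed-os source; every `sketch_` name is hypothetical.

```c
#include <stdint.h>

/* Simplified model: deep sleep is allowed only while no lock is held. */
typedef enum { SKETCH_HAL_SLEEP, SKETCH_HAL_DEEPSLEEP } sketch_sleep_path_t;

static uint16_t sketch_deep_sleep_locks = 0;

/* A driver that needs high-speed clocks takes a lock... */
static void sketch_lock_deep_sleep(void)
{
    sketch_deep_sleep_locks++;
}

/* ...and releases it when it is done. */
static void sketch_unlock_deep_sleep(void)
{
    if (sketch_deep_sleep_locks > 0) {
        sketch_deep_sleep_locks--;
    }
}

/* Which HAL entry point the idle loop would pick in this model. */
static sketch_sleep_path_t sketch_idle_path(void)
{
    return (sketch_deep_sleep_locks == 0) ? SKETCH_HAL_DEEPSLEEP
                                          : SKETCH_HAL_SLEEP;
}
```

In this model, a single outstanding lock, for example from a peripheral driver that is mid-transfer, is enough to route the idle thread through the sleep path that #7813 changed.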

@MSiglreithmaierRB mentioned mbed-os 5.8, that seems like a long time ago. I'm bang up-to-date and a great deal has changed since mbed-os 5.8, there could easily be a reason why hal_sleep isn't being called in the same way.

@LMESTM
Contributor

LMESTM commented Oct 12, 2018

So you've tested with #8013 as you're on top of master, right?
I understand that you have a custom target, have you applied the same fix as in #8013 or do you inherit from it (depends on your target) ?
Last question / idea for today: have you tried to use tools/debug_tools/crash_log_parser/crash_log_parser.py
to better locate the hard fault, and see if it's always the same point which is hit ?

@mattbrown015
Contributor Author

I've updated our trunk to "SD - Add required header file and namespace element instead add all" #8006, which is the PR before "STM32L4: Fix sleep implementation" #7813. I've done a clean build and installed it on 3 of our devices; it's been running for a couple of hours now and I've done numerous resets.

As far as I'm concerned, everything is OK with #8006 (#6f338f8).

> So you've tested with #8013 as you're on top of master, right?
> I understand that you have a custom target, have you applied the same fix as in #8013 or do you inherit from it (depends on your target) ?

I tried the latest master earlier which would have included #8013. Our custom target inherits from NUCLEO_L432KC and we're using the linker script inherited from NUCLEO_L432KC.

Just now I did a slightly different test.

I applied #8013 to #8006. The .data and .bss sections did move a bit and I ran it to prove this is OK. Then I applied the change from #7813 and immediately got the hard fault.

> Last question / idea for today: have you tried to use tools/debug_tools/crash_log_parser/crash_log_parser.py
> to better locate the hard fault, and see if it's always the same point which is hit ?

I've tried crash_log_parser.py with one hard fault:

python crash_log_parser.py crash.log
ELF or MAP file missing, logging raw values.

Crash Info:
        Crash location = <unknown-symbol> [0x0801548A] (based on PC value)
        Caller location = <unknown-symbol> [0x0801541F] (based on LR value)
        Stack Pointer at the time of crash = [20004C50]
        Target and Fault Info:
                Processor Arch: ARM-V7M or above
                Processor Variant: C24
                Forced exception, a fault with configurable priority has been escalated to HardFault
                Unaligned access error has occurred

I thought it was interesting that it was an unaligned access but I'm not sure what I was expecting.

I didn't get on very well with crash_log_parser.py. I tried passing the elf file but it never seemed to recognise it; it either continued to say "ELF or MAP file missing" or threw a Python file error.

The build output doesn't seem to include a .map file. I tried specifying the .csv or .json files but that didn't seem to work.

I tried building with the debug profile but ran out of flash!

I wonder if crash_log_parser.py is unhappy because I'm using the wrong compiler i.e. ARM GCC V7.x??

@LMESTM
Contributor

LMESTM commented Oct 15, 2018

> I applied #8013 to #8006. The .data and .bss sections did move a bit and I ran it to prove this is OK. Then I applied the change from #7813 and immediately got the hard fault.

Ok so this clearly needs to be fixed. Thanks. Let's move on with @jeromecoutant proposal

> I wonder if crash_log_parser.py is unhappy because I'm using the wrong compiler i.e. ARM GCC V7.x??

@0xc0170 any idea ?

@kjbracey
Contributor

> The build output doesn't seem to include a .map file

As of recent build tools, it comes out as "xxx.map.old", presumably due to a last-build-stage rename to support the memory-change output.
