STM32L4: HardFault in Idle Thread Possibly Caused by #7813 #8343

mattbrown015 · 2018-10-08T16:27:23Z

Description

I tried to update to the latest master this morning and was immediately greeted by a HardFault. I haven't had any of these recently, so I'm almost sure it is caused by an mbed-os change.

At present I believe the culprit is #7813, "STM32L4: Fix sleep implementation".

Here's the actual fault report although I'm not sure it helps. The thread is the idle thread.

++ MbedOS Fault Handler ++

FaultType: HardFault

Context:
R0   : 4D11B537
R1   : 40000000
R2   : 00000004
R3   : 00000000
R4   : 00004000
R5   : 40021000
R6   : 10000000
R7   : 00000000
R8   : 00000000
R9   : 00000000
R10  : 00000000
R11  : 00000000
R12  : 007F7F7F
SP   : 20004BF8
LR   : 08014D2B
PC   : 08014D2E
xPSR : 01000000
PSP  : 20004BD8
MSP  : 2000FFC0
CPUID: 410FC241
HFSR : 40000000
MMFSR: 00000000
BFSR : 00000082
UFSR : 00000000
DFSR : 00000000
AFSR : 00000000
BFAR : 4D11B56F
Mode : Thread
Priv : Privileged
Stack: PSP

-- MbedOS Fault Handler --

++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x8010AEB
Error Value: 0x8014D2E
Current Thread: Id: 0x2000473C Entry: 0x8010CED StackSize: 0x200 StackMem: 0x20004A80 SP: 0x2000FF58
For more info, visit: https://armmbed.github.io/mbedos-error/?error=0x80FF013D
-- MbedOS Error Info --
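For reference, the status registers in this report can be decoded by hand. Below is a minimal sketch, assuming the ARMv7-M bit layout and that the mbed OS handler prints BFSR (CFSR[15:8]) and UFSR (CFSR[31:16]) as separate fields; `decode_fault` and `fault_summary` are illustrative names, not mbed OS APIs. With the values above, HFSR 0x40000000 is FORCED (a configurable fault escalated to HardFault) and BFSR 0x82 is PRECISERR plus BFARVALID, so BFAR (0x4D11B56F) holds the data address that faulted. That address does not look like a flash or SRAM address on this part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the ARMv7-M Architecture Reference Manual, relative to
 * each field as printed by the fault handler. */
#define HFSR_FORCED      (1u << 30) /* configurable fault escalated to HardFault */
#define BFSR_PRECISERR   (1u << 1)  /* precise data bus error */
#define BFSR_IMPRECISERR (1u << 2)  /* imprecise data bus error */
#define BFSR_BFARVALID   (1u << 7)  /* BFAR holds the faulting address */
#define UFSR_UNALIGNED   (1u << 8)  /* usage fault: unaligned access */

typedef struct {
    bool forced;         /* HardFault was escalated from another fault */
    bool precise_bus;    /* bus error attributable to the stacked PC */
    bool imprecise_bus;  /* bus error; stacked PC may be past the culprit */
    bool unaligned;      /* unaligned access usage fault */
    bool addr_valid;     /* fault_addr is meaningful */
    uint32_t fault_addr; /* BFAR when valid, otherwise 0 */
} fault_summary;

fault_summary decode_fault(uint32_t hfsr, uint8_t bfsr, uint16_t ufsr,
                           uint32_t bfar)
{
    fault_summary s = {0};
    s.forced        = (hfsr & HFSR_FORCED) != 0;
    s.precise_bus   = (bfsr & BFSR_PRECISERR) != 0;
    s.imprecise_bus = (bfsr & BFSR_IMPRECISERR) != 0;
    s.unaligned     = (ufsr & UFSR_UNALIGNED) != 0;
    s.addr_valid    = (bfsr & BFSR_BFARVALID) != 0;
    s.fault_addr    = s.addr_valid ? bfar : 0;
    return s;
}
```

Calling `decode_fault(0x40000000, 0x82, 0x0000, 0x4D11B56F)` with the values from this report sets `forced`, `precise_bus` and `addr_valid`; UFSR is 0 here, so this particular fault is not the unaligned access reported later by crash_log_parser.py.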

Of course, the hard fault is not completely repeatable: it doesn't always happen, and it can occur at different points in the application. However, I believe that if it is going to happen, it happens soon after the app starts.

I created a branch from #6f338f8 and didn't see the hard fault after a few attempts. I appreciate this is not proof that this version is OK.

I also created a branch from #232543a and I saw the hard fault after a few resets. I agree this change looks good but I'm almost certain something is going wrong!

We're not doing anything with the run mode in our app, i.e. we should only ever be in the sleep and deepsleep modes selected by mbed-os.

I'm not sure if it is relevant but I'm using the LPTICKER in RTC mode; I might try switching to LPTIM to see if that makes the hard fault go away.

I've tried a few other things like increasing the idle thread stack size but didn't find anything that helped.

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug


mattbrown015 · 2018-10-08T16:51:05Z

Sometimes it just locks up, i.e. there's no hard fault, but I think it's spinning in a loop.

If I prevent attempting to select low-power sleep mode by changing HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI) to HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI) my problems seem to go away.

This doesn't make much sense to me as I don't understand why low-power mode, PWR->CR1 & PWR_CR1_LPR, is ever selected.

It's not because I'm using the LPTICKER in RTC mode, I tried changing to LPTIM and continued to get problems.

mattbrown015 · 2018-10-09T07:58:40Z

I missed some important pieces of information...

Since 27th Sept we've been using #37654e5 without any problems.

We're building with ARM GCC 7.3.1.

LMESTM · 2018-10-09T08:44:00Z

@mattbrown015 Out of curiosity, aren't you also using CPU and heap statistics?

mattbrown015 · 2018-10-09T08:50:53Z

Hi @LMESTM,

I'm not sure exactly what you mean, because the range, type and implementation of the various statistics have changed since we first started. In other words, I think I enabled the stats that were available when we started, circa 5.9.1.

mbed_app.json looks like this:

       ,"MBED_HEAP_STATS_ENABLED=1"
       ,"MBED_STACK_STATS_ENABLED=1"
       ,"MBED_TICKLESS"
       ,"MBED_TRACE_MAX_LEVEL=TRACE_LEVEL_DEBUG"

if that answers your question.

LMESTM · 2018-10-09T09:01:06Z

I'm not at all sure, but could you try the fix from #8013, at least for your L4 target, and see if that helps?

jeromecoutant · 2018-10-09T13:55:09Z

Hi,
I think there is a bug in the #7813 fix...

it should be:

if (lowPowerMode) {
-    HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
+    HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI);
} else {
-    HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI);
+    HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
}
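One possible answer to the earlier question of why low-power mode gets selected at all: as we read the STM32L4 Cube HAL, `HAL_PWR_EnterSLEEPMode()` does not just pass the regulator argument through. With `PWR_LOWPOWERREGULATOR_ON` it first switches into low-power run mode if not already there (its comments note that the system clock must be below 2 MHz for that), and with `PWR_MAINREGULATOR_ON` it leaves low-power run mode. If that reading is right, the swapped branches in #7813 push a full-speed application into low-power run on its first plain sleep. The sketch below is a hypothetical model, not HAL code: `pwr_model`, `model_enter_sleep`, `sleep_7813` and `sleep_fixed` are illustrative names, and a single flag stands in for both `PWR_CR1.LPR` and the HAL's regulator status check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MAIN_REGULATOR, LOW_POWER_REGULATOR } regulator_t;

typedef struct {
    bool low_power_run; /* hypothetical stand-in for the LPR state */
} pwr_model;

/* Models our reading of the HAL: the regulator argument switches run mode
 * before WFI instead of only affecting the sleep itself. */
void model_enter_sleep(pwr_model *pwr, regulator_t reg)
{
    assert(pwr != NULL);
    if (reg == MAIN_REGULATOR && pwr->low_power_run) {
        pwr->low_power_run = false; /* leave low-power run mode */
    } else if (reg == LOW_POWER_REGULATOR && !pwr->low_power_run) {
        pwr->low_power_run = true;  /* enter low-power run mode */
    }
    /* WFI would go here; in this model the core wakes up in whatever
     * regulator mode was active when it went to sleep. */
}

/* Branches as merged in #7813: the regulator choice is inverted. */
void sleep_7813(pwr_model *pwr)
{
    if (pwr->low_power_run) {
        model_enter_sleep(pwr, MAIN_REGULATOR);
    } else {
        model_enter_sleep(pwr, LOW_POWER_REGULATOR);
    }
}

/* Branches as proposed above: keep the mode the application is already in. */
void sleep_fixed(pwr_model *pwr)
{
    if (pwr->low_power_run) {
        model_enter_sleep(pwr, LOW_POWER_REGULATOR);
    } else {
        model_enter_sleep(pwr, MAIN_REGULATOR);
    }
}
```

Starting from main-regulator mode, `sleep_7813()` leaves the model in low-power run, while `sleep_fixed()` keeps whichever mode the application was already in.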

What do you think ?

@MSiglreithmaierRB

MSiglreithmaierRB · 2018-10-09T14:41:16Z

@jeromecoutant But then we would hit the code path mentioned earlier again, the one that caused the issue in our code (the spin loop when exiting LPR mode). I'll try to get access to a device again and try to reproduce the hard fault in our application.
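For context on the spin loop mentioned here: leaving low-power run mode means clearing the LPR request and then polling until the regulator reports that it is back in main mode (on STM32L4 we understand this to be the REGLPF flag in `PWR_SR2`). Below is a sketch of a bounded version of that poll against a hypothetical register model rather than real PWR registers; `regulator_model` and `exit_low_power_run` are illustrative names, not HAL functions. An unbounded loop would simply never return if the flag stayed set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the "still in low-power mode" flag stays set for
 * settle_polls reads after LPR is cleared. UINT32_MAX means it never clears. */
typedef struct {
    bool lpr;              /* models the LPR request bit */
    uint32_t settle_polls; /* reads before the status flag clears */
} regulator_model;

/* Models one read of the regulator status flag. */
static bool reglpf_set(regulator_model *r)
{
    if (r->lpr) {
        return true; /* low-power regulator still requested */
    }
    if (r->settle_polls == 0) {
        return false; /* back in main regulator mode */
    }
    if (r->settle_polls != UINT32_MAX) {
        r->settle_polls--;
    }
    return true;
}

/* Requests main mode, then polls at most max_polls times.
 * Returns true once main mode is reported, false on timeout. */
bool exit_low_power_run(regulator_model *r, uint32_t max_polls)
{
    r->lpr = false;
    while (max_polls-- > 0) {
        if (!reglpf_set(r)) {
            return true;
        }
    }
    return false; /* still in low-power mode: give up instead of hanging */
}
```

A regulator that settles after a few reads returns `true` within a budget of 10 polls; one that never settles returns `false` after 10 polls instead of spinning forever.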

MSiglreithmaierRB · 2018-10-11T09:39:54Z

I have now run it several times in our application with the patch from the PR (on mbed OS 5.8 at the moment), but I can't reproduce the problem here.

mattbrown015 · 2018-10-12T15:26:49Z

I've been doing something else for a couple of days but now I'm back on this!

I tried again with the latest master, i.e. #6d7b655, which has moved on a bit, but the issue is still present.

There's definitely something wrong. I had a few goes and the app always hung until I applied this patch (i.e. what @jeromecoutant said):

diff --git a/targets/TARGET_STM/sleep.c b/targets/TARGET_STM/sleep.c
index 0d71fcc93..99dc8a621 100644
--- a/targets/TARGET_STM/sleep.c
+++ b/targets/TARGET_STM/sleep.c
@@ -163,9 +163,9 @@ void hal_sleep(void)
     // 	LPR: When this bit is set, the regulator is switched from main mode (MR) to low-power mode (LPR).
     int lowPowerMode = PWR->CR1 & PWR_CR1_LPR;
     if (lowPowerMode) {
-        HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
-    } else {
         HAL_PWR_EnterSLEEPMode(PWR_LOWPOWERREGULATOR_ON, PWR_SLEEPENTRY_WFI);
+    } else {
+        HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);
     }
 #else
     HAL_PWR_EnterSLEEPMode(PWR_MAINREGULATOR_ON, PWR_SLEEPENTRY_WFI);

I'm trying to think what could be different about what I'm doing in comparison to what @MSiglreithmaierRB is doing...

I am using tickless, but this change is in hal_sleep, so for some reason the app must be entering plain sleep with deep sleep locked when it hangs.
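This reasoning relies on how the idle thread picks a sleep mode. As we understand it, mbed OS's sleep manager only calls `hal_deepsleep()` when no deep-sleep lock is held (see `sleep_manager_lock_deep_sleep()` and `sleep_manager_can_deep_sleep()` in `mbed_sleep_manager.h`) and falls back to `hal_sleep()` otherwise. The following is a simplified, hypothetical model of that decision; `sleep_model`, `choose_idle_mode` and the lock helpers are illustrative names, and the real sleep manager also checks other conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { IDLE_SLEEP, IDLE_DEEPSLEEP } idle_mode;

typedef struct {
    uint16_t deep_sleep_locks; /* outstanding deep-sleep locks */
} sleep_model;

void lock_deep_sleep(sleep_model *m)
{
    m->deep_sleep_locks++;
}

void unlock_deep_sleep(sleep_model *m)
{
    assert(m->deep_sleep_locks > 0); /* an unbalanced unlock is a bug */
    m->deep_sleep_locks--;
}

/* Any held lock forces plain sleep (the hal_sleep path changed by #7813). */
idle_mode choose_idle_mode(const sleep_model *m)
{
    return m->deep_sleep_locks == 0 ? IDLE_DEEPSLEEP : IDLE_SLEEP;
}
```

In this model a single outstanding lock (for example from a driver that must keep its clocks running) is enough to route every idle period through the `hal_sleep()` path.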

@MSiglreithmaierRB mentioned mbed-os 5.8, which seems like a long time ago. I'm bang up-to-date and a great deal has changed since mbed-os 5.8; there could easily be a reason why hal_sleep isn't being called in the same way.

LMESTM · 2018-10-12T15:34:44Z

So you've tested with #8013 as you're on top of master, right?
I understand that you have a custom target, have you applied the same fix as in #8013 or do you inherit from it (depends on your target) ?
Last question / idea for today: have you tried to use tools/debug_tools/crash_log_parser/crash_log_parser.py to better locate the hard fault, and see if it's always the same point which is hit?

mattbrown015 · 2018-10-12T17:54:04Z

I've updated our trunk to #8006 ("SD - Add required header file and namespace element instead add all"), which is the PR before #7813 ("STM32L4: Fix sleep implementation"). I've done a clean build and installed it on three of our devices; it has been running for a couple of hours now and I've done numerous resets.

As far as I'm concerned, everything is OK with #8006 (#6f338f8).

> So you've tested with #8013 as you're on top of master, right?
> I understand that you have a custom target, have you applied the same fix as in #8013 or do you inherit from it (depends on your target) ?

I tried the latest master earlier which would have included #8013. Our custom target inherits from NUCLEO_L432KC and we're using the linker script inherited from NUCLEO_L432KC.

Just now I did a slightly different test.

I applied #8013 to #8006. The .data and .bss sections did move a bit and I ran it to prove this is OK. Then I applied the change from #7813 and immediately got the hard fault.

> Last question / idea for today: have you tried to use tools/debug_tools/crash_log_parser/crash_log_parser.py to better locate the hard fault, and see if it's always the same point which is hit?

I've tried crash_log_parser.py with one hard fault:

python crash_log_parser.py crash.log
ELF or MAP file missing, logging raw values.

Crash Info:
        Crash location = <unknown-symbol> [0x0801548A] (based on PC value)
        Caller location = <unknown-symbol> [0x0801541F] (based on LR value)
        Stack Pointer at the time of crash = [20004C50]
        Target and Fault Info:
                Processor Arch: ARM-V7M or above
                Processor Variant: C24
                Forced exception, a fault with configurable priority has been escalated to HardFault
                Unaligned access error has occurred

I thought it was interesting that it was an unaligned access but I'm not sure what I was expecting.

I didn't get on very well with crash_log_parser.py. I tried passing the ELF file, but it never seemed to recognise it; it either continued to say "ELF or MAP file missing" or threw a Python file error.

The build output doesn't seem to include a .map file. I tried specifying the .csv or .json files but that didn't seem to work.

I tried building with the debug profile but ran out of flash!

I wonder if crash_log_parser.py is unhappy because I'm using the wrong compiler i.e. ARM GCC V7.x??

LMESTM · 2018-10-15T07:30:46Z

> I applied #8013 to #8006. The .data and .bss sections did move a bit and I ran it to prove this is OK. Then I applied the change from #7813 and immediately got the hard fault.

OK, so this clearly needs to be fixed. Thanks. Let's go ahead with @jeromecoutant's proposal.

> I wonder if crash_log_parser.py is unhappy because I'm using the wrong compiler i.e. ARM GCC V7.x??

@0xc0170 any idea ?

kjbracey · 2018-10-15T09:58:58Z

> The build output doesn't seem to include a .map file

As of recent build tools, it comes out as "xxx.map.old", presumably due to a last-build-stage rename to support the memory-change output.

yennster added type: bug devices: st labels Oct 8, 2018

jeromecoutant mentioned this issue Oct 15, 2018

STM32L4 : sleep issue #8427

Merged

cmonr closed this as completed in #8427 Oct 16, 2018

mattbrown015 mentioned this issue Oct 23, 2018

MBED_ASSERT No Longer Calls mbed_die #8496

Closed
