Make HAL & US tickers IRQ and idle safe #4768

betzw · 2017-07-18T08:09:21Z

Description

The new RTX5 seems to call hal_sleep() with interrupts enabled. We therefore need to protect this function against interrupts, to guarantee that a thread reschedule does not leave the system with HAL ticks suspended.

Status

READY

Migrations

NO

People

@bcostm @adustm @LMESTM

0xc0170 · 2017-07-18T08:15:39Z

targets/TARGET_STM/sleep.c

@@ -40,17 +40,25 @@ extern void HAL_ResumeTick(void);

 void hal_sleep(void)
 {
+    // Disable IRQs
+    core_util_critical_section_enter();


Wouldn't this be the case for all platforms? If so, sleep and deepsleep in the upper layer should include the critical section enter/exit.

Presumably yes, but it depends on what these platforms are doing in hal_sleep(). If they are calling just a __WFI and have no other constraints, then there is no need for a critical section.

In my experience (although that's mainly on ARM-A), any WFI-type sleep function should always be called with interrupts disabled.

Otherwise there's a race condition - you may have been idle, which is why you were going into sleep, but as you start running the sleep routine, an interrupt wakes a thread, but then sleep goes ahead and does WFI, so we block until the next interrupt, and the woken thread doesn't run immediately.

If interrupts are disabled all the way from the "are we idle" check to the "WFI", then the sleep routine will correctly return immediately due to a pending interrupt.

So I think 0xc0170 is right - hal_sleep() should probably always be entered with interrupts disabled. (And they should have been disabled before checking if we were idle).
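The race described above can be illustrated with a small host-side model. This is only a sketch: every name in it (`model_wfi`, `model_isr`, `idle_racy`, `idle_safe`, and the state variables) is hypothetical and not part of mbed OS or RTX, and the `irq_in_window` parameter simply simulates an interrupt arriving between the idle check and the WFI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side model; none of these names are mbed OS or RTX APIs. */
static bool irq_pending;    /* raised but not yet serviced */
static bool thread_ready;   /* set by the handler when it wakes a thread */
static int  blocked_waits;  /* times the modelled WFI really had to block */

void model_reset(void)
{
    irq_pending = false;
    thread_ready = false;
    blocked_waits = 0;
}

/* Modelled WFI: returns at once if an interrupt is pending, else blocks. */
static void model_wfi(void)
{
    if (!irq_pending) {
        blocked_waits++;
    }
}

/* Modelled handler: services the interrupt and wakes a thread. */
static void model_isr(void)
{
    irq_pending = false;
    thread_ready = true;
}

/* Racy: interrupts stay enabled between the idle check and the WFI. */
void idle_racy(bool irq_in_window)
{
    if (thread_ready) {
        return;
    }
    if (irq_in_window) {
        irq_pending = true;
        model_isr();            /* enabled, so it is serviced right away */
    }
    model_wfi();                /* blocks although a thread is now ready */
}

/* Safe: interrupts are masked from the idle check through the WFI. */
void idle_safe(bool irq_in_window)
{
    /* interrupts masked from here */
    if (!thread_ready) {
        if (irq_in_window) {
            irq_pending = true; /* masked, so it can only stay pending */
        }
        model_wfi();            /* a pending interrupt ends the wait at once */
    }
    /* interrupts unmasked from here: the pending handler now runs */
    if (irq_pending) {
        model_isr();
    }
    assert(!irq_pending);
}
```

In this model, `idle_racy(true)` blocks once even though a thread has become ready, while `idle_safe(true)` never blocks and the woken thread is visible as soon as interrupts are unmasked.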

> So I think 0xc0170 is right - hal_sleep() should probably always be entered with interrupts disabled. (And they should have been disabled before checking if we were idle).

Unfortunately, at the moment hal_sleep() is not called with interrupts already disabled! Not sure if this is a choice or a bug, up to you to decide.
Anyway, as it is right now, mbed-os-5.5 is not working correctly for non-debug builds, at least for STM-NUCLEO platforms.

Needless to say, this is a critical/blocking situation.

Can you confirm this is a change in behaviour since the previous RTX? And why the distinction between debug and non-debug builds?

We'll need someone more familiar with the RTX port to comment. My analysis above was assuming a dedicated "idle" handler in an OS scheduler - it may be somewhat different if hal_sleep is just called as "normal" code in a minimum-priority idle thread.

I can confirm that I have been facing this issue since mbed-os-5.5, and that I could resolve it with the patch in this PR. I never checked the RTX source code (neither v4 nor v5) to really confirm my assumption, though.

I further dare to confirm that:

> hal_sleep is just called as "normal" code in a minimum-priority idle thread

And the debug/non-debug distinction is that hal_sleep is not called at all in debug builds, for whatever reason. (Something to do with breaking semihosting mentioned in comments?)

Given that it's just called in a continuous loop from a normal minimum-priority thread, I believe that the correct pattern would be for hal_sleep to either be a simple WFE with interrupts enabled, else a more complete shutdown within a critical section using WFI - as per this patch.

So I think entry with interrupts enabled isn't necessarily wrong, but it feels like there's a clear trap for porters here. Doubly so given that hal_sleep isn't called in debug builds.
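The "more complete shutdown within a critical section" variant can be sketched roughly as follows. This is a host-side illustration, not the patch itself: the four functions with mbed OS and STM32 HAL names are local stand-ins that only track state, and `wait_for_interrupt_stub` and `hal_sleep_sketch` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side stand-ins so the sketch compiles off-target. The real
 * core_util_critical_section_enter/exit (mbed OS) and
 * HAL_SuspendTick/HAL_ResumeTick (STM32 HAL) are not reproduced here. */
static int  critical_depth;
static bool hal_tick_running = true;
static bool slept_masked_with_tick_off;

static void core_util_critical_section_enter(void) { critical_depth++; }
static void core_util_critical_section_exit(void)  { critical_depth--; }
static void HAL_SuspendTick(void) { hal_tick_running = false; }
static void HAL_ResumeTick(void)  { hal_tick_running = true; }

/* Stand-in for the WFI; records the state in which the wait happened. */
static void wait_for_interrupt_stub(void)
{
    slept_masked_with_tick_off = (critical_depth > 0) && !hal_tick_running;
}

void hal_sleep_sketch(void)
{
    core_util_critical_section_enter();   /* no reschedule past this point */
    HAL_SuspendTick();                    /* stop the HAL tick while asleep */
    wait_for_interrupt_stub();
    HAL_ResumeTick();                     /* always restored before exit */
    core_util_critical_section_exit();
    assert(critical_depth == 0);
}
```

Because suspend and resume both sit inside the critical section, a reschedule cannot happen between them, so the function cannot return with the tick still suspended.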

I still don't know what the change since 5.4 is though.

Okay, change since 5.4 identified.

"release" build profile has always slept, "debug" build profile has never slept.

The behaviour of the "develop" (ie default) build profile changed between 5.4 and 5.5. In 5.4 it did not sleep - it was switched on NDEBUG. In 5.5 it does sleep - it is switched on MBED_DEBUG.
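The change can be summarised as a small truth table. The sketch below is an illustration only: the per-profile macro settings are inferred from the behaviour described in this comment (release defines NDEBUG, debug defines MBED_DEBUG, develop defines neither), and `profile_t`, `sleeps_in_5_4`, `sleeps_in_5_5` and `check_profiles` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Which macros each build profile is assumed to define (inferred, see above). */
typedef struct {
    const char *name;
    bool ndebug;      /* NDEBUG defined?     */
    bool mbed_debug;  /* MBED_DEBUG defined? */
} profile_t;

static const profile_t PROFILES[] = {
    { "release", true,  false },
    { "develop", false, false },
    { "debug",   false, true  },
};

/* 5.4: the idle loop sleeps only when NDEBUG is defined. */
bool sleeps_in_5_4(const profile_t *p) { return p->ndebug; }

/* 5.5: the idle loop sleeps unless MBED_DEBUG is defined. */
bool sleeps_in_5_5(const profile_t *p) { return !p->mbed_debug; }

/* Usage: only the develop profile changes behaviour between releases. */
void check_profiles(void)
{
    assert(sleeps_in_5_4(&PROFILES[0]) == sleeps_in_5_5(&PROFILES[0]));
    assert(sleeps_in_5_4(&PROFILES[1]) != sleeps_in_5_5(&PROFILES[1]));
    assert(sleeps_in_5_4(&PROFILES[2]) == sleeps_in_5_5(&PROFILES[2]));
}
```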

I've discussed the sleep a bit with @bulislaw, I think my previous comment is correct, so I'll recheck this and probably approve.

bulislaw

LGTM

0xc0170 · 2017-07-18T14:29:36Z

targets/TARGET_STM/us_ticker_32b.c

    return TIM_MST->CNT;
 }

 void us_ticker_set_interrupt(timestamp_t timestamp)
 {
-    TimMasterHandle.Instance = TIM_MST;
+    /* Disable global IRQs */
+    core_util_critical_section_enter();


The sleep addition is now clear. Why is it then also added to this us ticker implementation? The commit message only mentions sleep, so why does the us ticker require these changes?

I assume that set interrupt is already called within a critical section, or not?

And not the lp ticker as well? Or is that a different ticker implementation that does not require it?

Please refer to the diffs under Files changed, there has been an update.

Well, I believe that us_ticker_set_interrupt() might also be called not from interrupt context and out of a critical section, e.g. see here.

Furthermore, I believe that us_ticker_set_interrupt() might also be called not from interrupt context and out of a critical section, e.g. see here.

Calling also __HAL_TIM_DISABLE_IT(&TimMasterHandle, TIM_IT_CC1) and __HAL_TIM_ENABLE_IT(&TimMasterHandle, TIM_IT_CC1) seems to be redundant, though.

> Well, I believe that us_ticker_set_interrupt() might also be called not from interrupt context and out of a critical section, e.g. see here.

The HAL is not thread safe, so the protection should happen in the layer above, as it does in the mbed ticker implementation. This should be stated in the tickers HAL API.

> Calling also __HAL_TIM_DISABLE_IT(&TimMasterHandle, TIM_IT_CC1) and __HAL_TIM_ENABLE_IT(&TimMasterHandle, TIM_IT_CC1) seems to be redundant, though.

If we keep the critical section, then yes.

> Please refer to the diffs under Files changed, there has been an update.

I see the update now. It would be helpful to provide more information about the changes in the commit messages (why we are adding a critical section to the us ticker implementation, and why to hal_sleep/deepsleep).

My problem with __HAL_TIM_DISABLE_IT() and __HAL_TIM_ENABLE_IT() is that these calls are neither IRQ- nor thread-safe ... 😳
So we need to find a common synchronization scheme that is used everywhere these macros are used.

@betzw There is no need to add a synchronization mechanism at that level. It is acknowledged that the function us_ticker_set_interrupt is neither thread safe nor IRQ safe.

However, synchronization is handled by the generic ticker API whenever events are inserted or removed.

If it is guaranteed that us_ticker_set_interrupt() is always called with IRQs globally disabled, then we can obviously remove the calls to core_util_critical_section_enter() & core_util_critical_section_exit().
Maybe we should add an MBED_ASSERT at the beginning of the function that checks that IRQs are globally disabled!
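The suggested check could look roughly like the sketch below. It is a host-side illustration under several assumptions: plain `assert()` stands in for MBED_ASSERT, the three `core_util_*` functions are local stand-ins that only track a nesting counter, `timestamp_t` is a local typedef, `compare_value` replaces the timer compare register, and `schedule_sketch` is a hypothetical caller that represents the generic ticker layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t timestamp_t;      /* local typedef for this sketch */

static int         critical_depth;
static timestamp_t compare_value;  /* stands in for the timer compare register */

/* Host-side stand-ins; the real mbed OS helpers are not reproduced here. */
static void core_util_critical_section_enter(void) { critical_depth++; }
static void core_util_critical_section_exit(void)  { critical_depth--; }
static bool core_util_are_interrupts_enabled(void) { return critical_depth == 0; }

/* HAL level: no locking of its own, it only checks the caller's contract. */
void us_ticker_set_interrupt_sketch(timestamp_t timestamp)
{
    assert(!core_util_are_interrupts_enabled());  /* stands in for MBED_ASSERT */
    compare_value = timestamp;
}

/* Caller level: the layer above the HAL takes the critical section. */
void schedule_sketch(timestamp_t timestamp)
{
    core_util_critical_section_enter();
    us_ticker_set_interrupt_sketch(timestamp);
    core_util_critical_section_exit();
}
```

With this split, calling `us_ticker_set_interrupt_sketch()` directly with interrupts enabled trips the assertion, which would make a porting mistake visible in debug builds instead of leaving a silent race.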

betzw · 2017-07-19T07:47:05Z

@nikapov-ST

0xc0170 · 2017-07-19T17:18:07Z

@c1728p9 Please review

LMESTM

Ok with the content - thanks @betzw .
I'd prefer to see 2 commits: one related to sleep management, the other for protecting the TIMER register accesses in us_ticker. The 16b files also need the same update, and I'd like to have those changes in the same PR so that we can start non-regression testing here.

LMESTM · 2017-07-20T06:49:55Z

targets/TARGET_STM/us_ticker_32b.c

-    TimMasterHandle.Instance = TIM_MST;
-
-    HAL_InitTick(0); // The passed value is not used
+    /* NOTE: assuming that HAL tick has already been initialized! */


This was not the case for a long time. We need to be sure that mbed_sdk_init is called before any C++ object is created, otherwise creating a timer/ticker object would fail. @0xc0170 I think that the above assumption is now ensured on all toolchains thanks to the recent rework of the boot sequence - can you confirm?

> I think that the above assumption is now ensured on all toolchains thanks to the recent rework on boot sequence - do you confirm ?

Yes, that is correct, as you are referring to the mbed OS 5 boot sequence.

mbed 2 should also be fine. We should add a test for this, to be certain that the SDK init is called prior to the C++ constructors.

We will run the mbed 2 tests with this change and see if anything fails. This will be part of the non-regression testing once we think the PR is ready for it.

betzw · 2017-07-20T07:11:14Z

@LMESTM pls. feel free to improve and extend my PRs or use them as inspiration for other PRs however you prefer.
In general, I prefer to send out PRs for the things I discover (in parts of the system which are not of my competence) just because these might indicate more clearly a possible way to solve issues.

P.S.: Maybe, before merging derived or modified PRs give me the chance to review them.

LMESTM · 2017-07-20T07:29:32Z

@betzw in the current case your PR is already there and looks good, if you could simply extend it with 16b files modifications then we would be happy to handle the testing of it. that would save time overall. If not I'll do the 16b changes later, but I can't now ... :-(

betzw · 2017-07-20T07:34:39Z

Let's maybe first wait for (above all) @0xc0170 to agree on what the modifications should look like, and then extend these to 16b.
@LMESTM, as these things are in general under the responsibility of MCD, I would prefer you to pick up my suggestions and either approve those or perform the necessary modifications before approving them. What do you think?

LMESTM · 2017-07-20T07:42:20Z

@betzw just to avoid us doing things twice and gain on overall efforts - of course I'll take over from where you stop

c1728p9

PR looks good to me. The critical section in sleep.c can be removed in the future once this is fixed in the common layer.

betzw · 2017-07-21T06:34:37Z

Well, this is very interesting news, indeed!
Pls. this time let us know when this or other things like this happen ...

betzw · 2017-07-21T06:38:22Z

@pan- & @0xc0170
Any advice on how to proceed regarding us_ticker_set_interrupt()? I am asking also because I plan to extend these modifications to the 16b us timer.

0xc0170 · 2017-07-21T08:30:01Z

> Any advice on how to proceed regarding us_ticker_set_interrupt()? This also because I plan to extend these modfications also to the 16b us timer.

Let's add a note to the us ticker/lp ticker API for setting the interrupt, stating that it is not thread/IRQ safe and that the caller should be aware of this. As mentioned above, this is known, and it is how the function is used.
Then we can remove those critical sections in this patch. Sounds good?

betzw · 2017-07-21T12:26:20Z

OK, I have updated the PR with the outcome of the discussion here.
Hope I didn't miss or misunderstand anything.

…ications to 16bit timer Note: This is the result of discussions on GitHub with Martin Kojtal, Vincent Coubard, et al. (see ARMmbed#4768)

LMESTM

Agree with the changes overall.
Just would prefer to keep __HAL_TIM_CLEAR_FLAG usage. (and small "!" typo)

LMESTM · 2017-07-21T12:28:38Z

targets/TARGET_STM/us_ticker_16b.c

 }

-#endif // TIM_MST_16BIT
+#endif // !TIM_MST_16BIT


Why this "!" change here?

LMESTM · 2017-07-21T12:38:31Z

targets/TARGET_STM/us_ticker_16b.c

-    TimMasterHandle.Instance = TIM_MST;
-    __HAL_TIM_CLEAR_FLAG(&TimMasterHandle, TIM_FLAG_CC1);
+    core_util_critical_section_enter();
+    __HAL_TIM_CLEAR_IT(&TimMasterHandle, TIM_IT_CC1);


Is this macro change also required?
Looking at the code, both macros actually access the same Status Register (SR), where they clear the bit "CC1IF: Capture/Compare 1 interrupt flag". As the cleared bit is a flag, it seems better to actually use __HAL_TIM_CLEAR_FLAG.

Well, it came in just because us_ticker_32b.c uses __HAL_TIM_CLEAR_IT

Ok - better not change it here as this is not related to your change.
As I said, they actually access the same bit as far as I've seen, but I'll change it later for 32b to align code.

Just remember, when you change it for 32b, to revert it for 16b as well.
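The claim that both macros end up clearing the same SR bit can be illustrated with a host-side model. Everything here is an assumption made for the sketch: the `MODEL_*` names are hypothetical, the bit value 0x0002 for both constants is assumed rather than taken from the STM32 HAL headers, and the status register is modelled as write-0-to-clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; values are assumptions, not copied from the STM32 HAL. */
#define MODEL_TIM_FLAG_CC1  0x0002u  /* CC1IF position in SR (assumed)   */
#define MODEL_TIM_IT_CC1    0x0002u  /* CC1IE position in DIER (assumed) */

static uint32_t model_sr;  /* stands in for the timer status register */

/* Modelled as write-0-to-clear: writing 1 leaves a bit unchanged. */
static void model_sr_write(uint32_t value) { model_sr &= value; }

#define MODEL_CLEAR_FLAG(flag) model_sr_write(~(uint32_t)(flag))
#define MODEL_CLEAR_IT(it)     model_sr_write(~(uint32_t)(it))

uint32_t sr_after_clear_flag(uint32_t initial)
{
    model_sr = initial;
    MODEL_CLEAR_FLAG(MODEL_TIM_FLAG_CC1);
    return model_sr;
}

uint32_t sr_after_clear_it(uint32_t initial)
{
    model_sr = initial;
    MODEL_CLEAR_IT(MODEL_TIM_IT_CC1);
    return model_sr;
}

/* Usage: under these assumptions both paths leave SR in the same state. */
void check_equivalence(void)
{
    assert(sr_after_clear_flag(0x1Fu) == sr_after_clear_it(0x1Fu));
}
```

If the two constants share a bit position as assumed, the choice between the macros is about readability (a flag is being cleared) rather than behaviour.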

0xc0170 · 2017-07-24T11:32:15Z

Thanks @betzw for the update. Do we still need the rest of the changes in the us ticker? Clear/disable interrupt are not currently even used, and if they were, we should not need to enter/exit a critical section in each HAL implementation. Shall we remove them? This patch then becomes just the hal sleep/deepsleep changes.

betzw · 2017-07-24T12:34:44Z

Ok for me!

Someone (not me) should just remember to clean up and comment a bit the us/hal ticker stuff regarding IRQ safeness and initialization.

betzw · 2017-07-25T11:29:04Z

See PR #4808 & #4809!

Make HAL & US tickers IRQ and idle safe

3202841

betzw mentioned this pull request Jul 18, 2017

Network problems with 7 or more clients connected to the mbed Device Connector ARMmbed/mbed-os-example-client#266

Closed

0xc0170 reviewed Jul 18, 2017

0xc0170 added the needs: review label Jul 18, 2017

kjbracey approved these changes Jul 18, 2017

bulislaw approved these changes Jul 18, 2017

0xc0170 reviewed Jul 18, 2017

betzw mentioned this pull request Jul 19, 2017

Ethernet interface connect fails in Release on Nucleo STM32F #4773

Closed

LMESTM suggested changes Jul 20, 2017

c1728p9 approved these changes Jul 20, 2017

betzw force-pushed the betzw_wb_idle_github branch from 3de10ba to c6d91c7 Compare July 21, 2017 12:24

Remove critical section in us_ticker_set_interrupt() & extend modif…

a1707a0

…ications to 16bit timer Note: This is the result of discussions on GitHub with Martin Kojtal, Vincent Coubard, et al. (see ARMmbed#4768)

betzw force-pushed the betzw_wb_idle_github branch from c6d91c7 to a1707a0 Compare July 21, 2017 12:35

LMESTM approved these changes Jul 21, 2017

theotherjimmy added the needs: work label Jul 24, 2017

theotherjimmy removed the needs: review label Jul 24, 2017

This was referenced Jul 25, 2017

Make HAL & US tickers idle safe #4808

Merged

STM32: Align HAL & US tickers #4809

Merged

betzw closed this Jul 25, 2017

sg- removed the needs: work label Jul 25, 2017
