-
Notifications
You must be signed in to change notification settings - Fork 3k
Make HAL & US tickers IRQ and idle safe #4768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -40,17 +40,25 @@ extern void HAL_ResumeTick(void); | |||
|
|||
void hal_sleep(void) | |||
{ | |||
// Disable IRQs | |||
core_util_critical_section_enter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be a case for all platforms, thus sleep and deepsleep in upper layer should include enter/exit critical section ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably yes, but it depends on what these platforms are doing in hal_sleep()
. If they are calling just a __WFI
and have no other constraints, then there is no need for a critical section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience (although that's mainly on ARM-A), any WFI-type sleep function should always be called with interrupts disabled.
Otherwise there's a race condition - you may have been idle, which is why you were going into sleep, but as you start running the sleep routine, an interrupt wakes a thread, but then sleep goes ahead and does WFI, so we block until the next interrupt, and the woken thread doesn't run immediately.
If interrupts are disabled all the way from the "are we idle" check to the "WFI", then the sleep routine will correctly return immediately due to a pending interrupt.
So I think 0xc0170 is right - hal_sleep() should probably always be entered with interrupts disabled. (And they should have been disabled before checking if we were idle).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think 0xc0170 is right - hal_sleep() should probably always be entered with interrupts
disabled. (And they should have been disabled before checking if we were idle).
Unfortunatley, in this moment hal_sleep()
does not get called with interrupts already disabled! Not sure if this is a choice or a bug, up to you to decide.
Anyway, as it is right now, mbed-os-5.5
is not working correctly for non-debug builds, at least for STM-NUCLEO platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless to say, that this is a critical/blocking situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm this is a change in behaviour since the previous RTX? And why the distinction between debug and non-debug builds?
We'll need someone more familiar with the RTX port to comment. My analysis above was assuming a dedicated "idle" handler in an OS scheduler - it may be somewhat different if hal_sleep is just called as "normal" code in a minimum-priority idle thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that since mbed-os-5.5
I am facing this issue, which I could resolve with the patch of this PR. I never checked RTX (neither v4 nor v5) source code to have a real confirmation for my assumption, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I further dare to confirm that:
hal_sleep is just called as "normal" code in a minimum-priority idle thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the debug/non-debug distinction is that hal_sleep is not called at all in debug builds, for whatever reason. (Something to do with breaking semihosting mentioned in comments?)
Given that it's just called in a continuous loop from a normal minimum-priority thread, I believe that the correct pattern would be for hal_sleep to either be a simple WFE with interrupts enabled, else a more complete shutdown within a critical section using WFI - as per this patch.
So I think entry with interrupts enabled isn't necessarily wrong, but it feels like there's a clear trap for porters here. Doubly so given that hal_sleep isn't called in debug builds.
I still don't know what the change since 5.4 is though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, change since 5.4 identified.
"release" build profile has always slept, "debug" build profile has never slept.
The behaviour of the "develop" (ie default) build profile changed between 5.4 and 5.5. In 5.4 it did not sleep - it was switched on NDEBUG. In 5.5 it does sleep - it is switched on MBED_DEBUG.
I've discussed the sleep a bit with @bulislaw, I think my previous comment is correct, so I'll recheck this and probably approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
targets/TARGET_STM/us_ticker_32b.c
Outdated
return TIM_MST->CNT; | ||
} | ||
|
||
void us_ticker_set_interrupt(timestamp_t timestamp) | ||
{ | ||
TimMasterHandle.Instance = TIM_MST; | ||
/* Disable global IRQs */ | ||
core_util_critical_section_enter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sleep addition is now clear. why is it then here to this us ticker implementation? The commit message specifies sleep, so why us ticker requires these changes?
I assume that set interrupt is already called within critical section, or not?
not lp ticker also ? different ticker implementation that does not require it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to the diffs under Files changed, there has been an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I believe that us_ticker_set_interrupt()
might also be called not from interrupt context and out of a critical section, e.g. see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, I believe that us_ticker_set_interrupt()
might also be called not from interrupt context and out of a critical section, e.g. see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling also __HAL_TIM_DISABLE_IT(&TimMasterHandle, TIM_IT_CC1)
and __HAL_TIM_ENABLE_IT(&TimMasterHandle, TIM_IT_CC1)
seems to be redundant, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I believe that us_ticker_set_interrupt() might also be called not from interrupt context and out of a critical section, e.g. see here.
HAL is not thread safe, thus a protection should happen above. As it is in mbed ticker implementation. This should be stated in tickers HAL API.
Calling also __HAL_TIM_DISABLE_IT(&TimMasterHandle, TIM_IT_CC1) and __HAL_TIM_ENABLE_IT(&TimMasterHandle, TIM_IT_CC1) seems to be redundant, though.
If we keep critical section then yes.
Please refer to the diffs under Files changed, there has been an update.
I see now the update. It would be helpful to provide more information in the commit messages about changes (why we adding critical section to us ticker implementation , why to hal_sleep/deepsleep).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My problem with __HAL_TIM_DISABLE_IT()
and __HAL_TIM_DISABLE_IT()
is, that these calls are neither IRQ- nor thread-safe ... 😳
So we need to find a common synchronization scheme which gets used everywhere these macros get used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is guaranteed that us_ticker_set_interrupt()
gets always called with IRQs globally disabled, than we can obviously remove the calls to core_util_critical_section_enter()
& core_util_critical_section_exit()
.
Maybe we should add an MBED_ASSERT
checking for IRQs being globally disabled at the beginning of the function!
@c1728p9 Please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok with the content - thanks @betzw .
I'd prefer to see 2 commits, one related to sleep management. the other one for protecting TIMERs registers access in us_ticker. Also 16b files also need the same update and I'd like to have the changes in the same PR so that we can start non-reg testing here.
TimMasterHandle.Instance = TIM_MST; | ||
|
||
HAL_InitTick(0); // The passed value is not used | ||
/* NOTE: assuming that HAL tick has already been initialized! */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not the case for a long time, we need to be sure that mbed_sdk_init is called before C++ objects creations happens, otherwise a timer/ticker object creation would fail. @0xc0170 I think that the above assumption is now ensured on all toolchains thanks to the recent rework on boot sequence - do you confirm ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the above assumption is now ensured on all toolchains thanks to the recent rework on boot sequence - do you confirm ?
Yes, that is correct. As you are referring to mbed OS 5 boot sequence.
mbed 2 should also be fine. We shall add a test for this to be certain that sdk is called prior C++ ctor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shall run MBED2 tests with this change and we will see if this fails. - this will be part of non-reg when we think the PR is ready for testing
@LMESTM pls. feel free to improve and extend my PRs or use them as inspiration for other PRs however you prefer. P.S.: Maybe, before merging derived or modified PRs give me the chance to review them. |
@betzw in the current case your PR is already there and looks good, if you could simply extend it with 16b files modifications then we would be happy to handle the testing of it. that would save time overall. If not I'll do the 16b changes later, but I can't now ... :-( |
Let's maybe first wait for (above all) @0xc0170 to agree upon how the modifications should look like and than extend these to |
@betzw just to avoid us doing things twice and gain on overall efforts - of course I'll take over from where you stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good to me. The critical section in sleep.c can be removed in the future once this is fixed in the common layer.
Well, this is very interesting news, indeed! |
Let's add a note to the us ticker/lp ticker API for setting the interrupt that is not thread/IRQ safe, and caller should be aware. As mentioned above, this is known and how it is used. |
3de10ba
to
c6d91c7
Compare
OK, I have updated the PR with the outcome of the discussion here. |
…ications to 16bit timer Note: This is the result of discussions on GitHub with Martin Kojtal, Vincent Coubard, et al. (see ARMmbed#4768)
c6d91c7
to
a1707a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the changes overall.
Just would prefer to keep __HAL_TIM_CLEAR_FLAG usage. (and small "!" typo)
targets/TARGET_STM/us_ticker_16b.c
Outdated
} | ||
|
||
#endif // TIM_MST_16BIT | ||
#endif // !TIM_MST_16BIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this "!" change here ?
TimMasterHandle.Instance = TIM_MST; | ||
__HAL_TIM_CLEAR_FLAG(&TimMasterHandle, TIM_FLAG_CC1); | ||
core_util_critical_section_enter(); | ||
__HAL_TIM_CLEAR_IT(&TimMasterHandle, TIM_IT_CC1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this MACRO change also required ?
Looking at the code both MACROS actually access to the same Status Register (SR), where it will clear the bit "CC1IF: Capture/Compare 1 interrupt flag" - As the erased bit is a flag, it seems better to actually use __HAL_TIM_CLEAR_FLAG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it came in just because us_ticker_32b.c
uses __HAL_TIM_CLEAR_IT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - better not change it here as this is not related to your change.
As I said, they actually access the same bit as far as I've seen, but I'll change it later for 32b to align code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remember that when you are gonna change it for 32b to revert it again also for 16b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
…ications to 16bit timer Note: This is the result of discussions on GitHub with Martin Kojtal, Vincent Coubard, et al. (see ARMmbed#4768)
Thanks @betzw for the update. do we still need the rest of changes in us ticker? Clear/disable interrupts are not currently even used, and if they were, we should not need to have enter/exit critical section in each HAL implementation. Shall we remove, this patch then becomes just hal sleep/deepslep changes |
Ok for me! Someone (not me) should just remember to clean up and comment a bit the us/hal ticker stuff regarding IRQ safeness and initialization. |
Description
The new RTX5 seems to call
hal_sleep()
with interrupts enabled, therefore we need to protect this function against interrupt to guarantee that a thread reschedule does not leave the system with HAL ticks suspended.Status
READY
Migrations
NO
People
@bcostm @adustm @LMESTM