Fix timing issues found in "Flash - clock and cache test" #4666
ARMCC seemed to be inlining time_cpu_cycles(), but with a different number of clock cycles in the loop; GCC worked fine.
Does this compile with IAR? I would assume the attribute is not supported there.
Can you share the assembly and how it differs? Could you summarize here what exactly is causing this, with regard to this test?
Quoting @theotherjimmy
I haven't checked over the assembly after this change.
No, but it doesn't appear to be to do with me...
What about Regarding failures with IAR, what version are you using, 7.8 should be OK (earlier versions had some bugs) |
@chrissnow Could you get it to compile on IAR? |
@c1728p9 Could you take a look? |
The safest way to handle this would probably be to rewrite this in assembly. Something like this should work:

```c
#ifdef __CC_ARM
MBED_FORCENOINLINE
__asm static void delay_loop(uint32_t count)
{
1
    SUBS a1, a1, #1
    BCS  %BT1
    BX   lr
}
#elif defined (__ICCARM__)
MBED_FORCENOINLINE
static void delay_loop(uint32_t count)
{
    __asm volatile(
        "loop: \n"
        " SUBS %0, %0, #1 \n"
        " BCS.n loop\n"
        : "+r" (count)
        :
        : "cc"
    );
}
#else // GCC
MBED_FORCENOINLINE
static void delay_loop(uint32_t count)
{
    __asm__ volatile (
        "%=:\n\t"
#if defined(__thumb__) && !defined(__thumb2__)
        "SUB %0, #1\n\t"
#else
        "SUBS %0, %0, #1\n\t"
#endif
        "BCS %=b\n\t"
        : "+l" (count)
        :
        : "cc"
    );
}
#endif
```

You'll also need to add MBED_FORCENOINLINE or something similar to mbed_toolchain.h. For IAR you should be able to use this syntax:

```c
#define MBED_FORCENOINLINE _Pragma("inline=never")
```
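As a rough sketch of how such a no-inline macro might be selected per toolchain: the macro name `EXAMPLE_FORCENOINLINE` and the function `spin` below are hypothetical names used only for illustration, and this is not the definition from mbed_toolchain.h. The `spin` function is a portable C stand-in that mirrors the iteration count of the SUBS/BCS loop (one more pass than `count`) so the idea can be tried on any host compiler.

```c
#include <stdint.h>

/* Hypothetical macro name, chosen so it cannot clash with a real mbed macro. */
#if defined(__ICCARM__)
#define EXAMPLE_FORCENOINLINE _Pragma("inline=never")
#elif defined(__GNUC__) || defined(__clang__) || defined(__CC_ARM)
#define EXAMPLE_FORCENOINLINE __attribute__((noinline))
#else
#define EXAMPLE_FORCENOINLINE /* unknown toolchain: inlining is not prevented */
#endif

/*
 * Portable stand-in for the assembly delay loop (illustration only).
 * Like SUBS/BCS, it runs count + 1 passes: the pass that would take the
 * counter below zero is the one that exits the loop.
 * Returns the number of passes executed.
 */
EXAMPLE_FORCENOINLINE
static uint32_t spin(uint32_t count)
{
    volatile uint32_t remaining = count; /* volatile keeps the C loop from being optimized away */
    uint32_t passes = 0;

    for (;;) {
        passes++;
        if (remaining == 0) {
            break;
        }
        remaining--;
    }
    return passes;
}
```

Keeping the function out of line means every call site runs the same instructions at the same address, which is the property the test depends on.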
@0xc0170 Maybe you missed my point. The LOOP is 1 cycle off, making every pass through it 1 cycle more than it was before. This corresponds to 9 cycles taken for every 8 seen previously, making the ratio between after/before 1.125. That's 12.5% difference, not 1 cycle. |
That would tie up roughly with the results of the test. |
Yeah I think your test was about 12.495 % off. So close that the difference can be chalked up to measurement error. |
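The arithmetic behind the 12.5% figure can be checked with a few lines of C. This is only a sketch: `loop_slowdown_percent` is a hypothetical helper, and the 8 and 9 cycle counts are the per-pass costs quoted in the comment above.

```c
#include <stdint.h>

/*
 * Relative slowdown, in percent, when each loop pass costs `after`
 * cycles instead of `before` cycles.
 */
static double loop_slowdown_percent(uint32_t before, uint32_t after)
{
    return ((double)after / (double)before - 1.0) * 100.0;
}
```

Calling `loop_slowdown_percent(8, 9)` gives 12.5, which is within about 0.005 percentage points of the 12.495% observed in the test.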
Given that we don't care how long the test actually takes, so long as the way it is executed doesn't affect the timing, do we need to go down the assembly route? I will sort out the IAR build with the MBED_FORCENOINLINE suggestion.
@chrissnow The advantage of assembly would be to remove the store instruction. Removing the store might eliminate another cause of difference: different stack depth at the call site. This would affect execution time on a target where different RAM banks had different speeds, and the stack was on the border of two contiguous banks. It's a small possibility, but then again so was the different ROM location bug.
I have implemented the above suggestion and it seems to work nicely,
Which is 12.5 PPM, well within the 1000 allowed. |
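For a sense of scale, a drift figure in PPM can be computed as below. This is a sketch under assumptions: the helper names are hypothetical, the comparison of a reference duration against a measured duration is assumed rather than taken from the test source, and only the limit of 1000 comes from the comment above.

```c
#include <stdint.h>

#define EXAMPLE_TOLERANCE_PPM 1000.0 /* the "1000 allowed" mentioned above */

/* Absolute difference between two durations, in parts per million of the reference. */
static double drift_ppm(uint32_t reference_us, uint32_t measured_us)
{
    double diff = (double)measured_us - (double)reference_us;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff / (double)reference_us * 1e6;
}

/* Non-zero when the drift is within the assumed tolerance. */
static int within_tolerance(uint32_t reference_us, uint32_t measured_us)
{
    return drift_ppm(reference_us, measured_us) <= EXAMPLE_TOLERANCE_PPM;
}
```

With purely illustrative durations of 80000 us and 80001 us (not measurements from this PR), the drift is about 12.5 PPM and passes. A 12.5% error, such as 80000 us against 90000 us, is 125000 PPM and fails.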
@0xc0170 I was using 7.2, using 8 I still have issues
Which I can find no definition of anywhere. edit:
@chrissnow We use 7.8, so IAR 8 may not work. |
@@ -61,8 +102,7 @@ static int time_cpu_cycles(uint32_t cycles)
     int timer_start = timer.read_us();

     volatile uint32_t delay = (volatile uint32_t)cycles;
You should not need volatile anymore
@theotherjimmy I couldn't find 7.8 so used the trial of 8. |
@chrissnow The version of IAR Embedded workbench that has EWARM 7.8 is Embedded Workbench 7.5, I think. |
/morph test |
Result: SUCCESS
Your command has finished executing! Here's what you wrote!
Output: All builds and test passed!
@c1728p9 Are you happy with the update (using the inline assembly delay loop as proposed above)?
This looks good to me! Thanks for the changes @chrissnow |
MBED_NOINLINE
__asm static void delay_loop(uint32_t count)
{
1
I have a question regarding this `1`: is it required? I came across this code today and am wondering whether this line is needed.
Status
READY
Migrations
NO
Related PRs
#4640