Skip to content

Fix timing issues found in "Flash - clock and cache test" #4666

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 13, 2017

Conversation

chrissnow
Copy link
Contributor

Description

ARMCC seemed to be inlining time_cpu_cycles() but with a different number of clock cycles in the loop, GCC worked fine.

Status

READY

Migrations

NO

Related PRs

#4640

Steps to test or reproduce

Outline the steps to test or reproduce the PR here.

ARMCC seemed to be inlining time_cpu_cycles() but with a different number of clock cycles in the loop, GCC worked fine.
@0xc0170
Copy link
Contributor

0xc0170 commented Jun 29, 2017

Does IAR compile ? I would assume attribute is not supported there.

ARMCC seemed to be inlining time_cpu_cycles() but with a different number of clock cycles in the loop

Can you share assembly how it differs ? What exactly is causing this, if you can summarize it here regarding this test.

@chrissnow
Copy link
Contributor Author

chrissnow commented Jun 29, 2017

Quoting @theotherjimmy

It looks like your inner timing loop is taking 8 cycles the first time, and 9 the second. Looking at the disassembly of my version of that test binary, the timing function time_cpu_cycles is being inlined at both call sites. This may be causing the problem. The patch below prevents the inlining of that critical timing function at each call site:

For reference the inner loop is made up of the following instructions:

 80058f0:       1e88            subs    r0, r1, #2  
 80058f2:       1e89            subs    r1, r1, #2
 80058f4:       9008            str     r0, [sp, #32]
 80058f6:       2900            cmp     r1, #0
 80058f8:       dcfa            bgt.n   80058f0 <_Z15time_cpu_cyclesj+0x34>

2 cycles from the subs + 2 from the str + 1 from the cmp + 1+ P (1-3) for the branch instruction. If P is 2 for most times through the loop when inlined at the flash_init_test call site (8 cycles), and P is 3 for the other call site (9 cycles), then the difference should go away when inlining is removed.

I haven't checked over the assembly after this change.

Does IAR compile ?

No, but it doesn't appear to be to do with me...

  * XDOT_L151CC::IAR::MBED-BUILD
        Building library mbed-build (XDOT_L151CC, IAR)
        Scan: mbed-os-new
        Compile [ 64.5%]: rtx_lib.c
        [Error] rtx_lib.c@50,2: [Pe065]: expected a ";"
        [Error] rtx_lib.c@116,2: [Pe065]: expected a ";"
        [Error] rtx_lib.c@120,2: [Pe065]: expected a ";"
        [Error] rtx_lib.c@164,2: [Pe065]: expected a ";"
        [Error] rtx_lib.c@168,2: [Pe065]: expected a ";"
        [Error] rtx_lib.c@184,2: [Pe065]: expected a ";"
        [Error] rtx_lib.c@188,16: [Pe065]: expected a ";"
        [Error] rtx_lib.c@335,2: [Pe079]: expected a type specifier
        [Error] rtx_lib.c@335,2: [Pe260]: explicit type is missing ("int" assumed)
        [Error] rtx_lib.c@335,2: [Pe141]: unnamed prototyped parameters not allowed when body is present
        [Error] rtx_lib.c@336,4: [Pe130]: expected a "{"
        [Warning] rtx_lib.c@337,4: [Pe174]: expression has no effect
        [Warning] rtx_lib.c@337,4: [Pe174]: expression has no effect
        [Error] rtx_lib.c@354,4: [Pe029]: expected an expression
        [Warning] rtx_lib.c@421,4: [Pe012]: parsing restarts here after previous syntax error
        [Warning] rtx_lib.c@337,4: [Pe174]: expression has no effect
        [Error] rtx_lib.c@421,5: [Pe065]: expected a ";"
        [Error] rtx_lib.c@421,3: [Pe169]: expected a declaration
        [Warning] rtx_lib.c@433,15: [Pe012]: parsing restarts here after previous syntax error
        [Warning] rtx_lib.c@49,30: [Pe177]: variable "os_isr_queue" was declared but never referenced
        [Warning] rtx_lib.c@123,30: [Pe177]: variable "os_idle_thread_attr" was declared but never referenced
        [Warning] rtx_lib.c@171,36: [Pe177]: variable "os_timer_thread_attr" was declared but never referenced
        [Warning] rtx_lib.c@191,0: [Pe177]: variable "os_timer_mq_attr" was declared but never referenced

@0xc0170
Copy link
Contributor

0xc0170 commented Jun 29, 2017

What about #define ALLOWED_DRIFT_PPM 1000 //0.1%, this one is not too strict if one more cpu cycle makes the test fail?

Regarding failures with IAR, what version are you using, 7.8 should be OK (earlier versions had some bugs)

@theotherjimmy
Copy link
Contributor

@chrissnow Could you get it to compile on IAR?

@theotherjimmy
Copy link
Contributor

@c1728p9 Could you take a look?

@c1728p9
Copy link
Contributor

c1728p9 commented Jun 29, 2017

The safest way to handle this would probably be to re-write this into assermbly. Something like this should work:

#ifdef __CC_ARM
MBED_FORCENOINLINE
__asm static void delay_loop(uint32_t count)
{
1
  SUBS a1, a1, #1
  BCS  %BT1
  BX   lr
}
#elif defined (__ICCARM__)
MBED_FORCENOINLINE
static void delay_loop(uint32_t count)
{
  __asm volatile(
    "loop: \n"
    " SUBS %0, %0, #1 \n"
    " BCS.n  loop\n"
    : "+r" (count)
    :
    : "cc"
  );
}
#else // GCC
MBED_FORCENOINLINE
static void delay_loop(uint32_t count)
{
  __asm__ volatile (
    "%=:\n\t"
#if defined(__thumb__) && !defined(__thumb2__)
    "SUB  %0, #1\n\t"
#else
    "SUBS %0, %0, #1\n\t"
#endif
    "BCS  %=b\n\t"
    : "+l" (count)
    :
    : "cc"
  );
}
#endif

You'll also need to add MBED_FORCENOINLINE or something similar to mbed_toolchain.h. For IAR you should be able to use this syntax:

#define MBED_FORCEINLINE _Pragma("inline=never")

@theotherjimmy
Copy link
Contributor

@0xc0170 Maybe you missed my point. The LOOP is 1 cycle off, making every pass through it 1 cycle more than it was before. This corresponds to 9 cycles taken for every 8 seen previously, making the ratio between after/before 1.125. That's 12.5% difference, not 1 cycle.

@chrissnow
Copy link
Contributor Author

That would tie up roughly with the results of the test.

@theotherjimmy
Copy link
Contributor

Yeah I think your test was about 12.495 % off. So close that the difference can be chalked up to measurement error.

@chrissnow
Copy link
Contributor Author

Given we don't care how long the test actually takes so long as the way it is executed doesn't affect it do we need to go down the assembly route?

I will sort out the IAR build with the MBED_FORCENOINLINE suggestion.

@theotherjimmy
Copy link
Contributor

@chrissnow The advantage of assembly would be to remove the store instruction. Removing the store might eliminate another cause of difference: different stack depth at the call site. This would affect execution time on a target where different RAM banks had different speeds, and the stack was on the border of two continuous banks. It's a small possibility, but then again so was the different ROM location bug.

@chrissnow
Copy link
Contributor Author

I have implemented the above suggestion and it seems to work nicely,
I dropped the PPM down to test the results and it's looking good.

FAIL: Values Not Within Delta 0 Expected 157801 Was 157799

Which is 12.5 PPM, well within the 1000 allowed.

@chrissnow
Copy link
Contributor Author

chrissnow commented Jun 30, 2017

@0xc0170 I was using 7.2, using 8 I still have issues

        [Error] mbed_boot.c@571,0: [Pe020]: identifier "_MAX_LOCK" is undefined

Which I can find no definition of anywhere.

edit:
also

        [Error] mbed_retarget.cpp@832,47: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@832,15: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@833,30: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@833,15: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@834,30: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@834,15: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@847,47: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@847,15: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@848,30: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@848,15: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@849,30: [Pe393]: pointer to incomplete class type is not allowed
        [Error] mbed_retarget.cpp@849,0: [Pe393]: pointer to incomplete class type is not allowed

@theotherjimmy
Copy link
Contributor

@chrissnow We use 7.8, so IAR 8 may not work.

@@ -61,8 +102,7 @@ static int time_cpu_cycles(uint32_t cycles)
int timer_start = timer.read_us();

volatile uint32_t delay = (volatile uint32_t)cycles;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not need volatile anymore

@chrissnow
Copy link
Contributor Author

@theotherjimmy I couldn't find 7.8 so used the trial of 8.
if you have IAR could you try and build, I'm pretty sure it will be ok as the main at least compiles.

@theotherjimmy
Copy link
Contributor

@chrissnow The version of IAR Embedded workbench that has EWARM 7.8 is Embedded Workbench 7.5, I think.

@theotherjimmy
Copy link
Contributor

/morph test

@mbed-bot
Copy link

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 768

All builds and test passed!

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 11, 2017

@c1728p9 Happy with the update ( to use inline assembly delay loop as proposed above) ?

@c1728p9
Copy link
Contributor

c1728p9 commented Jul 11, 2017

This looks good to me! Thanks for the changes @chrissnow

MBED_NOINLINE
__asm static void delay_loop(uint32_t count)
{
1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a question regarding this 1, is it required? I came across this code today, and wondering if this is line is needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants