The code is optimized away when using operator= for DigitalOut #863

Closed
leibin2014 opened this issue Jan 26, 2015 · 18 comments

@leibin2014
Contributor

Following the handbook, I use operator= a lot for setting the chip-select and reset pins. The code works when I set the optimization flag to -O0 or -O1, but sometimes fails with -O2. Take the code below as an example:

#include "mbed.h"

class ILI9341: public SPI
{
public:
    ILI9341(PinName mosi, PinName miso, PinName sclk, PinName cs, PinName reset, PinName dc, PinName backlight);
    ~ILI9341();
    void configureRectangle(uint32_t x1, uint32_t y1, uint32_t x2, uint32_t y2);

private:
    DigitalOut cs;     // chip select
    DigitalOut reset;  // panel reset
    DigitalOut dc;     // data/command select
};

void ILI9341::configureRectangle(uint32_t x1, uint32_t y1, uint32_t x2, uint32_t y2)
{
    cs = 0;
    ADDRESS_SET(x1, y1, x2, y2);  // macro defined elsewhere in the driver (not shown)
    dc = 1;
}

The statements cs = 0; and dc = 1; in configureRectangle() are optimized away when compiling with -O2.
Is there a way to avoid this problem while keeping both the -O2 flag and operator= for DigitalOut?

My compiler is arm-none-eabi-gcc version 4.9.3.

@PrzemekWirkus
Contributor

This line of code:

cs = 0;

is equivalent to:

cs.operator=(0);

so changing the code to:

cs.write(0);

probably will not help. But have you actually tried it?

I think it might be a GCC problem :/
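A minimal host-side sketch of that equivalence, for illustration only: `FakeDigitalOut` is a hypothetical stand-in, not mbed's real `DigitalOut`, and it assumes (as described above) that `operator=` simply forwards to `write()`. Since all three spellings run the same code, switching to `cs.write(0)` should not change what the optimizer is allowed to do.

```cpp
#include <cassert>

// Hypothetical stand-in for mbed's DigitalOut, NOT the real class. It only
// records what was written so the behaviour can be checked on a PC.
class FakeDigitalOut {
public:
    FakeDigitalOut() : value_(-1), writes_(0) {}

    // The explicit API.
    void write(int value) {
        value_ = value;
        ++writes_;
    }

    // Assignment is shorthand: it forwards to write() and returns *this,
    // so `cs = 0;` and `cs.write(0);` end up running exactly the same code.
    FakeDigitalOut& operator=(int value) {
        write(value);
        return *this;
    }

    int read() const { return value_; }
    int writes() const { return writes_; }

private:
    int value_;
    int writes_;
};

// Exercise all three spellings and return the pin object for inspection.
FakeDigitalOut demo() {
    FakeDigitalOut cs;
    cs = 0;           // what the user writes
    cs.operator=(1);  // what the compiler sees
    cs.write(0);      // what operator= calls
    return cs;
}
```

The real class reads the level back from the hardware rather than from a cached field; the cache here exists only so the sketch can be checked off-target.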

@adamgreen
Contributor

@leibin2014 What device are you compiling for? Typically those operators get inlined all the way down to a device-specific implementation, so the issue might be device specific. Did you look at the disassembly to verify that STR instructions are missing, or are you only seeing external behavior suggesting that those statements were optimized out?

It would help if you could point us to a buildable project so that we can try to reproduce the problem. I have recently been playing with similar code to blink LEDs on a few different devices with the 4.9.3 compiler and I haven't hit this issue.
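To illustrate what "inlined all the way down to a device-specific implementation" can look like: on targets such as the nRF51822, the GPIO write appears to reduce to storing the pin's bit mask into a set or clear register through a `volatile` pointer (compare the `ldr`/`ldr`/`str` sequences in the -O2 listings later in this thread). The sketch below models that idea on a PC; `FakeGpio` and `fake_gpio_write` are hypothetical names, not mbed's HAL, and ordinary variables stand in for the hardware registers. Because the stores go through `volatile` pointers, the compiler may not remove them, so the disassembly should still contain one `STR` per assignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a GPIO object: which bit to drive and where the
// "set" and "clear" registers live. On real hardware these pointers would
// hold fixed peripheral addresses; here they point at ordinary variables.
struct FakeGpio {
    uint32_t mask;
    volatile uint32_t* reg_set;
    volatile uint32_t* reg_clr;
};

// Writing 1 stores the pin mask into the set register; writing 0 stores it
// into the clear register. Each store goes through a volatile pointer, so the
// optimizer must emit it even though the value is never read back.
inline void fake_gpio_write(const FakeGpio& gpio, int value) {
    if (value) {
        *gpio.reg_set = gpio.mask;
    } else {
        *gpio.reg_clr = gpio.mask;
    }
}
```

If such a store really were missing at -O2, that would point at a compiler or HAL bug; if the stores are present, the cause is more likely timing than optimization.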

@0xc0170
Contributor

0xc0170 commented Jan 26, 2015

@leibin2014 as above, please share the assembly output for both scenarios described in your first post. As Adam proposed, a simple code snippet that reproduces the problem would also be helpful: a main code file using DigitalOut that shows the behavior you described.

@leibin2014
Contributor Author

@PrzemekWirkus I think changing to cs.write(0) would work, but I want to keep using cs = 0. Since cs = 0 is the normal usage in mbed, I have a lot of code written this way and would rather not change it all to cs.write(0).

@leibin2014
Contributor Author

@adamgreen and @0xc0170

I found that this problem only happens in some functions. I have a lot of other code in the same project that uses cs = 0 and works. Even within the same file, some functions have this problem and others don't. I am running this code on an nRF51822.
The assembly for both builds is below:

1) With -O0 FLAG:

   232 [1]  {
0x64ec                   80 b5        push  {r7, lr}
0x64ee  <+0x0002>        84 b0        sub   sp, #16
0x64f0  <+0x0004>        00 af        add   r7, sp, #0
0x64f2  <+0x0006>        f8 60        str   r0, [r7, #12]
0x64f4  <+0x0008>        b9 60        str   r1, [r7, #8]
0x64f6  <+0x000a>        7a 60        str   r2, [r7, #4]
0x64f8  <+0x000c>        3b 60        str   r3, [r7, #0]
        233 [1]     cs = 0;
0x64fa  <+0x000e>        fb 68        ldr   r3, [r7, #12]
0x64fc  <+0x0010>        18 33        adds  r3, #24
0x64fe  <+0x0012>        18 1c        adds  r0, r3, #0
0x6500  <+0x0014>        00 21        movs  r1, #0
0x6502  <+0x0016>        ff f7 c5 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
        234 [1]     ADDRESS_SET(x1,y1,x2,y2);
0x6506  <+0x001a>        fb 68        ldr   r3, [r7, #12]
0x6508  <+0x001c>        48 33        adds  r3, #72 ; 0x48
0x650a  <+0x001e>        18 1c        adds  r0, r3, #0
0x650c  <+0x0020>        00 21        movs  r1, #0
0x650e  <+0x0022>        ff f7 bf fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x6512  <+0x0026>        fb 68        ldr   r3, [r7, #12]
0x6514  <+0x0028>        04 33        adds  r3, #4
0x6516  <+0x002a>        18 1c        adds  r0, r3, #0
0x6518  <+0x002c>        2a 21        movs  r1, #42 ; 0x2a
0x651a  <+0x002e>        fe f7 0b f8  bl    0x4534 <spi_master_write_fast>
0x651e  <+0x0032>        fb 68        ldr   r3, [r7, #12]
0x6520  <+0x0034>        48 33        adds  r3, #72 ; 0x48
0x6522  <+0x0036>        18 1c        adds  r0, r3, #0
0x6524  <+0x0038>        01 21        movs  r1, #1
0x6526  <+0x003a>        ff f7 b3 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x652a  <+0x003e>        fb 68        ldr   r3, [r7, #12]
0x652c  <+0x0040>        1a 1d        adds  r2, r3, #4
0x652e  <+0x0042>        bb 68        ldr   r3, [r7, #8]
0x6530  <+0x0044>        1b 0a        lsrs  r3, r3, #8
0x6532  <+0x0046>        10 1c        adds  r0, r2, #0
0x6534  <+0x0048>        19 1c        adds  r1, r3, #0
0x6536  <+0x004a>        fd f7 fd ff  bl    0x4534 <spi_master_write_fast>
0x653a  <+0x004e>        fb 68        ldr   r3, [r7, #12]
0x653c  <+0x0050>        48 33        adds  r3, #72 ; 0x48
0x653e  <+0x0052>        18 1c        adds  r0, r3, #0
0x6540  <+0x0054>        01 21        movs  r1, #1
0x6542  <+0x0056>        ff f7 a5 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x6546  <+0x005a>        fb 68        ldr   r3, [r7, #12]
0x6548  <+0x005c>        1a 1d        adds  r2, r3, #4
0x654a  <+0x005e>        bb 68        ldr   r3, [r7, #8]
0x654c  <+0x0060>        10 1c        adds  r0, r2, #0
0x654e  <+0x0062>        19 1c        adds  r1, r3, #0
0x6550  <+0x0064>        fd f7 f0 ff  bl    0x4534 <spi_master_write_fast>
0x6554  <+0x0068>        fb 68        ldr   r3, [r7, #12]
0x6556  <+0x006a>        48 33        adds  r3, #72 ; 0x48
0x6558  <+0x006c>        18 1c        adds  r0, r3, #0
0x655a  <+0x006e>        01 21        movs  r1, #1
0x655c  <+0x0070>        ff f7 98 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x6560  <+0x0074>        fb 68        ldr   r3, [r7, #12]
0x6562  <+0x0076>        1a 1d        adds  r2, r3, #4
0x6564  <+0x0078>        3b 68        ldr   r3, [r7, #0]
0x6566  <+0x007a>        1b 0a        lsrs  r3, r3, #8
0x6568  <+0x007c>        10 1c        adds  r0, r2, #0
0x656a  <+0x007e>        19 1c        adds  r1, r3, #0
0x656c  <+0x0080>        fd f7 e2 ff  bl    0x4534 <spi_master_write_fast>
0x6570  <+0x0084>        fb 68        ldr   r3, [r7, #12]
0x6572  <+0x0086>        48 33        adds  r3, #72 ; 0x48
0x6574  <+0x0088>        18 1c        adds  r0, r3, #0
0x6576  <+0x008a>        01 21        movs  r1, #1
0x6578  <+0x008c>        ff f7 8a fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x657c  <+0x0090>        fb 68        ldr   r3, [r7, #12]
0x657e  <+0x0092>        1a 1d        adds  r2, r3, #4
0x6580  <+0x0094>        3b 68        ldr   r3, [r7, #0]
0x6582  <+0x0096>        10 1c        adds  r0, r2, #0
0x6584  <+0x0098>        19 1c        adds  r1, r3, #0
0x6586  <+0x009a>        fd f7 d5 ff  bl    0x4534 <spi_master_write_fast>
0x658a  <+0x009e>        fb 68        ldr   r3, [r7, #12]
0x658c  <+0x00a0>        48 33        adds  r3, #72 ; 0x48
0x658e  <+0x00a2>        18 1c        adds  r0, r3, #0
0x6590  <+0x00a4>        00 21        movs  r1, #0
0x6592  <+0x00a6>        ff f7 7d fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x6596  <+0x00aa>        fb 68        ldr   r3, [r7, #12]
0x6598  <+0x00ac>        04 33        adds  r3, #4
0x659a  <+0x00ae>        18 1c        adds  r0, r3, #0
0x659c  <+0x00b0>        2b 21        movs  r1, #43 ; 0x2b
0x659e  <+0x00b2>        fd f7 c9 ff  bl    0x4534 <spi_master_write_fast>
0x65a2  <+0x00b6>        fb 68        ldr   r3, [r7, #12]
0x65a4  <+0x00b8>        48 33        adds  r3, #72 ; 0x48
0x65a6  <+0x00ba>        18 1c        adds  r0, r3, #0
0x65a8  <+0x00bc>        01 21        movs  r1, #1
0x65aa  <+0x00be>        ff f7 71 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x65ae  <+0x00c2>        fb 68        ldr   r3, [r7, #12]
0x65b0  <+0x00c4>        1a 1d        adds  r2, r3, #4
0x65b2  <+0x00c6>        7b 68        ldr   r3, [r7, #4]
0x65b4  <+0x00c8>        1b 0a        lsrs  r3, r3, #8
0x65b6  <+0x00ca>        10 1c        adds  r0, r2, #0
0x65b8  <+0x00cc>        19 1c        adds  r1, r3, #0
0x65ba  <+0x00ce>        fd f7 bb ff  bl    0x4534 <spi_master_write_fast>
0x65be  <+0x00d2>        fb 68        ldr   r3, [r7, #12]
0x65c0  <+0x00d4>        48 33        adds  r3, #72 ; 0x48
0x65c2  <+0x00d6>        18 1c        adds  r0, r3, #0
0x65c4  <+0x00d8>        01 21        movs  r1, #1
0x65c6  <+0x00da>        ff f7 63 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x65ca  <+0x00de>        fb 68        ldr   r3, [r7, #12]
0x65cc  <+0x00e0>        1a 1d        adds  r2, r3, #4
0x65ce  <+0x00e2>        7b 68        ldr   r3, [r7, #4]
0x65d0  <+0x00e4>        10 1c        adds  r0, r2, #0
0x65d2  <+0x00e6>        19 1c        adds  r1, r3, #0
0x65d4  <+0x00e8>        fd f7 ae ff  bl    0x4534 <spi_master_write_fast>
0x65d8  <+0x00ec>        fb 68        ldr   r3, [r7, #12]
0x65da  <+0x00ee>        48 33        adds  r3, #72 ; 0x48
0x65dc  <+0x00f0>        18 1c        adds  r0, r3, #0
0x65de  <+0x00f2>        01 21        movs  r1, #1
0x65e0  <+0x00f4>        ff f7 56 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x65e4  <+0x00f8>        fb 68        ldr   r3, [r7, #12]
0x65e6  <+0x00fa>        1a 1d        adds  r2, r3, #4
0x65e8  <+0x00fc>        bb 69        ldr   r3, [r7, #24]
0x65ea  <+0x00fe>        1b 0a        lsrs  r3, r3, #8
0x65ec  <+0x0100>        10 1c        adds  r0, r2, #0
0x65ee  <+0x0102>        19 1c        adds  r1, r3, #0
0x65f0  <+0x0104>        fd f7 a0 ff  bl    0x4534 <spi_master_write_fast>
0x65f4  <+0x0108>        fb 68        ldr   r3, [r7, #12]
0x65f6  <+0x010a>        48 33        adds  r3, #72 ; 0x48
0x65f8  <+0x010c>        18 1c        adds  r0, r3, #0
0x65fa  <+0x010e>        01 21        movs  r1, #1
0x65fc  <+0x0110>        ff f7 48 fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x6600  <+0x0114>        fb 68        ldr   r3, [r7, #12]
0x6602  <+0x0116>        1a 1d        adds  r2, r3, #4
0x6604  <+0x0118>        bb 69        ldr   r3, [r7, #24]
0x6606  <+0x011a>        10 1c        adds  r0, r2, #0
0x6608  <+0x011c>        19 1c        adds  r1, r3, #0
0x660a  <+0x011e>        fd f7 93 ff  bl    0x4534 <spi_master_write_fast>
0x660e  <+0x0122>        fb 68        ldr   r3, [r7, #12]
0x6610  <+0x0124>        48 33        adds  r3, #72 ; 0x48
0x6612  <+0x0126>        18 1c        adds  r0, r3, #0
0x6614  <+0x0128>        00 21        movs  r1, #0
0x6616  <+0x012a>        ff f7 3b fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
0x661a  <+0x012e>        fb 68        ldr   r3, [r7, #12]
0x661c  <+0x0130>        04 33        adds  r3, #4
0x661e  <+0x0132>        18 1c        adds  r0, r3, #0
0x6620  <+0x0134>        2c 21        movs  r1, #44 ; 0x2c
0x6622  <+0x0136>        fd f7 87 ff  bl    0x4534 <spi_master_write_fast>
        235 [1]     dc = 1;
0x6626  <+0x013a>        fb 68        ldr   r3, [r7, #12]
0x6628  <+0x013c>        48 33        adds  r3, #72 ; 0x48
0x662a  <+0x013e>        18 1c        adds  r0, r3, #0
0x662c  <+0x0140>        01 21        movs  r1, #1
0x662e  <+0x0142>        ff f7 2f fb  bl    0x5c90 <mbed::DigitalOut::operator=(int)>
        236 [1] }
0x6632  <+0x0146>        bd 46        mov   sp, r7
0x6634  <+0x0148>        04 b0        add   sp, #16
0x6636  <+0x014a>        80 bd        pop   {r7, pc}
2) With -O2 FLAG

0x64ec                   f0 b5        push  {r4, r5, r6, r7, lr}
0x64ee  <+0x0002>        47 46        mov   r7, r8
0x64f0  <+0x0004>        80 b4        push  {r7}
0x64f2  <+0x0006>        1f 1c        adds  r7, r3, #0
0x64f8  <+0x000c>        04 1c        adds  r4, r0, #0
0x64fa  <+0x000e>        88 46        mov   r8, r1
0x64fc  <+0x0010>        16 1c        adds  r6, r2, #0
        235 [1]     ADDRESS_SET(x1,y1,x2,y2);
0x6518  <+0x002c>        25 1d        adds  r5, r4, #4
0x651c  <+0x0030>        28 1c        adds  r0, r5, #0
0x651e  <+0x0032>        2a 21        movs  r1, #42 ; 0x2a
0x6520  <+0x0034>        fe f7 08 f8  bl    0x4534 <spi_master_write_fast>
0x6532  <+0x0046>        28 1c        adds  r0, r5, #0
0x6536  <+0x004a>        43 46        mov   r3, r8
0x6538  <+0x004c>        19 0a        lsrs  r1, r3, #8
0x653a  <+0x004e>        fd f7 fb ff  bl    0x4534 <spi_master_write_fast>
0x654c  <+0x0060>        28 1c        adds  r0, r5, #0
0x6550  <+0x0064>        41 46        mov   r1, r8
0x6552  <+0x0066>        fd f7 ef ff  bl    0x4534 <spi_master_write_fast>
0x6564  <+0x0078>        39 0a        lsrs  r1, r7, #8
0x6568  <+0x007c>        28 1c        adds  r0, r5, #0
0x656a  <+0x007e>        fd f7 e3 ff  bl    0x4534 <spi_master_write_fast>
0x657c  <+0x0090>        28 1c        adds  r0, r5, #0
0x6580  <+0x0094>        39 1c        adds  r1, r7, #0
0x6582  <+0x0096>        fd f7 d7 ff  bl    0x4534 <spi_master_write_fast>
0x6594  <+0x00a8>        28 1c        adds  r0, r5, #0
0x6598  <+0x00ac>        2b 21        movs  r1, #43 ; 0x2b
0x659a  <+0x00ae>        fd f7 cb ff  bl    0x4534 <spi_master_write_fast>
0x65ac  <+0x00c0>        31 0a        lsrs  r1, r6, #8
0x65b0  <+0x00c4>        28 1c        adds  r0, r5, #0
0x65b2  <+0x00c6>        fd f7 bf ff  bl    0x4534 <spi_master_write_fast>
0x65c4  <+0x00d8>        28 1c        adds  r0, r5, #0
0x65c8  <+0x00dc>        31 1c        adds  r1, r6, #0
0x65ca  <+0x00de>        fd f7 b3 ff  bl    0x4534 <spi_master_write_fast>
0x65da  <+0x00ee>        28 1c        adds  r0, r5, #0
0x65de  <+0x00f2>        06 9b        ldr   r3, [sp, #24]
0x65e0  <+0x00f4>        19 0a        lsrs  r1, r3, #8
0x65e2  <+0x00f6>        fd f7 a7 ff  bl    0x4534 <spi_master_write_fast>
0x65f2  <+0x0106>        28 1c        adds  r0, r5, #0
0x65f6  <+0x010a>        06 99        ldr   r1, [sp, #24]
0x65f8  <+0x010c>        fd f7 9c ff  bl    0x4534 <spi_master_write_fast>
0x6608  <+0x011c>        28 1c        adds  r0, r5, #0
0x660c  <+0x0120>        2c 21        movs  r1, #44 ; 0x2c
0x660e  <+0x0122>        fd f7 91 ff  bl    0x4534 <spi_master_write_fast>
        237 [1] }
0x6620  <+0x0134>        04 bc        pop   {r2}
0x6622  <+0x0136>        90 46        mov   r8, r2
0x6624  <+0x0138>        f0 bd        pop   {r4, r5, r6, r7, pc}

@leibin2014
Contributor Author

@adamgreen and @0xc0170

I have reproduced this problem with the simple code below:

#include "mbed.h"

DigitalOut cs(p18);
DigitalOut reset(p12);
DigitalOut dc(p14);

void setup()
{
    cs = 1;
}

void loop()
{

}

int main()
{
  setup();
  while(1)
  {
    loop();
  }
}

Assembly code for function setup():

1) With -O0 flag
        549 [1] {
0x26c                   80 b5        push   {r7, lr}
0x26e  <+0x0002>        00 af        add    r7, sp, #0
        550 [1]     cs = 1;
0x270  <+0x0004>        03 4b        ldr    r3, [pc, #12]   ; (0x280 <setup()+20>)
0x272  <+0x0006>        18 1c        adds   r0, r3, #0
0x274  <+0x0008>        01 21        movs   r1, #1
0x276  <+0x000a>        ff f7 e9 ff  bl 0x24c <mbed::DigitalOut::operator=(int)>
        551 [1] }
0x27a  <+0x000e>        bd 46        mov    sp, r7
0x27c  <+0x0010>        80 bd        pop    {r7, pc}
0x27e  <+0x0012>        c0 46        nop            ; (mov r8, r8)
0x280  <+0x0014>        2c 00        movs   r4, r5
0x282  <+0x0016>        00 20        movs   r0, #0
2) With -O2 flag
        549 [1] {
0x198                   10 b5  push {r4, lr}
        551 [1] }
0x1aa  <+0x0012>        10 bd  pop  {r4, pc}

@leibin2014
Contributor Author

Here are the CFLAGS:

arm-none-eabi-g++.exe -g -O2 -Wall -Wextra -pipe -x c "-std=gnu11" -fno-hosted -Wno-old-style-declaration "-mtune=cortex-m0" "-mcpu=cortex-m0" -msoft-float -gdwarf-2 -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wno-sign-compare -Wno-comment -Wno-switch -fno-delete-null-pointer-checks -fno-strict-aliasing -ffunction-sections "-fmessage-length=0" -fdata-sections -fsigned-char -fno-builtin -ffast-math -mno-sched-prolog -mthumb -nostdlib -MMD -DTOOLCHAIN_GCC_ARM -DTOOLCHAIN_GCC "-D__MBED__=1" -DNDEBUG -D__CORTEX_M0 -DARM_MATH_CM0

@adamgreen
Contributor

I built your sample code with the following g++ command line:

arm-none-eabi-g++ -mcpu=cortex-m0 -mthumb -c -g -fno-common -fmessage-length=0 -Wall -fno-exceptions -ffunction-sections -fdata-sections -fomit-frame-pointer -MMD -MP -DNDEBUG -O2 -DTARGET_NRF51822 -DTARGET_M0 -DTARGET_CORTEX_M -DTARGET_NORDIC -DTARGET_NRF51822_MKIT -DTARGET_MCU_NRF51822 -DTARGET_MCU_NORDIC_16K -DTOOLCHAIN_GCC_ARM -DTOOLCHAIN_GCC -D__CORTEX_M0 -DARM_MATH_CM0 -DMBED_BUILD_TIMESTAMP=1422336272.34 -D__MBED__=1  -std=gnu++98 -fno-rtti -I. -I./mbed -I./mbed/TARGET_NRF51822 -I./mbed/TARGET_NRF51822/TARGET_NORDIC -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822 -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/TARGET_NRF51822_MKIT -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/s110_nrf51822_7_1_0 -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/s110_nrf51822_7_1_0/s110_nrf51822_7.1.0_API -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/s110_nrf51822_7_1_0/s110_nrf51822_7.1.0_API/include -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/nrf-sdk -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/nrf-sdk/app_common -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/nrf-sdk/sd_common -I./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM  -o main.o main.cpp

That produced the expected assembly for main(), into which setup() had been inlined:

00016198 <main>:
   16198:   4b02        ldr r3, [pc, #8]    ; (161a4 <main+0xc>)
   1619a:   68da        ldr r2, [r3, #12]
   1619c:   685b        ldr r3, [r3, #4]
   1619e:   6013        str r3, [r2, #0]
   161a0:   e7fe        b.n 161a0 <main+0x8>
   161a2:   46c0        nop         ; (mov r8, r8)
   161a4:   200020fc    .word   0x200020fc

Where did you get your compiler flags? You should ask whoever gave you those options for the NRF51822 with GCC for support; they aren't the recommended ones.

The flags I used above were from a project exported from the mbed online compiler for GCC_ARM so they are the recommended ones. I did change the -Os optimization flag to -O2.

@leibin2014
Contributor Author

@adamgreen
Thanks for your reply! I have two more points:

1. I think you should look at the assembly for setup(), not main(), to see whether the code for cs = 1 is optimized away.
2. Maybe the problem can't be reproduced with your flags, but if you add "-mno-sched-prolog" and remove "NDEBUG", I think you will be able to reproduce it.

@adamgreen
Copy link
Contributor

> 1) I think you should look into the assembly code for function setup() but not function main(), to see if the code for cs=1 is optimized.

If you look closely at the disassembly of my main(), you will see that there is no call to setup as it was inlined.

> 2)Maybe with your flags can't reproduce the problem. But if you add the flag "-mno-sched-prolog" to and remove flag "NDEBUG" from your flags. I think then you can reproduce the problem.

First, they aren't my flags; they are the ones you get when you export a project for GCC_ARM. The only effect of your suggested flag changes is that some assert code from operator= gets inlined as well:

00016198 <main>:
   16198:   b510        push    {r4, lr}
   1619a:   2300        movs    r3, #0
   1619c:   4c06        ldr r4, [pc, #24]   ; (161b8 <main+0x20>)
   1619e:   56e3        ldrsb   r3, [r4, r3]
   161a0:   3301        adds    r3, #1
   161a2:   d003        beq.n   161ac <main+0x14>
   161a4:   68e3        ldr r3, [r4, #12]
   161a6:   6862        ldr r2, [r4, #4]
   161a8:   601a        str r2, [r3, #0]
   161aa:   e7fe        b.n 161aa <main+0x12>
   161ac:   4803        ldr r0, [pc, #12]   ; (161bc <main+0x24>)
   161ae:   4904        ldr r1, [pc, #16]   ; (161c0 <main+0x28>)
   161b0:   2224        movs    r2, #36 ; 0x24
   161b2:   f000 f991   bl  164d8 <mbed_assert_internal>
   161b6:   e7f5        b.n 161a4 <main+0xc>
   161b8:   200020fc    .word   0x200020fc
   161bc:   0001cc8c    .word   0x0001cc8c
   161c0:   0001cca4    .word   0x0001cca4
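The extra `ldrsb`/`adds`/`beq`/`bl mbed_assert_internal` sequence above is a run-time check on the pin, and it only appears because `NDEBUG` is no longer defined. The sketch below shows the general pattern of an assert-style macro that compiles away under `NDEBUG`; `CHECK_PIN`, `kNotConnected`, `on_assert_failure` and `checked_write` are hypothetical names standing in for mbed's own assert machinery, not its actual definitions.

```cpp
#include <cassert>

// Hypothetical "not connected" sentinel for a pin number.
const int kNotConnected = -1;

// Hypothetical failure handler: counts failures instead of halting, so the
// sketch can run on a PC.
int g_assert_failures = 0;
inline void on_assert_failure() { ++g_assert_failures; }

// Assert-style check. With NDEBUG defined it expands to a no-op (the cast only
// silences unused-variable warnings), so the optimized write is just a store.
// Without NDEBUG, every call first compares the pin with the sentinel and
// branches to the handler: the extra code seen in the listing above.
#ifdef NDEBUG
#define CHECK_PIN(pin) ((void)(pin))
#else
#define CHECK_PIN(pin) ((pin) == kNotConnected ? on_assert_failure() : (void)0)
#endif

// Hypothetical pin write: validate the pin, then "drive" it by storing the
// value into *level.
inline void checked_write(int pin, int value, int* level) {
    CHECK_PIN(pin);
    *level = value;
}
```

Either way the store itself is still emitted; removing `NDEBUG` only adds the check in front of it.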

Why do you care about flags like "-mno-sched-prolog" anyway? I could see using it if your version of GDB got confused by an optimized prolog, but I haven't seen that happen in my use of GCC_ARM's build of GDB. It has code to walk the prolog manually and figure out where things currently are, based on how far you have progressed into a function.

Have you tried the recommended flags? Does the problem happen with those?

@leibin2014
Contributor Author

@adamgreen Thanks for your reply, and sorry for the confusion!
I just don't know how to write simple code that reproduces my original problem. Using "-mno-sched-prolog" without "NDEBUG" reproduces a similar problem with the simple code, but it looks like it's not the same as my original problem.

I tried the CFLAGS, CXXFLAGS, and LDFLAGS below, but they don't solve my original problem.
Could you please post the recommended flags (CFLAGS, CXXFLAGS, and LDFLAGS)?

#############################################################################################################
GLOBAL_CFLAGS += -O$(OPTIMIZATION)
GLOBAL_CFLAGS += -std=gnu11
GLOBAL_CFLAGS += -c
GLOBAL_CFLAGS += -g
GLOBAL_CFLAGS += -fno-common
GLOBAL_CFLAGS += -fomit-frame-pointer
GLOBAL_CFLAGS += -MP
GLOBAL_CFLAGS += -Wall
GLOBAL_CFLAGS += -Wextra
GLOBAL_CFLAGS += -fmessage-length=0
GLOBAL_CFLAGS += -fno-exceptions
GLOBAL_CFLAGS += -ffunction-sections
GLOBAL_CFLAGS += -fdata-sections
GLOBAL_CFLAGS += -MMD
GLOBAL_CFLAGS += -mcpu=cortex-m0
GLOBAL_CFLAGS += -mthumb

#############################################################################################################
GLOBAL_CXXFLAGS += -O$(OPTIMIZATION)
GLOBAL_CXXFLAGS += -std=gnu++98
GLOBAL_CXXFLAGS += -fno-rtti
GLOBAL_CXXFLAGS += -c
GLOBAL_CXXFLAGS += -g
GLOBAL_CXXFLAGS += -fno-common
GLOBAL_CXXFLAGS += -fomit-frame-pointer
GLOBAL_CXXFLAGS += -MP
GLOBAL_CXXFLAGS += -Wall
GLOBAL_CXXFLAGS += -Wextra
GLOBAL_CXXFLAGS += -fmessage-length=0
GLOBAL_CXXFLAGS += -fno-exceptions
GLOBAL_CXXFLAGS += -ffunction-sections
GLOBAL_CXXFLAGS += -fdata-sections
GLOBAL_CXXFLAGS += -MMD
GLOBAL_CXXFLAGS += -mcpu=cortex-m0
GLOBAL_CXXFLAGS += -mthumb
GLOBAL_CXXFLAGS += -fpermissive

#############################################################################################################
GLOBAL_LDFLAGS =-mcpu=cortex-m0 -mthumb -Og -fmessage-length=0 -fsigned-char -ffunction-sections -fdata-sections -Wall -Wextra -g3 -Xlinker --gc-sections -Wl,--wrap=main --specs=nano.specs

@adamgreen
Contributor

This is a makefile exported from the mbed online compiler for a simple NRF51822 blinky sample:

# This file was automagically generated by mbed.org. For more information,
# see http://mbed.org/handbook/Exporting-to-GCC-ARM-Embedded

GCC_BIN =
PROJECT = HelloWorld
OBJECTS = ./main.o 
SYS_OBJECTS = ./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM/retarget.o ./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM/board.o ./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM/cmsis_nvic.o ./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM/system_nrf51822.o ./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM/startup_NRF51822.o 
INCLUDE_PATHS = -I. -I./mbed -I./mbed/TARGET_NRF51822 -I./mbed/TARGET_NRF51822/TARGET_NORDIC -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822 -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/TARGET_NRF51822_MKIT -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/s110_nrf51822_7_1_0 -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/s110_nrf51822_7_1_0/s110_nrf51822_7.1.0_API -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/s110_nrf51822_7_1_0/s110_nrf51822_7.1.0_API/include -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/nrf-sdk -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/nrf-sdk/app_common -I./mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/nrf-sdk/sd_common -I./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM 
LIBRARY_PATHS = -L./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM 
LIBRARIES = -lmbed 
LINKER_SCRIPT = ./mbed/TARGET_NRF51822/TOOLCHAIN_GCC_ARM/NRF51822.ld
SOFTDEVICE = mbed/TARGET_NRF51822/TARGET_NORDIC/TARGET_MCU_NRF51822/Lib/s110_nrf51822_7_0_0/s110_nrf51822_7.0.0_softdevice.hex

###############################################################################
AS      = $(GCC_BIN)arm-none-eabi-as
CC      = $(GCC_BIN)arm-none-eabi-gcc
CPP     = $(GCC_BIN)arm-none-eabi-g++
LD      = $(GCC_BIN)arm-none-eabi-gcc
OBJCOPY = $(GCC_BIN)arm-none-eabi-objcopy
OBJDUMP = $(GCC_BIN)arm-none-eabi-objdump
SIZE    = $(GCC_BIN)arm-none-eabi-size
SREC_CAT = srec_cat

CPU = -mcpu=cortex-m0 -mthumb
CC_FLAGS = $(CPU) -c -g -fno-common -fmessage-length=0 -Wall -fno-exceptions -ffunction-sections -fdata-sections -fomit-frame-pointer
CC_FLAGS += -MMD -MP
CC_SYMBOLS = -DTARGET_NRF51822 -DTARGET_M0 -DTARGET_CORTEX_M -DTARGET_NORDIC -DTARGET_NRF51822_MKIT -DTARGET_MCU_NRF51822 -DTARGET_MCU_NORDIC_16K -DTOOLCHAIN_GCC_ARM -DTOOLCHAIN_GCC -D__CORTEX_M0 -DARM_MATH_CM0 -DMBED_BUILD_TIMESTAMP=1422336272.34 -D__MBED__=1 

LD_FLAGS = $(CPU) -Wl,--gc-sections -Wl,--wrap=main --specs=nano.specs -u _printf_float -u _scanf_float
LD_FLAGS += -Wl,-Map=$(PROJECT).map,--cref
LD_SYS_LIBS = -lstdc++ -lsupc++ -lm -lc -lgcc -lnosys

ifeq ($(DEBUG), 1)
  CC_FLAGS += -DDEBUG -O0
else
  CC_FLAGS += -DNDEBUG -Os
endif

all: $(PROJECT).bin $(PROJECT).hex 

clean:
	rm -f $(PROJECT).bin $(PROJECT).elf $(PROJECT).hex $(PROJECT).map $(PROJECT).lst $(OBJECTS) $(DEPS)

.s.o:
	$(AS) $(CPU) -o $@ $<

.c.o:
	$(CC)  $(CC_FLAGS) $(CC_SYMBOLS) -std=gnu99   $(INCLUDE_PATHS) -o $@ $<

.cpp.o:
	$(CPP) $(CC_FLAGS) $(CC_SYMBOLS) -std=gnu++98 -fno-rtti $(INCLUDE_PATHS) -o $@ $<


$(PROJECT).elf: $(OBJECTS) $(SYS_OBJECTS)
	$(LD) $(LD_FLAGS) -T$(LINKER_SCRIPT) $(LIBRARY_PATHS) -o $@ $^ $(LIBRARIES) $(LD_SYS_LIBS) $(LIBRARIES) $(LD_SYS_LIBS)
	$(SIZE) $@

$(PROJECT).bin: $(PROJECT).elf
	@$(OBJCOPY) -O binary $< $@

$(PROJECT).hex: $(PROJECT).elf
	@$(OBJCOPY) -O ihex $< $@

$(PROJECT).lst: $(PROJECT).elf
	@$(OBJDUMP) -Sdh $< > $@

lst: $(PROJECT).lst

size:
	$(SIZE) $(PROJECT).elf

DEPS = $(OBJECTS:.o=.d) $(SYS_OBJECTS:.o=.d)
-include $(DEPS)

merge:
	$(SREC_CAT) $(SOFTDEVICE) -intel $(PROJECT).hex -intel -o combined.hex -intel --line-length=44

@leibin2014
Contributor Author

@adamgreen
I still can't fix this problem with the recommended flags.
I have recreated the problem with the simple code below, which is the closest to my original code. I expect you can also reproduce the problem with this code and the -O2 flag.


#include "mbed.h"

#define MOSI        p18
#define MISO        p12
#define SCLK        p17
#define CS          p7

DigitalOut cs(p14);
DigitalOut reset(p12);
DigitalOut dc(p13);

SPI spi(MOSI, MISO, SCLK);


#define LCD_WR_DATA8(data)  { dc = 1; spi.frequency(8000000); }
#define LCD_WR_REG(data)    { dc = 0; spi.frequency(2000000); }

#define ADDRESS_SET(x1, y1, x2, y2) \
{                                   \
   LCD_WR_REG(0x2a);                \
   LCD_WR_DATA8(x1>>8);             \
   LCD_WR_DATA8(x1);                \
   LCD_WR_DATA8(x2>>8);             \
   LCD_WR_DATA8(x2);                \
                                    \
   LCD_WR_REG(0x2b);                \
   LCD_WR_DATA8(y1>>8);             \
   LCD_WR_DATA8(y1);                \
   LCD_WR_DATA8(y2>>8);             \
   LCD_WR_DATA8(y2);                \
                                    \
   LCD_WR_REG(0x2C);                \
}


void setup()
{
    cs = 0;
    ADDRESS_SET(1,2,3,4);
    dc = 1;


}


void loop()
{

}


int main()
{
  setup();
  while(1)
  {
    loop();
  }
}

@leibin2014
Contributor Author

It looks like the code can also be simplified as below:

#include "mbed.h"

#define MOSI        p18
#define MISO        p12 
#define SCLK        p17
#define CS          p7

DigitalOut cs(p14);
DigitalOut reset(p12);
DigitalOut dc(p13);

SPI spi(MOSI, MISO, SCLK);


void setup()
{
    cs = 0;
    dc = 1;
    spi.frequency(8000000);
    dc = 0;
    spi.frequency(2000000);
    dc = 1;

}


void loop()
{

}

int main()
{
  setup();
  while(1)
  {
    loop();
  }
}

Here is the disassembly code:

559       {
          setup():
00000ce8:   push {r3, r4, r5, lr}
562           spi.frequency(8000000);
00000cf2:   ldr r5, [pc, #48]       ; (0xd24 <setup()+60>)
00000cfa:   adds r0, r5, #0
00000cfe:   ldr r1, [pc, #40]       ; (0xd28 <setup()+64>)
00000d00:   bl 0xe28 <mbed::SPI::frequency(int)>
564           spi.frequency(2000000);
00000d08:   adds r0, r5, #0
00000d0c:   ldr r1, [pc, #28]       ; (0xd2c <setup()+68>)
00000d0e:   bl 0xe28 <mbed::SPI::frequency(int)>
567       }

@leibin2014
Contributor Author

@adamgreen @0xc0170
Are you able to recreate this problem with my latest simplified code?

@leibin2014
Contributor Author

It looks like the problem is not that the GPIO writes are optimized away. My original problem is caused by the code running too fast when built with -O2.
Let me look into it further.
Thanks for all your help!

@adamgreen
Contributor

> Are you able to recreate this problem with my latest simplified code?

Nope, but your disassembly is obviously incomplete, since there are gaps in the address ranges. What follows is the disassembly I get when I build your sample with the exported makefile switched to -O2.

000161a8 <setup()>:
   161a8:   b538        push    {r3, r4, r5, lr}
   161aa:   4b0c        ldr r3, [pc, #48]   ; (161dc <setup()+0x34>)
   161ac:   4c0c        ldr r4, [pc, #48]   ; (161e0 <setup()+0x38>)
   161ae:   691a        ldr r2, [r3, #16]
   161b0:   685b        ldr r3, [r3, #4]
   161b2:   4d0c        ldr r5, [pc, #48]   ; (161e4 <setup()+0x3c>)
   161b4:   6013        str r3, [r2, #0]
   161b6:   6862        ldr r2, [r4, #4]
   161b8:   68e3        ldr r3, [r4, #12]
   161ba:   1c28        adds    r0, r5, #0
   161bc:   601a        str r2, [r3, #0]
   161be:   490a        ldr r1, [pc, #40]   ; (161e8 <setup()+0x40>)
   161c0:   f000 fa02   bl  165c8 <mbed::SPI::frequency(int)>
   161c4:   6923        ldr r3, [r4, #16]
   161c6:   6862        ldr r2, [r4, #4]
   161c8:   1c28        adds    r0, r5, #0
   161ca:   601a        str r2, [r3, #0]
   161cc:   4907        ldr r1, [pc, #28]   ; (161ec <setup()+0x44>)
   161ce:   f000 f9fb   bl  165c8 <mbed::SPI::frequency(int)>
   161d2:   68e3        ldr r3, [r4, #12]
   161d4:   6862        ldr r2, [r4, #4]
   161d6:   601a        str r2, [r3, #0]
   161d8:   bd38        pop {r3, r4, r5, pc}
   161da:   46c0        nop         ; (mov r8, r8)
   161dc:   20002104    .word   0x20002104
   161e0:   200020ec    .word   0x200020ec
   161e4:   2000211c    .word   0x2000211c
   161e8:   007a1200    .word   0x007a1200
   161ec:   001e8480    .word   0x001e8480

Command used to create this disassembly:

arm-none-eabi-objdump -d -f -M reg-names-std --demangle HelloWorld.elf >HelloWorld.disasm
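The "gaps in the address ranges" argument can be checked mechanically: in Thumb code every instruction is 2 or 4 bytes long (BL is one of the 4-byte ones), so in a complete listing each address equals the previous address plus the previous instruction's size. A small sketch, assuming the listing has already been parsed into (address, size) pairs; `Insn`, `count_gaps` and `qt_creator_listing` are hypothetical helpers, not part of objdump or any other tool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One disassembled instruction: its address and its encoded size in bytes
// (2 for 16-bit Thumb instructions, 4 for 32-bit ones such as BL).
struct Insn {
    std::uint32_t address;
    std::uint32_t size;
};

// Counts places where an instruction does not start immediately after the
// previous one, i.e. where the listing skips over some bytes.
inline std::size_t count_gaps(const std::vector<Insn>& listing) {
    std::size_t gaps = 0;
    for (std::size_t i = 1; i < listing.size(); ++i) {
        const Insn& prev = listing[i - 1];
        if (listing[i].address != prev.address + prev.size) {
            ++gaps;
        }
    }
    return gaps;
}

// The setup() listing posted earlier in the thread (addresses 0xce8..0xd0e).
inline std::vector<Insn> qt_creator_listing() {
    const Insn insns[] = {
        {0x0ce8, 2}, {0x0cf2, 2}, {0x0cfa, 2}, {0x0cfe, 2},
        {0x0d00, 4}, {0x0d08, 2}, {0x0d0c, 2}, {0x0d0e, 4},
    };
    return std::vector<Insn>(insns, insns + sizeof(insns) / sizeof(insns[0]));
}
```

For the earlier listing this finds five jumps (for example 0xce8 + 2 = 0xcea, but the next line starts at 0xcf2), while a contiguous stretch of the objdump output above, such as 0x161be, 0x161c0 (BL), 0x161c4, has none.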

> Looks like it's not due to GPIO setting is optimized. My original problem is caused by the code running too fast after using -O2.

That makes sense.

@leibin2014
Contributor Author

Yes, adamgreen, my code works fine after adding a few wait_us(1) calls.
I got the disassembly from the debugger in Qt Creator, which seems to show only the current run-time disassembly. After stepping further in the debugger, the disassembly changed and matched yours.
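One way to keep a `wait_us(1)`-style fix from being forgotten in some code path is to tie the delays to the chip-select itself. A minimal sketch, assuming a delay of about 1 µs around select/deselect is enough, as in the workaround above; `ChipSelectGuard`, `LoggingPin` and `logging_wait_us` are hypothetical names. It is templated so that on the target `Pin` could be `DigitalOut` and the delay could be a pointer to `wait_us`, while on a PC it can be tested with fakes.

```cpp
#include <cassert>
#include <string>

// Hypothetical RAII helper: drives chip-select low on construction and high
// on destruction, waiting `delay_us` microseconds after selecting and before
// deselecting, so faster -O2 code cannot shorten those intervals.
template <typename Pin, typename DelayFn>
class ChipSelectGuard {
public:
    ChipSelectGuard(Pin& cs, DelayFn delay, int delay_us)
        : cs_(cs), delay_(delay), delay_us_(delay_us) {
        cs_ = 0;            // select the device
        delay_(delay_us_);  // settle time before the first transfer
    }

    ~ChipSelectGuard() {
        delay_(delay_us_);  // settle time after the last transfer
        cs_ = 1;            // deselect the device
    }

private:
    // Non-copyable, C++98 style (the exported makefile uses -std=gnu++98).
    ChipSelectGuard(const ChipSelectGuard&);
    ChipSelectGuard& operator=(const ChipSelectGuard&);

    Pin& cs_;
    DelayFn delay_;
    int delay_us_;
};

// Host-side fakes that log events instead of touching hardware:
// "L"/"H" for pin low/high, "w" for a delay.
std::string g_log;
int g_last_delay_us = 0;

struct LoggingPin {
    LoggingPin& operator=(int v) {
        g_log += (v ? "H" : "L");
        return *this;
    }
};

inline void logging_wait_us(int us) {
    g_log += 'w';
    g_last_delay_us = us;
}
```

On the target this might be used as `{ ChipSelectGuard<DigitalOut, void (*)(int)> sel(cs, &wait_us, 1); /* SPI transfers */ }`, assuming the `wait_us(int)` signature of the mbed API of the time. The 1 µs figure only mirrors the workaround above; the actual requirement should come from the ILI9341 datasheet timing tables, and the same idea could be applied around the `dc` transitions.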

Thanks again for all your help!


4 participants