Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add weak memcpy, memmove and memset #14

Closed
wants to merge 1 commit into from
Closed

add weak memcpy, memmove and memset #14

wants to merge 1 commit into from

Conversation

japaric
Copy link
Member

@japaric japaric commented Aug 8, 2016

cc @Amanieu

The problem is that, with this change, intrinsics like __aeabi_memcpy get misoptimized to an infinitely recursive call:

$ cargo build --target arm-unknown-linux-gnueabi
$ arm-linux-gnueabi-objdump -Cd target/arm-unknown-linux-gnueabi/release/librustc_builtins.rlib
00000000 <__aeabi_memcpy>:
   0:   eafffffe        b       0 <__aeabi_memcpy>
(..)

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

I think this might just be a quirk of how the disassembly is shown when weak symbols are around. You need to link it into an executable to see the function that's actually being called.

@japaric
Copy link
Member Author

japaric commented Aug 8, 2016

Hmm, I tested this with branch with one of my projects and I'm still seeing the recursive bomb. Also:

$ arm-none-eabi-nm target/cortex-m3/release/05-init-data
0800014a T __aeabi_memcpy
08000152 T __aeabi_memcpy4
         U __aeabi_unwind_cpp_pr0
42210000 A __BB_GPIOA
42220000 A __BB_GPIOC
42420000 A __BB_RCC
42028000 A __BB_TIM7
42000000 A __BITBAND_ALIAS
40000000 A __BITBAND_START
20000000 D __DATA_END
0800015c A __DATA_LOAD
20000000 D __DATA_START
08000146 T __default_handler
08000008 R __EXCEPTIONS
40010800 A __GPIOA
40011000 A __GPIOC
08000040 R __INTERRUPTS
e0000000 A __ITM
08000040 r link_exceptions_end
08000008 r link_exceptions_start
08000120 r link_interrupts_end
08000040 r link_interrupts_start
08000008 r link_reset_end
08000004 r link_reset_start
e000e100 A __NVIC
40021000 A __RCC
08000004 R __RESET
08000120 T start
08000146 T __tim7
40001400 A __TIM7

I'm going to try something different.

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

OK, I know what's happening. The memcpy function is being optimized by LLVM into a call to __aeabi_memcpy. You need to add #![no_builtins] to libc.rs.

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

Oh sorry, #![no_builtins] needs to be added to the crate root, not libc.rs. See rlibc.

#![feature(naked_functions)]
#![no_builtins]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Amanieu It's already here

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

Hmm, maybe try #[inline(never)]. Otherwise I'm out of ideas.

@japaric
Copy link
Member Author

japaric commented Aug 8, 2016

I was using LTO in my other project. If I disable that then I don't get infinite recursion.

08000120 <start>:
 8000120:       b580            push    {r7, lr}
 8000122:       f000 f803       bl      800012c <cu::rt::init_data::h0244c2bfaf46df0b>
 8000126:       e7fe            b.n     8000126 <start+0x6>

08000128 <__default_handler>:
 8000128:       be00            bkpt    0x0000
 800012a:       4770            bx      lr

0800012c <cu::rt::init_data::h0244c2bfaf46df0b>:
 800012c:       b580            push    {r7, lr}
 800012e:       f240 0000       movw    r0, #0
 8000132:       f240 0100       movw    r1, #0
 8000136:       f2c2 0000       movt    r0, #8192       ; 0x2000
 800013a:       f2c2 0100       movt    r1, #8192       ; 0x2000
 800013e:       1a09            subs    r1, r1, r0
 8000140:       f021 0203       bic.w   r2, r1, #3
 8000144:       f240 116c       movw    r1, #364        ; 0x16c
 8000148:       f6c0 0100       movt    r1, #2048       ; 0x800
 800014c:       f000 f801       bl      8000152 <__aeabi_memcpy4>
 8000150:       bd80            pop     {r7, pc}

08000152 <__aeabi_memcpy4>:
 8000152:       f000 b800       b.w     8000156 <memcpy>

08000156 <memcpy>:
 8000156:       2a00            cmp     r2, #0
 8000158:       bf08            it      eq
 800015a:       4770            bxeq    lr
 800015c:       4684            mov     ip, r0
 800015e:       f811 3b01       ldrb.w  r3, [r1], #1
 8000162:       3a01            subs    r2, #1
 8000164:       f80c 3b01       strb.w  r3, [ip], #1
 8000168:       d1f9            bne.n   800015e <memcpy+0x8>
 800016a:       4770            bx      lr

I didn't have to modify this branch.

@japaric
Copy link
Member Author

japaric commented Aug 8, 2016

So, perhaps #![no_builtins] breaks with LTO.

LTO is very important for my embedded projects ...

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

Does #[inline(never)] help at all?

@japaric
Copy link
Member Author

japaric commented Aug 8, 2016

Does #[inline(never)] help at all?

Sadly, no.

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

OK, one last thing to check, does the infinite recursion still happen if you remove #[linkage = "weak"]?

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

Oh, and what happens if you add #![no_builtins] to your application?

@japaric
Copy link
Member Author

japaric commented Aug 8, 2016

OK, one last thing to check, does the infinite recursion still happen if you remove #[linkage = "weak"]?

Fixed! With or without LTO:

08000120 <start>:
 8000120:       b580            push    {r7, lr}
 8000122:       f240 0000       movw    r0, #0
 8000126:       f240 0100       movw    r1, #0
 800012a:       f2c2 0000       movt    r0, #8192       ; 0x2000
 800012e:       f2c2 0100       movt    r1, #8192       ; 0x2000
 8000132:       1a09            subs    r1, r1, r0
 8000134:       f021 0203       bic.w   r2, r1, #3
 8000138:       f240 1160       movw    r1, #352        ; 0x160
 800013c:       f6c0 0100       movt    r1, #2048       ; 0x800
 8000140:       f000 f803       bl      800014a <__aeabi_memcpy4>
 8000144:       e7fe            b.n     8000144 <start+0x24>

08000146 <__default_handler>:
 8000146:       be00            bkpt    0x0000
 8000148:       4770            bx      lr

0800014a <__aeabi_memcpy4>:
 800014a:       2a00            cmp     r2, #0
 800014c:       bf08            it      eq
 800014e:       4770            bxeq    lr
 8000150:       f811 3b01       ldrb.w  r3, [r1], #1
 8000154:       3a01            subs    r2, #1
 8000156:       f800 3b01       strb.w  r3, [r0], #1
 800015a:       d1f9            bne.n   8000150 <__aeabi_memcpy4+0x6>
 800015c:       4770            bx      lr

@japaric
Copy link
Member Author

japaric commented Aug 8, 2016

Oh, and what happens if you add #![no_builtins] to your application?

This also fixes it. Keeping memcpy as weak, LTO enabled and adding no_builtins to the binary crate:

08000120 <start>:
 8000120:       b580            push    {r7, lr}
 8000122:       f240 0000       movw    r0, #0
 8000126:       f240 0100       movw    r1, #0
 800012a:       f2c2 0000       movt    r0, #8192       ; 0x2000
 800012e:       f2c2 0100       movt    r1, #8192       ; 0x2000
 8000132:       1a09            subs    r1, r1, r0
 8000134:       f021 0203       bic.w   r2, r1, #3
 8000138:       f240 1164       movw    r1, #356        ; 0x164
 800013c:       f6c0 0100       movt    r1, #2048       ; 0x800
 8000140:       f000 f80e       bl      8000160 <__aeabi_memcpy4>
 8000144:       e7fe            b.n     8000144 <start+0x24>

08000146 <__default_handler>:
 8000146:       be00            bkpt    0x0000
 8000148:       4770            bx      lr

0800014a <memcpy>:
 800014a:       2a00            cmp     r2, #0
 800014c:       bf08            it      eq
 800014e:       4770            bxeq    lr
 8000150:       4684            mov     ip, r0
 8000152:       f811 3b01       ldrb.w  r3, [r1], #1
 8000156:       3a01            subs    r2, #1
 8000158:       f80c 3b01       strb.w  r3, [ip], #1
 800015c:       d1f9            bne.n   8000152 <memcpy+0x8>
 800015e:       4770            bx      lr

08000160 <__aeabi_memcpy4>:
 8000160:       f7ff bff3       b.w     800014a <memcpy>

@Amanieu
Copy link
Member

Amanieu commented Aug 8, 2016

OK, so here's what I think we should do: place weak versions of mem* in a separate crate which is compiled to a staticlib. Then link that staticlib into rustc-builtins. This will allow us to continue using LTO for the main project, while building only that static lib with #![no_builtins].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants