Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy_nonoverlapping optimizes to a __aeabi_memcpy function that recurses infinitely #31544

Closed
japaric opened this issue Feb 10, 2016 · 6 comments · Fixed by #35637
Closed

copy_nonoverlapping optimizes to a __aeabi_memcpy function that recurses infinitely #31544

japaric opened this issue Feb 10, 2016 · 6 comments · Fixed by #35637

Comments

@japaric
Copy link
Member

japaric commented Feb 10, 2016

on a custom target with no-compiler-rt: true.

This binary:

08000000 <_ZN10EXCEPTIONS20h109777d051950307UbaE>:
 8000000:       20002000        andcs   r2, r0, r0
 8000004:       08000009        stmdaeq r0, {r0, r3}

08000008 <__reset>:
 8000008:       b580            push    {r7, lr}
 800000a:       f240 0000       movw    r0, #0
 800000e:       f240 0100       movw    r1, #0
 8000012:       f2c2 0000       movt    r0, #8192       ; 0x2000
 8000016:       f2c2 0100       movt    r1, #8192       ; 0x2000
 800001a:       1a09            subs    r1, r1, r0
 800001c:       f021 0203       bic.w   r2, r1, #3
 8000020:       f240 0140       movw    r1, #64 ; 0x40
 8000024:       f6c0 0100       movt    r1, #2048       ; 0x800
 8000028:       f000 f806       bl      8000038 <__aeabi_memcpy4>
 800002c:       f240 0000       movw    r0, #0
 8000030:       f6c0 0000       movt    r0, #2048       ; 0x800
 8000034:       6800            ldr     r0, [r0, #0]
 8000036:       e7fe            b.n     8000036 <__reset+0x2e>

08000038 <__aeabi_memcpy4>:
 8000038:       f000 b800       b.w     800003c <__aeabi_memcpy>

0800003c <__aeabi_memcpy>:
 800003c:       e7fe            b.n     800003c <__aeabi_memcpy>

Was generated by this cargo project:

// src/main.rs
#![feature(core_intrinsics)]
#![feature(lang_items)]

#![no_std]

extern crate rlibc;

use core::{intrinsics, mem};

// Entry point
#[no_mangle]
pub unsafe extern "C" fn __reset() {
    init_data();

    // Make sure the compiler doesn't remove the EXCEPTIONS symbol
    intrinsics::volatile_load(&EXCEPTIONS[0]);

    loop {}
}

// Initialize the data section
unsafe fn init_data() {
    extern "C" {
        static __DATA_LOAD: u32;

        static mut __DATA_END: u32;
        static mut __DATA_START: u32;
    }

    let n = (&__DATA_END as *const _ as usize - &__DATA_START as *const _ as usize) /
            mem::size_of::<u32>();

    intrinsics::copy_nonoverlapping(&__DATA_LOAD, &mut __DATA_START, n);
}

// This is how compiler-rt defines this symbol
#[no_mangle]
pub unsafe extern "C" fn __aeabi_memcpy4(dest: *mut u8, src: *const u8, size: usize) {
    rlibc::memcpy(dest, src, size);
}

// This is how compiler-rt defines this symbol
#[no_mangle]
pub unsafe extern "C" fn __aeabi_memcpy(dest: *mut u8, src: *const u8, size: usize) {
    rlibc::memcpy(dest, src, size);
}

// Stuff to build a place symbols in the addresses where the hardware expects them
extern "C" {
    fn __STACK_START();
}

#[link_section = ".exceptions"]
static EXCEPTIONS: [Option<unsafe extern "C" fn()>; 2] = [Some(__STACK_START), Some(::__reset)];

mod lang_items {
    #[lang = "eh_personality"]
    fn eh_personality() {}

    #[lang = "panic_fmt"]
    fn panic_fmt() {}

    // Unused, just to appease the compiler
    #[lang = "start"]
    fn start(_: *const u8, _: isize, _: *const *const u8) -> isize {
        0
    }
}

// Unused, just to appease the compiler
fn main() {}

thumbv7m-none-eabi.json

{
  "arch": "arm",
  "llvm-target": "thumbv7m-none-eabi",
  "os": "none",
  "target-endian": "little",
  "target-pointer-width": "32",

  "cpu": "cortex-m3",
  "executables": true,
  "morestack": false,
  "no-compiler-rt": true,
  "pre-link-args": [
    "-Tlayout.ld",
    "-Wl,--build-id=none",
    "-Wl,--gc-sections",
    "-mcpu=cortex-m3",
    "-mthumb",
    "-nostartfiles"
  ],
  "relocation-model": "static"
}
# Cargo.toml
[package]
authors = []
name = "bug"
version = "0.1.0"

[dependencies]
rlibc = { git = "https://github.com/hackndev/rlibc", branch = "zinc" }
rust-libcore = "0.0.3"
/* layout.ld */
MEMORY
{
    rom(RX)     : ORIGIN = 0x08000000, LENGTH = 128K
    ram(WAIL)   : ORIGIN = 0x20000000, LENGTH = 8K
}

ENTRY(__reset)

__DATA_LOAD = LOADADDR(.data);

SECTIONS
{
    .text : ALIGN(4)
    {
        KEEP(*(.exceptions))
        *(.text*)
    } > rom

    .data : ALIGN(4)
    {
      __DATA_START = .;
      *(.data*)
      . = ALIGN(4);
      __DATA_END = .;
    } > ram AT > rom

    /DISCARD/ :
    {
      *(.ARM.exidx*)
    }

    __STACK_START = ORIGIN(ram) + LENGTH(ram);
}

Version

rustc 1.8.0-nightly (75271d8f1 2016-02-09)

Workarounds

  • Define __aeabi_memcpy4 as a loop instead of using rlibc::memcpy then you can drop the rlibc dependency. But depending on how you write the definition you may end with infinite recursion again.
  • Remove the __aeabi_memcpy4 function and link to a cross compiled libcompiler-rt.a by changing no-compiler-rt to false. This is hard to do because one needs to patch rust-lang/compiler-rt to work with a specific custom target.
@alexcrichton
Copy link
Member

Is the problem here that LLVM recognizes __aeabi_memcpy4 as an implementation of __aeabi_memcpy4 and then tries to lower it to calling itself? The rlibc crate currently solves that via #![no_builtins], and maybe that would help here as well?

@japaric
Copy link
Member Author

japaric commented Feb 11, 2016

Adding #![no_builtins] to the top of the crate, and removing __aeabi_memcpy seems to fix the problem:

08000000 <_ZN10EXCEPTIONS20h77c550f11b0b768cAbaE>:
 8000000:       20002000        andcs   r2, r0, r0
 8000004:       08000009        stmdaeq r0, {r0, r3}

08000008 <__reset>:
 8000008:       b580            push    {r7, lr}
 800000a:       f240 0000       movw    r0, #0
 800000e:       f240 0100       movw    r1, #0
 8000012:       f2c2 0000       movt    r0, #8192       ; 0x2000
 8000016:       f2c2 0100       movt    r1, #8192       ; 0x2000
 800001a:       1a09            subs    r1, r1, r0
 800001c:       f021 0203       bic.w   r2, r1, #3
 8000020:       f240 0154       movw    r1, #84 ; 0x54
 8000024:       f6c0 0100       movt    r1, #2048       ; 0x800
 8000028:       f000 f806       bl      8000038 <__aeabi_memcpy4>
 800002c:       f240 0000       movw    r0, #0
 8000030:       f6c0 0000       movt    r0, #2048       ; 0x800
 8000034:       6800            ldr     r0, [r0, #0]
 8000036:       e7fe            b.n     8000036 <__reset+0x2e>

08000038 <__aeabi_memcpy4>:
 8000038:       f000 b800       b.w     800003c <memcpy>

0800003c <memcpy>:
 800003c:       2a00            cmp     r2, #0
 800003e:       bf08            it      eq
 8000040:       4770            bxeq    lr
 8000042:       4684            mov     ip, r0
 8000044:       f811 3b01       ldrb.w  r3, [r1], #1
 8000048:       3a01            subs    r2, #1
 800004a:       f80c 3b01       strb.w  r3, [ip], #1
 800004e:       d1f9            bne.n   8000044 <memcpy+0x8>
 8000050:       4770            bx      lr

What's the scope of no_builtins though? It seems that applying it directly to a item or putting it inside a module doesn't work. Is it effect crate global? Should I isolate the __aeabi_memcpy4 function and #![no_builtins] into its own crate?

@alexcrichton
Copy link
Member

Ah no_builtins applies to the crate as we end up telling LLVM "don't consider using any builtins for this entire module", it's equivalent to the -fno-builtins argument to gcc I believe.

For now I guess yeah you'd have to isolate it to its own crate, unfortunately. If you want to add this to rlibc though I'd be fine to do that!

@japaric
Copy link
Member Author

japaric commented Feb 12, 2016

If you want to add this to rlibc though I'd be fine to do that!

I'm not sure about doing that because compiler-rt provides this symbol when cross compiled for arm-eabi like targets so that would preclude using compiler-rt and rlibc. Unless __aeabi_memcpy4 is defined as a weak symbol on rlibc.

@alexcrichton
Copy link
Member

Ah yeah I guess that wouldn't work out well then. In any case though it seems like no_builtins did the trick? If so I'm gonna close this, but feel free to reopen if I misinterpreted.

@japaric
Copy link
Member Author

japaric commented Feb 17, 2016

Update: I mentioned that using #![no_builtins] in the same crate works. Isolating __aeabi_memcpy4 into its own crate (with no_builtins) also works, unless LTO is enabled. If LTO is enabled then copy_nonoverlapping optimizes to a recursive call to __aeabi_memcpy as shown in the original report.

japaric pushed a commit to japaric/rust that referenced this issue Aug 13, 2016
this prevents intrinsics like `memcpy` from being mis-optimized to
infinite recursive calls when LTO is used.

fixes rust-lang#31544
closes rust-lang#35540
bors added a commit that referenced this issue Aug 16, 2016
exclude `#![no_builtins]` crates from LTO

this prevents intrinsics like `memcpy` from being mis-optimized to
infinite recursive calls when LTO is used.

fixes #31544
closes #35540

---

r? @alexcrichton
cc @Amanieu
Phaiax added a commit to Phaiax/zinc that referenced this issue Oct 15, 2016
Rust has changed some stuff regarding the core crate and compiler intrinsics (e.g. float to int conversion).
Crates do not need to link to libcore anymore, but need to link against compiler_builtins.

This changes fixes the appearance of recursive intrinsics that happend with larger arrays
https://users.rust-lang.org/t/compiler-generates-recursive-memclr/7575/3
rust-lang/rust#31544
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants