Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommendation for allocators when linking with C* binary #1238

Open
keith opened this issue Mar 30, 2022 · 18 comments
Open

Recommendation for allocators when linking with C* binary #1238

keith opened this issue Mar 30, 2022 · 18 comments
Labels

Comments

@keith
Copy link
Member

keith commented Mar 30, 2022

When using rules_rust to link a rust library into a final C / C++ binary, you end up hitting some linker issues around missing allocator symbols:

Undefined symbols for architecture arm64:
  "___rust_alloc", referenced from:
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h2255888a02196d18 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h8c8e34d5191bcf2c in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      alloc::raw_vec::finish_grow::hf9c3a7b5d277e764 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::Context$LT$R$GT$::find_frames::h2faede176ceedb5a in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResDwarf$LT$R$GT$::parse::hde4e81cf26db861b in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResUnit$LT$R$GT$::parse_lines::h989d285cff418912 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResUnit$LT$R$GT$::render_file::h057f5d698fb8b6fa in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      ...
  "___rust_alloc_error_handler", referenced from:
      alloc::alloc::handle_alloc_error::rt_error::h0566a1d251bd0a5b in liballoc-7e50779556d46264.a(alloc-7e50779556d46264.alloc.6069a24a-cgu.0.rcgu.o)
  "___rust_alloc_zeroed", referenced from:
      _$LT$std..sys..unix..fs..File$u20$as$u20$core..fmt..Debug$GT$::fmt::h858fd589f450b4fa in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
  "___rust_dealloc", referenced from:
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h5915480aa2f4d427 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h8c8e34d5191bcf2c in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::he32668d17116d3cf in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$core..option..Option$LT$std..sys..unix..process..process_common..CStringArray$GT$$GT$::hc73bf8e1be5ce67e in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$alloc..collections..btree..map..BTreeMap$LT$u64$C$gimli..read..abbrev..Abbreviation$GT$$GT$::hd615d11dfc761993 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$$LP$std..ffi..os_str..OsString$C$core..option..Option$LT$std..ffi..os_str..OsString$GT$$RP$$GT$::h53e782e9cd3898db in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$$LT$std..backtrace..Backtrace$u20$as$u20$core..fmt..Display$GT$..fmt..$u7b$$u7b$closure$u7d$$u7d$$GT$::h11130cb13523a05c in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      ...
  "___rust_realloc", referenced from:
      alloc::raw_vec::finish_grow::hf9c3a7b5d277e764 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::Context$LT$R$GT$::find_frames::h2faede176ceedb5a in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResUnit$LT$R$GT$::parse_lines::h989d285cff418912 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::env::current_dir::h829333952c66e8ad in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::ffi::c_str::CString::from_vec_unchecked::hc7adf6773960680b in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::ffi::c_str::CString::from_vec_with_nul_unchecked::h6cd7e60a1e34c996 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::ffi::c_str::CString::from_vec_with_nul::he767a74f4ba7920f in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      ...

After reading some various issues it sounds like the core issue is that rust mostly expects to handle linking itself, and provides these manually during the final link, but when linking it into a C binary with bazel, we don't go through that logic and therefore end up missing these symbols.

This change 802bcf9 potentially improves this by allowing you to add a global allocator_library which can provide these symbols and solve this. But as far as I can tell that would currently require reproducing a lot of toolchain setup boilerplate, as well as a library containing the necessary symbols (one of which can be found here https://github.com/chromium/chromium/blob/42bd3d0802e90ca093450a2683ff2a104e10e48a/build/rust/std/remap_alloc.cc)

I'm hoping for some input on the best way to solve this, ideally for everyone, and ideally in rules_rust. One potential thing I could see working is defaulting the allocator library to a library in rules_rust containing something like chromium's solution. If that would be too invasive we could potentially expose the allocator_library attribute and pipe that through all the toolchain setup so users could specify one globally in their WORKSPACE without having to add too much boilerplate.

Please let me know what you think!

@sayrer
Copy link
Contributor

sayrer commented Mar 30, 2022

Does the same program link on x86_64 (Mac or Linux)? It could just be a bogus platform name, we've hit that before: https://github.com/bazelbuild/rules_rust/pull/806/files

@keith
Copy link
Member Author

keith commented Mar 30, 2022

Ah sorry I meant to attach my repro case too rustrepro.zip, this + bazel build main is all you need to get the above failure. This also fails on my x86_64 ubuntu machine

@keith
Copy link
Member Author

keith commented Apr 1, 2022

So also interestingly, this is avoided if you use rust_static_library instead of rust_library within the same build, which isn't the recommended usage from the docs where it says it should only be used when vendoring that final library. But in my test case the original usage with an older version of rules_rust was using create_type = "staticlib", which I guess would be equivalent to still using rust_static_library?

@keith
Copy link
Member Author

keith commented Apr 14, 2022

Thinking more about the rust_static_library workaround, maybe it would make sense to include a single rust_static_library in your build graph, to get the allocator infra, and then use everything else as a rust_library, similar to how you would inject a single cc_library in the build graph

@UebelAndre
Copy link
Collaborator

@hlopko wondering if you have any thoughts here.

@hlopko
Copy link
Member

hlopko commented Apr 27, 2022

Hi folks,

You're right, 802bcf9 was created so we have a way to provide those allocator shims to the linker somehow. Internally we use pretty much exactly what you found in chromium. rust_static_librtary/staticlib is tricky - you have to make sure you only have one staticlib in your transitive closure otherwise you have duplicated symbols error at link time.

Ideally we would get a supported solution upstream. rust-lang/rust#73632 is the tracking bug. In the meantime we will probably continue with the workaround like chromium does.

FTR, we don't expect to support custom allocators in our Bazel setup, we will use the allocator configured for C++ through --custom_malloc.

@keith
Copy link
Member Author

keith commented Apr 30, 2022

It looks like #1298 removes the rust_static_library workaround for this case anyways as discussed in #1063

Should we provide a default allocator in rules_rust for this case?

@keith
Copy link
Member Author

keith commented May 2, 2022

I actually realized that even with #1298 CcInfo is still propagated, maybe that one should similarly be restricted? I noticed because we have a rust_static_library depended on by other cc_library targets and it still works just fine with that patch. cc @scentini 

@scentini
Copy link
Collaborator

scentini commented May 3, 2022

I think cc_library -> rust_static_library is a valid dependency, and as such we want to emit CcInfo for rust_static_library. The rust_static_library workaround you mentioned above should still be valid. Is that not the case?

@keith
Copy link
Member Author

keith commented May 3, 2022

It is still valid, but I'm not sure if we want it to be 😛?

@scentini
Copy link
Collaborator

scentini commented May 4, 2022

Yes, there is a benefit to propagating a CcInfo from a rust_static_library in that one can depend on a rust_static_library from a cc_library without specifying additional linker arguments.

@bsilver8192
Copy link
Contributor

If it helps, I added a full working example in #1350. It takes a bit to set up, but I think it will work somewhat robustly this way. Rust toolchain changes may break it, but I don't think any recent ones would. A setup based on that PR works with all my toolchains, which include cross compiling with both GCC and Clang, and linking with llvm-ld and GNU ld.

@keith
Copy link
Member Author

keith commented May 20, 2022

Nice, I also have a demo with mobile in https://github.com/keith/bazel-rust-mobile-demo/

@bsilver8192
Copy link
Contributor

I suspect your approach will fail when building a cc_test with more complex codebases keith. It's definitely simpler, and if the caveats are acceptable then it looks nice, but I'd like to lay out some things for you or others who try using it. Also reasons why this is not a good approach for rules_rust to adopt. If you use llvm-ld or another linker which ignores the traditional Unix linker search algorithm and just searches all objects, then none of this applies, it'll work great.

If I'm understanding it correctly, you only have a dependency from one rust_library to the allocator shim implementation, which means the shared objects built for each library in test mode (at least for cc_library, not sure what rust_library currently does) won't have those symbols, so they won't link. You could add that to the deps of each rust_library manually, but that gets pretty tedious, especially if you use external dependencies with rust_libraries you don't write (either cargo-raze or Bazel external repositories with pre-written BUILD files).

Also, if no rust_library with a dependency on alloc uses a certain function, but something else without that dependency does, then Bazel's arbitrary choice of link order determines whether those symbols get linked in or linking fails.

@GregBowyer
Copy link
Contributor

I hit this without a custom allocator using GCC (on clang it works fine ala 802bcf9).

An alternative workaround to the one posted in #1350, is to do a stub of a rust binary that essentially calls into C++. I made the C++ have a int app_main(int argc, char** argv) and the following rust src

use std::ffi::{CString};
use std::os::raw::{c_char, c_int};

extern "C" {
    fn app_main(num_args: c_int, args: *const *const c_char) -> c_int;
}

fn main() {
    let args = std::env::args_os()
        .map(|arg| arg.to_string_lossy().to_string())
        .map(|arg| arg.into_bytes())
        .map(|arg| unsafe { CString::from_vec_unchecked(arg) })
        .collect::<Vec<_>>();

    let rc = unsafe { app_main(args.len() as c_int, args.as_ptr() as *const *const c_char) };

    std::process::exit(rc);
}

This is untested on anything other than linux and is IMHO a hack, but might be useful for people who need to tweak a binary here or there.

Of interest, this seems to only happen to me for some link targets of the form rust_library -> cc_library -> cc_library -> cc_binary I have found others link just fine rust_library -> cc_library -> cc_binary

GregBowyer added a commit to GregBowyer/rules_rust that referenced this issue Jul 27, 2022
Previous work has made `rust_library` operate as a `CcInfo` provider. As far as I can tell this lets the compiler directly consume an rlib rather than a formal static or shared library crate type.

This is a fine experiment but is a poor recommendation in the documentation, I would suggest that until rust-lang/rust#73632 is in a better place we dont recommend people use a `rust_library` target directly where `CcInfo` is expected. I found that this failed for me in ways similar to bazelbuild#1238 even without stating a custom allocator. After fixing this for Linux + GCC this then failed in macos *and* windows.

It seems that we are jumping the gun slightly and encouraging a behavior that only works with a specific configuration as the default, rather than as something to trial.
@GregBowyer
Copy link
Contributor

Alas this failed for me on macos, I think recommending rust_library for CcInfo provision is a little flawed, even if the older approach has its issues.

@bsilver8192
Copy link
Contributor

Update on this: #1490 added the rust_toolchain.allocator_library attribute which allows setting a cc_library that implements these symbols. Actually creating that library is still tricky though. You can get started by copying the one that PR includes, but it may need updates to work with different rustc versions.

@bsilver8192
Copy link
Contributor

For anybody looking to use a non-default global allocator with allocator_library, here's how I do it:

//! Hand-written allocator shims to use `my_allocator` with Bazel. This is used instead of
//! `#[global_allocator]`.

use std::alloc::{GlobalAlloc, Layout};

use my_allocator::MyAllocator;

static ALLOC: MyAllocator = MyAllocator;

#[no_mangle]
extern "C" fn __rust_alloc(size: usize, align: usize) -> *mut u8 {
    unsafe { ALLOC.alloc(Layout::from_size_align_unchecked(size, align)) }
}

#[no_mangle]
extern "C" fn __rust_dealloc(ptr: *mut u8, size: usize, align: usize) {
    unsafe { ALLOC.dealloc(ptr, Layout::from_size_align_unchecked(size, align)) }
}

#[no_mangle]
extern "C" fn __rust_realloc(ptr: *mut u8, old_size: usize, align: usize, new_size: usize) -> *mut u8 {
    unsafe { ALLOC.realloc(ptr, Layout::from_size_align_unchecked(old_size, align), new_size) }
}

#[no_mangle]
extern "C" fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8 {
    unsafe { ALLOC.alloc_zeroed(Layout::from_size_align_unchecked(size, align)) }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants