dlclose() does not behave properly on Mac #47974

Open
dradtke opened this issue Feb 2, 2018 · 15 comments
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows A-thread-locals Area: Thread local storage (TLS) C-bug Category: This is a bug. O-macos Operating system: macOS

Comments

@dradtke

dradtke commented Feb 2, 2018

This report will reference this repository which reproduces the issue: https://github.com/dradtke/rust-dylib-issues

The Issue

The repository contains an application library, built as a dylib, and two example main programs, one in Rust and one in C. Each main application runs in a loop, loading the library with dlopen(), calling a method, and then closing with dlclose(). The expectation is that any changes to the library will be picked up immediately by the main application when it is recompiled.

However, the behavior between the two programs differs. If I run the two main programs side-by-side, then make a change to the returned message and recompile the library, only the C program immediately reflects the change. The Rust main program won't reflect any changes until it is fully restarted.

It appears that this is Mac-specific behavior. When the same test is run on Debian, the two main programs behave identically.
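For readers without the repository at hand, the load, call, close cycle described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: the helper names `call_once` and `last_dl_error` are hypothetical, the `dl*` functions are declared by hand instead of through the `libc` crate, and the exported function is assumed to take no arguments and return a pointer to a NUL-terminated string that remains valid until the library is closed.

```rust
use std::ffi::{CStr, CString};
use std::mem::transmute;
use std::os::raw::{c_char, c_int, c_void};

// Hand-written declarations to keep the sketch dependency-free.
// `unsafe extern` needs Rust 1.82+; on older toolchains drop `unsafe`.
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
    fn dlerror() -> *mut c_char;
}

// RTLD_NOW is 2 on both Linux and macOS.
const RTLD_NOW: c_int = 2;

/// Copies the pending dlerror() text. The pointer stays owned by the
/// loader, so it is only borrowed here and never freed.
fn last_dl_error() -> String {
    unsafe {
        let msg = dlerror();
        if msg.is_null() {
            "unknown dl error".to_string()
        } else {
            CStr::from_ptr(msg).to_string_lossy().into_owned()
        }
    }
}

/// Hypothetical helper: opens `path`, calls `symbol`, copies the returned
/// string, then closes the library again.
/// ASSUMPTION: `symbol` has the C signature `const char *f(void)`.
fn call_once(path: &str, symbol: &str) -> Result<String, String> {
    let c_path = CString::new(path).map_err(|e| e.to_string())?;
    let c_symbol = CString::new(symbol).map_err(|e| e.to_string())?;
    unsafe {
        let handle = dlopen(c_path.as_ptr(), RTLD_NOW);
        if handle.is_null() {
            return Err(last_dl_error());
        }
        let sym = dlsym(handle, c_symbol.as_ptr());
        if sym.is_null() {
            let err = last_dl_error();
            dlclose(handle);
            return Err(err);
        }
        let func: extern "C" fn() -> *const c_char = transmute(sym);
        let ptr = func();
        // Copy before dlclose: the bytes may live inside the library image.
        let message = if ptr.is_null() {
            String::new()
        } else {
            CStr::from_ptr(ptr).to_string_lossy().into_owned()
        };
        dlclose(handle);
        Ok(message)
    }
}
```

Calling `call_once("../app/target/debug/libapp.dylib", "get_message")` in a loop, with a pause between iterations, would approximate the test described here. If the library instead returns memory produced by `CString::into_raw`, this sketch leaks it rather than freeing it.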

The Environment

Operating System: macOS Sierra 10.12.6
Rust Version:

rustc 1.23.0 (766bd11c8 2018-01-01)
binary: rustc
commit-hash: 766bd11c8a3c019ca53febdcd77b2215379dd67d
commit-date: 2018-01-01
host: x86_64-apple-darwin
release: 1.23.0
LLVM version: 4.0

C Compiler:

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
@cuviper
Member

cuviper commented Feb 2, 2018

Your main.rs includes extern crate app; -- it may be that the linker on Linux is trimming the unused dependency but macOS is keeping it linked. If the library is loaded at startup, then dlopen/dlclose will just be bumping the reference count up and down.
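One way to test this hypothesis is to ask the loader whether the library is already resident before the first explicit dlopen(), using the RTLD_NOLOAD flag. The sketch below is an illustration under stated assumptions: `is_resident` is a hypothetical helper name, the flag values are the ones used by macOS and by glibc/musl on Linux (other platforms differ), and how the loader matches the path against already-loaded images is platform-specific.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_void};

// Hand-written declarations to keep the sketch dependency-free.
// `unsafe extern` needs Rust 1.82+; on older toolchains drop `unsafe`.
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

const RTLD_LAZY: c_int = 0x1;

// ASSUMPTION: only macOS and Linux (glibc/musl) values are covered here.
#[cfg(target_os = "macos")]
const RTLD_NOLOAD: c_int = 0x10;
#[cfg(not(target_os = "macos"))]
const RTLD_NOLOAD: c_int = 0x4;

/// Hypothetical helper: returns true if the loader already has `path`
/// mapped into the process, without loading it as a side effect.
fn is_resident(path: &str) -> bool {
    let c_path = match CString::new(path) {
        Ok(p) => p,
        Err(_) => return false, // interior NUL: cannot be a valid path
    };
    unsafe {
        let handle = dlopen(c_path.as_ptr(), RTLD_LAZY | RTLD_NOLOAD);
        if handle.is_null() {
            false
        } else {
            // A successful RTLD_NOLOAD open bumps the reference count,
            // so balance it immediately.
            dlclose(handle);
            true
        }
    }
}
```

If `is_resident` reports true at program start, or still reports true right after a dlclose(), then something other than the explicit dlopen() is keeping the image mapped.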

@dradtke
Author

dradtke commented Feb 2, 2018

Ah, that's a good call. Unfortunately, it looks like removing extern crate app; causes it to segfault, which also doesn't happen on Linux.

@cuviper cuviper added A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-bug Category: This is a bug. O-macos Operating system: macOS labels Feb 3, 2018
@cuviper
Member

cuviper commented Feb 3, 2018

Can you capture any information about the segfault? Perhaps a debugger backtrace?

@nagisa
Member

nagisa commented Feb 3, 2018

Likely a duplicate of #28794.

@nagisa
Member

nagisa commented Feb 3, 2018

A quick look at your Rust code reveals it invoking undefined behaviour. You use CString to null-terminate your literals, however CString::new(&symbol[..]).unwrap().into_raw() will immediately free the buffer CString allocates so the C code reads an invalid pointer.

This could also be a cause for different behaviour.

@dradtke
Author

dradtke commented Feb 5, 2018

Here's what the debugger says when I run it:

Process 12004 launched: '/Users/dradtke/Workspace/rust/dylib/main/target/debug/main' (x86_64)
Message: hello there world
Process 12004 stopped
* thread #1: tid = 0x78d98d, 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7
libsystem_platform.dylib`OSSpinLockLock:
->  0x7fff9932dca9 <+7>:  lock
    0x7fff9932dcaa <+8>:  cmpxchgl %ecx, (%rdi)
    0x7fff9932dcad <+11>: jne    0x7fff9932dcb0            ; <+14>
    0x7fff9932dcaf <+13>: retq

And the full backtrace:

warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
* thread #1: tid = 0x78d98d, 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7
    frame #1: 0x00000001000250c6 main`je_arena_dalloc_large [inlined] je_malloc_mutex_lock + 38 at mutex.h:99 [opt]
    frame #2: 0x00000001000250ba main`je_arena_dalloc_large(tsdn=0x000000010060d008, arena=0x4d746c7561666544, chunk=0x0000000100200000, ptr=0x00000001003002c0) + 26 at arena.c:3075 [opt]
    frame #3: 0x0000000100026625 main`je_arena_ralloc [inlined] je_arena_sdalloc(slow_path=true) + 12 at arena.h:1516 [opt]
    frame #4: 0x0000000100026619 main`je_arena_ralloc [inlined] je_isdalloct(slow_path=true) + 164 at jemalloc_internal.h:1195 [opt]
    frame #5: 0x0000000100026575 main`je_arena_ralloc [inlined] je_isqalloc(slow_path=true) at jemalloc_internal.h:1205 [opt]
    frame #6: 0x0000000100026575 main`je_arena_ralloc(tsd=0x000000010060d008, arena=0x0000000000000000, ptr=<unavailable>, oldsize=<unavailable>, size=<unavailable>, alignment=<unavailable>, zero=<unavailable>, tcache=<unavailable>) + 2037 at arena.c:3376 [opt]
    frame #7: 0x000000010001cc79 main`je_rallocx [inlined] je_iralloct(ptr=<unavailable>, oldsize=<unavailable>, alignment=0, tcache=<unavailable>, arena=0x0000000000000000) + 263 at jemalloc_internal.h:1259 [opt]
    frame #8: 0x000000010001cb72 main`je_rallocx(ptr=0x00000001003002c0, size=33, flags=<unavailable>) + 674 at jemalloc.c:2414 [opt]
    frame #9: 0x0000000100019ee1 main`alloc_jemalloc::contents::__rde_realloc + 81 at lib.rs:170 [opt]
    frame #10: 0x0000000100002d68 main`alloc::vec::{{impl}}::reserve_exact<u8> [inlined] alloc::heap::{{impl}}::realloc + 19 at heap.rs:127 [opt]
    frame #11: 0x0000000100002d55 main`alloc::vec::{{impl}}::reserve_exact<u8> [inlined] alloc::raw_vec::{{impl}}::reserve_exact<u8,alloc::heap::Heap> + 28 at raw_vec.rs:429 [opt]
    frame #12: 0x0000000100002d39 main`alloc::vec::{{impl}}::reserve_exact<u8> + 25 at vec.rs:486 [opt]
    frame #13: 0x0000000100006bee main`std::ffi::c_str::{{impl}}::from_vec_unchecked + 30 at c_str.rs:360 [opt]
    frame #14: 0x0000000100006ba2 main`std::ffi::c_str::{{impl}}::_new + 114 at c_str.rs:335 [opt]
    frame #15: 0x00000001000020fc main`std::ffi::c_str::{{impl}}::new<&str>(t=(data_ptr = "../app/target/debug/libapp.dylibget_message", length = 32)) + 60 at c_str.rs:329
    frame #16: 0x0000000100002246 main`main::main + 102 at main.rs:19
    frame #17: 0x000000010003fc0f main`panic_unwind::__rust_maybe_catch_panic + 31 at lib.rs:101 [opt]
    frame #18: 0x000000010000fab9 main`std::rt::lang_start [inlined] std::panicking::try<(),closure> + 51 at panicking.rs:459 [opt]
    frame #19: 0x000000010000fa86 main`std::rt::lang_start [inlined] std::panic::catch_unwind<closure,()> at panic.rs:365 [opt]
    frame #20: 0x000000010000fa86 main`std::rt::lang_start + 422 at rt.rs:58 [opt]
    frame #21: 0x0000000100002705 main`main + 37
    frame #22: 0x00007fff9911f235 libdyld.dylib`start + 1
    frame #23: 0x00007fff9911f235 libdyld.dylib`start + 1

@cuviper
Member

cuviper commented Feb 5, 2018

@nagisa

however CString::new(&symbol[..]).unwrap().into_raw() will immediately free the buffer

That's not true -- CString::into_raw() relinquishes ownership, and that will just leak unless you pass the memory back to CString::from_raw() later.

But that does highlight to me that the other from_raw() calls are problematic. Especially CString::from_raw(dlerror()), as dlerror()'s return value is not meant to be freed by the caller. That should probably be CStr::from_ptr() instead.

The other CString::from_raw(func()) might be OK, when you're absolutely sure that func() is returning memory that came from CString::into_raw(). Plus, those CStrings need to be using the same allocator, which is what I suspect broke after removing extern crate app, since the crash is in jemalloc.

Generally speaking, allocating in one domain and freeing in another is fraught with danger.
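The ownership rules described above can be illustrated with a small sketch. The function names `make_message`, `free_message`, and `copy_borrowed` are hypothetical and do not come from the repository; the point is only to contrast a pointer whose ownership was transferred with `CString::into_raw()` against one that is merely borrowed, as with the return value of dlerror().

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands ownership of a Rust-allocated C string to the caller.
/// The memory is leaked unless it is later passed to `free_message`.
fn make_message() -> *mut c_char {
    CString::new("hello world")
        .expect("literal contains no interior NUL")
        .into_raw()
}

/// Reclaims and frees a pointer produced by `make_message`.
/// Safety: `ptr` must come from `CString::into_raw` and must have been
/// allocated by the same allocator that this code frees with.
unsafe fn free_message(ptr: *mut c_char) {
    if !ptr.is_null() {
        unsafe { drop(CString::from_raw(ptr)) };
    }
}

/// Copies a string that someone else still owns (for example the result
/// of dlerror()) without taking ownership of the pointer.
/// Safety: `ptr` must point to a valid NUL-terminated string.
unsafe fn copy_borrowed(ptr: *const c_char) -> String {
    unsafe { CStr::from_ptr(ptr).to_string_lossy().into_owned() }
}
```

In this arrangement the side that allocates also frees: a library would export both `make_message` and `free_message`, and the host would only ever copy out of the pointer with `copy_borrowed` before handing it back.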

@ubolonton

I have a somewhat similar problem with my library, so I tried the repository above.
My Rust version is the same, but I'm on High Sierra 10.13.3.

I ran it with DYLD_PRINT_APIS=1 to see the dyld log.

It (reloading) actually worked correctly.

dlopen(../app/target/debug/libapp.dylib, 0x00000002)
dyld_image_path_containing_address(0x1019ce000)
  dlopen(../app/target/debug/libapp.dylib) ==> 0x10261b000
dlsym(0x10261b000, get_message)
  dlsym(0x10261b000, get_message) ==> 0x1019cf620
Message: hello world
dlclose(0x10261b000)
dlclose(), found unused image 0x10261b000 libapp.dylib
dlclose(), deleting 0x10261b000 libapp.dylib

dlopen(../app/target/debug/libapp.dylib, 0x00000002)
dyld_image_path_containing_address(0x1019ce000)
  dlopen(../app/target/debug/libapp.dylib) ==> 0x10261b000
dlsym(0x10261b000, get_message)
  dlsym(0x10261b000, get_message) ==> 0x1019cf620
Message: hello world
dlclose(0x10261b000)
dlclose(), found unused image 0x10261b000 libapp.dylib
dlclose(), deleting 0x10261b000 libapp.dylib

When I changed it to crate-type = ["cdylib"], dlclose no longer unloaded the lib (and the program either segfaulted or returned with error).

dlopen(../app/target/debug/libapp.dylib, 0x00000002)
dyld_image_path_containing_address(0x10e772000)
  dlopen(../app/target/debug/libapp.dylib) ==> 0x7fe1e0700000
dlsym(0x7fe1e0700000, get_message)
  dlsym(0x7fe1e0700000, get_message) ==> 0x10e773460
Message: hello world
dlclose(0x7fe1e0700000)

dlopen(../app/target/debug/libapp.dylib, 0x00000002)
  dlopen(../app/target/debug/libapp.dylib) ==> 0x7fe1e0700000
dlsym(0x7fe1e0700000, get_mess)
  dlsym(0x7fe1e0700000, get_mess) ==> NULL
dlerror()
Failed to retrieve get_message symbol: dlsym(0x7fe1e0700000, get_mess): symbol not found

This is quite weird, since the problem I have with my library is the opposite: unloading worked in Sierra, but stopped working in High Sierra (regardless of crate-type).

@haudan

haudan commented Apr 28, 2018

I'm experiencing this problem as well, on High Sierra (10.13.4). I noticed the following:

When dlclose-ing a library written in C (clang -shared), the dylib gets unloaded as expected.

dlclose(0x7fc3eaf8b000)
3043 dlclose(), found unused image 0x7fc3eaf8b000 libhsgame.dylib
3044 dlclose(), deleting 0x7fc3eaf8b000 libhsgame.dylib

When I try the same exact thing again with an identical cdylib written in Rust, dlclose does not unload the library. A refcount > 0 can't be the problem, because even when I dlclose the cdylib 100 times in a loop, dlclose still refuses to release the library (DYLD_PRINT_APIS=1 confirms it only gets opened once and closed 100 times).

To my knowledge, apart from a refcount > 0, dlclose only refuses to release a dylib when it is still being used somewhere (pointers holding addresses of the lib's symbols still exist), to avoid dangling pointers.

If that's the case, then the question is, where do these pointers come from? If not, what the hell else is going on?

@haudan

haudan commented Apr 28, 2018

Ok, I think I found something - I tried two more things with my Rust cdylib:

  • Switching to the system allocator, no effect.
  • Switching to the system allocator and turning the cdylib into a no_std crate, this fixes the problem - dlclose releases the lib.
rustc --version
rustc 1.27.0-nightly (ac3c2288f 2018-04-18)

@nagisa
Member

nagisa commented Apr 28, 2018

A recent change in OS X has "improved" dlclose so that it no longer actually unloads libraries if some conditions are satisfied. See this comment. Perhaps that’s the reason your library wasn’t unloaded?

@haudan

haudan commented Apr 28, 2018

Thanks for the link @nagisa, this definitely seems to be related. Do you know of a page that lists all the cases that make a dylib un-unloadable? @nanotech's comment only lists a few, and I'd like to figure out the exact reason why Rust cdylibs fall into that category.

@nagisa
Member

nagisa commented Apr 28, 2018

Historically, thread local storage (__thread) is what causes rust dylibs to not get unloaded or generally not work right with dlclose.

I don’t know of a full list though.

@thomcc thomcc added the A-thread-locals Area: Thread local storage (TLS) label Sep 23, 2022
@thomcc
Member

thomcc commented Sep 23, 2022

This is very likely to be related to the issues described in #88737 and #88737 (comment). Fixing this will likely not result in the behavior the user wants though -- the fact that dlclose works anywhere is kind of a bug, it failing to unload is actually macOS doing the right thing.

In general you should not dlclose rust libraries that use libstd. There's no way for us to support this on many targets (and we don't quite support it correctly on all the platforms where we could, which is why sometimes it will unload, which can be unsound).

Unfortunately, dlclose is just not really coherent in programs which have thread local storage (in particular if destructors need to be run on that TLS data). See that issue for an explanation of why.
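To make the TLS concern concrete, the sketch below shows a thread-local value whose destructor runs when its thread exits, which can be long after the code that created it has finished. All names here (`Scratch`, `SCRATCH`, `DROPS`, `touch_scratch_on_new_thread`) are hypothetical, and the example runs inside a single binary rather than a dylib. It only demonstrates the timing: if the `Drop` implementation lived in a library that had been unloaded in the meantime, the thread would jump into unmapped code at exit.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the thread-local destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Scratch(Vec<u8>);

impl Drop for Scratch {
    fn drop(&mut self) {
        // In a dylib, this code would live in the library's image.
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Lazily initialized on first access from each thread.
    static SCRATCH: RefCell<Scratch> = RefCell::new(Scratch(Vec::new()));
}

/// Touches the thread-local on a fresh thread and reports how many
/// destructors ran by the time that thread has been joined.
fn touch_scratch_on_new_thread() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    thread::spawn(|| {
        // First access registers the destructor for this thread.
        SCRATCH.with(|s| s.borrow_mut().0.push(1));
        // The closure returns here; the destructor runs afterwards,
        // during thread teardown.
    })
    .join()
    .expect("worker thread panicked");
    DROPS.load(Ordering::SeqCst) - before
}
```

The destructor is registered with the thread, not with the library, so a dlclose() issued between the first access and the thread's exit has no safe point at which to run or cancel it.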

@bjorn3
Member

bjorn3 commented Jan 16, 2023

Musl libc doesn't even implement dlclose at all. It returns without doing anything.
