
Segmentation fault when thread using dynamically loaded Rust library exits #91979

@devongovett


Scenario: I have a Rust cdylib that is loaded by a C program via dlopen. The C program creates a thread and loads the Rust library inside it. The thread calls one of the Rust functions, closes the library via dlclose, and then exits. The Rust library has a thread-local variable holding a struct that implements Drop, and the function called from C modifies that variable.
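For readers who do not want to open the repository, the Rust side of the scenario can be sketched roughly as follows. This is a hypothetical approximation, not the code from the reproduction: the names `Guard`, `STATE`, `DROPS`, and `rust_touch_tls` are made up, and the crate is assumed to be built with `crate-type = ["cdylib"]`. The relevant part is that the first call from a thread initializes a thread-local value with a `Drop` impl, so Rust has to arrange for a destructor to run when that thread exits.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Guard` values have been dropped, i.e. how many thread exits
// ran the thread-local destructor.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the struct in the reproduction; only its Drop impl matters.
struct Guard {
    calls: u32,
}

impl Drop for Guard {
    // Runs when the owning thread exits. If the library had already been
    // dlclose'd by then, this code would no longer be mapped.
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Initialized lazily on first access in each thread; that is also when
    // std registers the destructor for this thread's value.
    static STATE: RefCell<Guard> = RefCell::new(Guard { calls: 0 });
}

// Hypothetical export called from C. `#[unsafe(no_mangle)]` needs Rust 1.82+;
// older compilers (such as 1.57) spell it `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn rust_touch_tls() -> u32 {
    STATE.with(|state| {
        let mut guard = state.borrow_mut();
        guard.calls += 1;
        guard.calls
    })
}
```

Calling `rust_touch_tls` from a C thread and then `dlclose`-ing the library before that thread exits leaves a pending destructor that points into code which is no longer mapped.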

Full reproduction here: https://github.com/devongovett/rust-threadlocal-bug

On CentOS 7, which ships glibc 2.17, the program segfaults in __nptl_deallocate_tsd() (pthread_create.c). With later versions of glibc, there is no crash. I believe the crash occurs because Rust creates a thread-local key with pthread_key_create but never calls pthread_key_delete (the call in the destructor is commented out):

```rust
impl Drop for Key {
    fn drop(&mut self) {
        // Right now Windows doesn't support TLS key destruction, but this also
        // isn't used anywhere other than tests, so just leak the TLS key.
        // unsafe { imp::destroy(self.key) }
    }
}
```

When the thread exits, glibc tries to call the destructor registered for the key, but because the dynamic library has already been unloaded via dlclose, the destructor function is no longer mapped and the process crashes.
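The sketch below illustrates that POSIX mechanism in isolation: a key created with `pthread_key_create` and a destructor, a non-null value stored from a worker thread, and the destructor invoked by the C library only after the thread's own code has returned. It assumes Linux (where `pthread_key_t` is an `unsigned int`) and declares the pthread functions by hand instead of going through the `libc` crate; `key_destructor`, `DTOR_CALLS`, and `exit_thread_holding_value` are illustrative names. If `key_destructor` lived in a library that had been `dlclose`d before the thread exited, this is the call that would jump into unmapped memory.

```rust
use std::os::raw::{c_int, c_uint, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Assumption: Linux (glibc or musl), where `pthread_key_t` is `unsigned int`.
type PthreadKey = c_uint;

// Declared by hand to keep the sketch dependency-free. `unsafe extern` blocks
// need Rust 1.82+; older compilers accept plain `extern "C"`.
unsafe extern "C" {
    fn pthread_key_create(
        key: *mut PthreadKey,
        destructor: Option<unsafe extern "C" fn(*mut c_void)>,
    ) -> c_int;
    fn pthread_setspecific(key: PthreadKey, value: *const c_void) -> c_int;
}

// Counts how many times the C library has invoked the key's destructor.
static DTOR_CALLS: AtomicUsize = AtomicUsize::new(0);

// Called by the C library at thread exit for each non-null value stored under the key.
unsafe extern "C" fn key_destructor(_value: *mut c_void) {
    DTOR_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Creates a key, stores a value in a fresh thread, lets that thread exit,
/// and returns how many destructor calls were observed afterwards.
fn exit_thread_holding_value() -> usize {
    let mut key: PthreadKey = 0;
    let rc = unsafe { pthread_key_create(&mut key, Some(key_destructor)) };
    assert_eq!(rc, 0, "pthread_key_create failed");

    thread::spawn(move || {
        // Any non-null pointer works: destructors only run for non-null values.
        let rc = unsafe { pthread_setspecific(key, 1usize as *const c_void) };
        assert_eq!(rc, 0, "pthread_setspecific failed");
        // The key is never deleted, so its destructor stays registered.
    })
    .join()
    .expect("worker thread panicked");

    DTOR_CALLS.load(Ordering::SeqCst)
}
```

`join` only returns after the worker has fully terminated, so by then the destructor has already run once.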

My theory is that the crash happens only with glibc 2.17, and not with later versions, because __cxa_thread_atexit_impl does not exist in these older versions (it was added in glibc 2.18). Rust uses this function to register destructors when it is available; otherwise it falls back to its own implementation:

```rust
if !__cxa_thread_atexit_impl.is_null() {
    type F = unsafe extern "C" fn(
        dtor: unsafe extern "C" fn(*mut u8),
        arg: *mut u8,
        dso_handle: *mut u8,
    ) -> libc::c_int;
    mem::transmute::<*const libc::c_void, F>(__cxa_thread_atexit_impl)(
        dtor,
        t,
        &__dso_handle as *const _ as *mut _,
    );
    return;
}
```

However, I'm not sure about that. It could be some other change in glibc.
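One way to check which branch a given system takes is to look up `__cxa_thread_atexit_impl` at run time. The sketch below uses `dlsym` with `RTLD_DEFAULT` (a null handle on glibc and musl) as a stand-in for std's own check, which declares the symbol as a weak import and therefore needs the unstable `#[linkage]` attribute. `symbol_available` and `has_cxa_thread_atexit_impl` are hypothetical helper names. One would expect `has_cxa_thread_atexit_impl()` to return `false` on CentOS 7's glibc 2.17 and `true` on glibc 2.18 or later. As far as I can tell, glibc uses the `dso_handle` argument to keep the library mapped until the registered destructors have run, which would be consistent with the theory, but that has not been verified against this reproduction.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_void};

// On glibc and musl, RTLD_DEFAULT is the null handle: search the global symbol scope.
const RTLD_DEFAULT: *mut c_void = std::ptr::null_mut();

// `unsafe extern` blocks need Rust 1.82+; older compilers accept plain `extern "C"`.
unsafe extern "C" {
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

/// Returns true if `name` resolves in the process's global symbol scope.
fn symbol_available(name: &str) -> bool {
    let c_name = CString::new(name).expect("symbol name contains a NUL byte");
    unsafe { !dlsym(RTLD_DEFAULT, c_name.as_ptr()).is_null() }
}

/// Rough run-time check for the glibc entry point that the registration code prefers.
fn has_cxa_thread_atexit_impl() -> bool {
    symbol_available("__cxa_thread_atexit_impl")
}
```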

I have not tested this, but I think the bug could be fixed if the commented-out destructor call quoted above were actually executed. The comment says Windows doesn't support TLS key destruction, so maybe it could be called conditionally on other platforms?
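The proposed fix hinges on POSIX semantics: once a key is deleted with `pthread_key_delete`, the C library no longer calls its destructor at thread exit. The sketch below checks that on Linux, again assuming `pthread_key_t` is an `unsigned int` and declaring the functions by hand. The explicit `pthread_key_delete` call stands in for what `Key::drop` would do if `imp::destroy(self.key)` were not commented out (on Unix, `imp::destroy` wraps `pthread_key_delete`). `deleted_key_destructor`, `DELETED_KEY_DTOR_CALLS`, and `exit_thread_after_key_delete` are illustrative names.

```rust
use std::os::raw::{c_int, c_uint, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Assumption: Linux (glibc or musl), where `pthread_key_t` is `unsigned int`.
type PthreadKey = c_uint;

// `unsafe extern` blocks need Rust 1.82+; older compilers accept plain `extern "C"`.
unsafe extern "C" {
    fn pthread_key_create(
        key: *mut PthreadKey,
        destructor: Option<unsafe extern "C" fn(*mut c_void)>,
    ) -> c_int;
    fn pthread_key_delete(key: PthreadKey) -> c_int;
    fn pthread_setspecific(key: PthreadKey, value: *const c_void) -> c_int;
}

// Counts destructor calls for a key that gets deleted before the thread exits.
static DELETED_KEY_DTOR_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe extern "C" fn deleted_key_destructor(_value: *mut c_void) {
    DELETED_KEY_DTOR_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Stores a value under a key, deletes the key (as `imp::destroy` would),
/// lets the thread exit, and returns the number of destructor calls observed.
fn exit_thread_after_key_delete() -> usize {
    let mut key: PthreadKey = 0;
    let rc = unsafe { pthread_key_create(&mut key, Some(deleted_key_destructor)) };
    assert_eq!(rc, 0, "pthread_key_create failed");

    thread::spawn(move || {
        let rc = unsafe { pthread_setspecific(key, 1usize as *const c_void) };
        assert_eq!(rc, 0, "pthread_setspecific failed");

        // Unregister the key before the thread exits, e.g. before dlclose.
        let rc = unsafe { pthread_key_delete(key) };
        assert_eq!(rc, 0, "pthread_key_delete failed");
    })
    .join()
    .expect("worker thread panicked");

    DELETED_KEY_DTOR_CALLS.load(Ordering::SeqCst)
}
```

Deleting a key does not run pending destructors: values still stored by other threads are simply leaked, and the key must not be used again afterwards. So a real fix would still need to decide what happens to thread-local values that are alive when the library is unloaded.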

glibc 2.17 is indeed pretty old; however, it is the version shipped with CentOS 7, which is not EOL until 2024, so I do think this bug should be fixed.

Meta

rustc --version --verbose:

```
rustc 1.57.0 (f1edd0429 2021-11-29)
binary: rustc
commit-hash: f1edd0429582dd29cccacaf50fd134b05593bd9c
commit-date: 2021-11-29
host: x86_64-unknown-linux-gnu
release: 1.57.0
LLVM version: 13.0.0
```
