ProxyAllocator::alloc can deadlock if another thread panics #102

Open
marcelo-gonzalez opened this issue Nov 2, 2022 · 0 comments

@marcelo-gonzalez

Hard to say what to do about this, but I think there is a fundamental problem with trying to capture backtraces from within the global allocator. I have observed a deadlock where one thread is stuck here:

#0  __lll_lock_wait (futex=futex@entry=0x7ff302290990 <_rtld_global+2352>, private=0) at lowlevellock.c:52
#1  0x00007ff302038131 in __GI___pthread_mutex_lock (mutex=0x7ff302290990 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007ff301e43291 in __GI___dl_iterate_phdr (callback=0x7ff30206c5f0, data=0x7ff05cdf9c50) at dl-iteratephdr.c:40
#3  0x00007ff30206d6c1 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#4  0x00007ff302069868 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#5  0x00007ff30206aa20 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#6  0x00007ff30206b76c in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00005557bb535f94 in <near_rust_allocator_proxy::allocator::ProxyAllocator<A> as core::alloc::global::GlobalAlloc>::alloc ()
#8  0x00005557bbe28275 in <actix::fut::future::FutureWrap<F,A> as actix::fut::future::ActorFuture<A>>::poll ()
...
...

and another is here:

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005557bc63bbf0 in std::sys::unix::futex::futex_wait ()
#2  0x00005557bb4c0958 in std::sys::unix::locks::futex_mutex::Mutex::lock_contended ()
#3  0x00005557bb64eaea in backtrace::lock::lock ()
#4  0x00005557bb535f59 in <near_rust_allocator_proxy::allocator::ProxyAllocator<A> as core::alloc::global::GlobalAlloc>::alloc ()
#5  0x00005557bc640930 in std::backtrace_rs::symbolize::gimli::libs_dl_iterate_phdr::callback ()
#6  0x00007ff301e433d5 in __GI___dl_iterate_phdr (callback=0x5557bc6408e0 <std::backtrace_rs::symbolize::gimli::libs_dl_iterate_phdr::callback>, data=0x7ff05cfface8) at dl-iteratephdr.c:75
#7  0x00005557bc66dea4 in std::sys_common::backtrace::_print_fmt::{{closure}} ()
#8  0x00005557bc65ba41 in std::backtrace_rs::backtrace::libunwind::trace::trace_fn ()
#9  0x00007ff30206b794 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#10 0x00005557bc66dac0 in <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt ()
#11 0x00005557bb6d646c in core::fmt::write ()
#12 0x00005557bc63add0 in std::io::Write::write_fmt ()
#13 0x00005557bc66e94e in std::panicking::default_hook::{{closure}} ()
#14 0x00005557bc66f6f7 in std::panicking::rust_panic_with_hook ()
#15 0x00005557bc66f292 in std::panicking::begin_panic_handler::{{closure}} ()
#16 0x00005557bc66f206 in std::sys_common::backtrace::__rust_end_short_backtrace ()
#17 0x00005557bc66f1c2 in rust_begin_unwind ()
#18 0x00005557bb40d292 in core::panicking::panic_fmt ()
...
...

I think what is happening is this: the first thread holds the backtrace crate's global lock and is trying to acquire the glibc lock that protects the list of dynamically loaded libraries inside dl_iterate_phdr(), while the second thread already holds that glibc lock but is trying to acquire the backtrace lock because it allocates memory inside its dl_iterate_phdr() callback. That's a lock-order inversion between the two threads, so they deadlock.
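
To make the ordering concrete, here is a minimal sketch (purely illustrative, not code from this crate or from std) that models the two locks as plain mutexes; with unlucky timing the two threads block on each other exactly as in the traces above:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Stand-ins: "bt" models the backtrace crate's global lock,
    // "ld" models glibc's loader lock taken inside dl_iterate_phdr().
    let bt = Arc::new(Mutex::new(()));
    let ld = Arc::new(Mutex::new(()));

    // Thread 1: an allocating thread. ProxyAllocator::alloc takes the
    // backtrace lock first, then unwinding (_Unwind_Find_FDE) needs the
    // loader lock.
    let t1 = {
        let (bt, ld) = (Arc::clone(&bt), Arc::clone(&ld));
        thread::spawn(move || {
            let _bt = bt.lock().unwrap();
            let _ld = ld.lock().unwrap();
        })
    };

    // Thread 2: a panicking thread. The default panic hook enters
    // dl_iterate_phdr (loader lock held) and its callback allocates,
    // which re-enters the proxy allocator and wants the backtrace lock.
    let t2 = {
        let (bt, ld) = (Arc::clone(&bt), Arc::clone(&ld));
        thread::spawn(move || {
            let _ld = ld.lock().unwrap();
            let _bt = bt.lock().unwrap();
        })
    };

    // With the wrong interleaving, neither join ever returns.
    t1.join().unwrap();
    t2.join().unwrap();
}
```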

If I understand correctly, this can happen in any program where at least two threads allocate memory from Rust and one of them calls std::sys_common::backtrace::print(), e.g. via panic!(), since the std::backtrace_rs::symbolize::gimli implementation calls glibc's dl_iterate_phdr() and allocates from Rust inside its callback. So maybe the whole prospect of printing backtraces from ProxyAllocator::alloc() is doomed?
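
For reference, this is roughly the shape of the pattern in question: a #[global_allocator] wrapper that walks the stack on every allocation. It's only a sketch of my understanding, not the actual near_rust_allocator_proxy code; it assumes the backtrace crate as a dependency, whose trace() (at least in the version visible in the stacks above) takes an internal global lock and then unwinds via libgcc, which in turn goes through glibc's dl_iterate_phdr():

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ffi::c_void;

struct TracingAllocator<A>(A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for TracingAllocator<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Collect raw instruction pointers into a fixed-size buffer so the
        // capture itself does not allocate (and recurse back into alloc).
        // backtrace::trace() grabs the crate's global lock, then calls
        // _Unwind_Backtrace, which can take glibc's loader lock.
        let mut frames = [std::ptr::null_mut::<c_void>(); 32];
        let mut n = 0;
        backtrace::trace(|frame| {
            if n < frames.len() {
                frames[n] = frame.ip();
                n += 1;
                true
            } else {
                false
            }
        });
        // ... record frames[..n] against this allocation somewhere ...
        self.0.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.0.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOC: TracingAllocator<System> = TracingAllocator(System);
```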

marcelo-gonzalez added a commit to marcelo-gonzalez/nearcore that referenced this issue Dec 22, 2022
Setting the global allocator to the one provided by
near-rust-allocator-proxy leads to different sorts of problems
(deadlocks and invalid thread-local accesses), and that won't change
until/unless it's rewritten somewhat. For now we can't really use it
correctly, so we'll just remove it. It shouldn't be too hard to re-add
in the future if it's fixed. See these issues:

near/near-memory-tracker#103
near/near-memory-tracker#102

As a result of this, perf builds no longer print memory stats in logs.
Also, we're just going to use std::thread::current().id() instead of
near_rust_allocator_proxy::get_tid() (and we won't sort them in the
logs), but this shouldn't be too much of a problem.
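
For the record, the replacement mentioned above is just the standard library's opaque thread id, along these lines (a hypothetical snippet, not the actual nearcore change); ThreadId is a process-internal identifier rather than an OS tid, which is presumably why the entries are no longer sorted in the logs:

```rust
// Hypothetical example of tagging a log line with the std thread id
// instead of near_rust_allocator_proxy::get_tid().
fn log_memory_stat(bytes_in_use: usize) {
    let tid = std::thread::current().id();
    println!("{:?}: {} bytes in use", tid, bytes_in_use);
}
```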
near-bulldozer bot pushed a commit to near/nearcore that referenced this issue Dec 22, 2022
…xy (#8268)
