Hard to say what to do about this, but I think there is an issue with trying to get backtraces from within the global allocator. I have observed a deadlock where one thread is here:
#0 __lll_lock_wait (futex=futex@entry=0x7ff302290990 <_rtld_global+2352>, private=0) at lowlevellock.c:52
#1 0x00007ff302038131 in __GI___pthread_mutex_lock (mutex=0x7ff302290990 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007ff301e43291 in __GI___dl_iterate_phdr (callback=0x7ff30206c5f0, data=0x7ff05cdf9c50) at dl-iteratephdr.c:40
#3 0x00007ff30206d6c1 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#4 0x00007ff302069868 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#5 0x00007ff30206aa20 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#6 0x00007ff30206b76c in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7 0x00005557bb535f94 in <near_rust_allocator_proxy::allocator::ProxyAllocator<A> as core::alloc::global::GlobalAlloc>::alloc ()
#8 0x00005557bbe28275 in <actix::fut::future::FutureWrap<F,A> as actix::fut::future::ActorFuture<A>>::poll ()
...
and another is here:
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00005557bc63bbf0 in std::sys::unix::futex::futex_wait ()
#2 0x00005557bb4c0958 in std::sys::unix::locks::futex_mutex::Mutex::lock_contended ()
#3 0x00005557bb64eaea in backtrace::lock::lock ()
#4 0x00005557bb535f59 in <near_rust_allocator_proxy::allocator::ProxyAllocator<A> as core::alloc::global::GlobalAlloc>::alloc ()
#5 0x00005557bc640930 in std::backtrace_rs::symbolize::gimli::libs_dl_iterate_phdr::callback ()
#6 0x00007ff301e433d5 in __GI___dl_iterate_phdr (callback=0x5557bc6408e0 <std::backtrace_rs::symbolize::gimli::libs_dl_iterate_phdr::callback>, data=0x7ff05cfface8) at dl-iteratephdr.c:75
#7 0x00005557bc66dea4 in std::sys_common::backtrace::_print_fmt::{{closure}} ()
#8 0x00005557bc65ba41 in std::backtrace_rs::backtrace::libunwind::trace::trace_fn ()
#9 0x00007ff30206b794 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#10 0x00005557bc66dac0 in <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt ()
#11 0x00005557bb6d646c in core::fmt::write ()
#12 0x00005557bc63add0 in std::io::Write::write_fmt ()
#13 0x00005557bc66e94e in std::panicking::default_hook::{{closure}} ()
#14 0x00005557bc66f6f7 in std::panicking::rust_panic_with_hook ()
#15 0x00005557bc66f292 in std::panicking::begin_panic_handler::{{closure}} ()
#16 0x00005557bc66f206 in std::sys_common::backtrace::__rust_end_short_backtrace ()
#17 0x00005557bc66f1c2 in rust_begin_unwind ()
#18 0x00005557bb40d292 in core::panicking::panic_fmt ()
...
I think what is happening here is that the first thread above holds the backtrace lock and is trying to acquire the glibc lock that protects the list of dynamically loaded libraries inside `dl_iterate_phdr()`, while the second thread holds that glibc lock but is blocked trying to acquire the backtrace lock from an allocation, so we have a deadlock.
If I understand correctly, this can happen whenever we have at least 2 threads that allocate memory from Rust and one of them calls `std::sys_common::backtrace::print()`, e.g. from `panic!()`, since the `std::backtrace_rs::symbolize::gimli` implementation calls the glibc `dl_iterate_phdr()` function and allocates from Rust inside its callback. So maybe the whole prospect of printing backtraces from `ProxyAllocator::alloc()` is doomed?
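To make the failure mode concrete, here is a minimal sketch of the pattern (assumed code, for illustration only: `TracingAllocator`, the 64-frame cap, and the two-thread repro in `main` are made up and are not the actual near-rust-allocator-proxy or nearcore code; it assumes the `backtrace` crate as a dependency and `RUST_BACKTRACE=1` so the default panic hook actually symbolizes a backtrace):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::thread;

// Sketch of an allocator that walks the stack on every allocation, in the
// spirit of ProxyAllocator::alloc(). backtrace::trace() takes the backtrace
// crate's internal lock, and the unwind then reaches glibc's
// dl_iterate_phdr() (via _Unwind_Find_FDE), which needs the loader lock.
struct TracingAllocator;

unsafe impl GlobalAlloc for TracingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut depth = 0usize;
        // The closure only counts frames and does not allocate, but the
        // unwind itself still takes backtrace lock -> loader lock.
        backtrace::trace(|_frame| {
            depth += 1;
            depth < 64
        });
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: TracingAllocator = TracingAllocator;

fn main() {
    // Thread 1: allocates in a loop, repeatedly taking the backtrace lock
    // and then waiting for the loader lock inside _Unwind_Backtrace.
    let _allocating = thread::spawn(|| loop {
        drop(Vec::<u8>::with_capacity(4096));
    });

    // Thread 2: panics. The default panic hook symbolizes the backtrace via
    // gimli, which holds the loader lock inside dl_iterate_phdr() and then
    // allocates from its callback, re-entering TracingAllocator::alloc and
    // blocking on the backtrace lock -- the opposite acquisition order.
    let panicking = thread::spawn(|| panic!("trigger backtrace printing"));

    let _ = panicking.join();
}
```

Whether this actually deadlocks depends on timing, but the lock ordering is the same inversion as in the two traces above: backtrace lock then loader lock on the allocating side, loader lock then backtrace lock on the panicking side.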
…xy (#8268)
setting the global allocator to the one provided by near-rust-allocator-proxy leads to different sorts of problems (deadlocks and invalid thread local accesses), and that won't change until/unless it's rewritten somewhat. For now, we can't really use it correctly, so we'll just remove it. Shouldn't be too hard to re-add in the future if it's fixed. See these issues:
near/near-memory-tracker#103, near/near-memory-tracker#102
As a result of this, perf builds no longer print memory stats in logs. Also we're just going to use `std::thread::current().id()` instead of `near_rust_allocator_proxy::get_tid()` (and we won't sort them in logs), but this shouldn't be too much of a problem
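For reference, a minimal sketch of that replacement (the `current_thread_label` helper is hypothetical, just for illustration): on stable Rust, `std::thread::ThreadId` is opaque and only exposes its value through `Debug` formatting, so it can label log lines but not act as a raw numeric tid.

```rust
use std::thread;

// Hypothetical helper showing the swap described above: label log lines with
// std's ThreadId instead of near_rust_allocator_proxy::get_tid(). On stable
// Rust, ThreadId only exposes Debug formatting (e.g. "ThreadId(7)"); its
// numeric value is behind an unstable feature.
fn current_thread_label() -> String {
    format!("{:?}", thread::current().id())
}

fn main() {
    println!("logging from {}", current_thread_label());
}
```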