ProxyAllocator::alloc can deadlock if another thread panics #102

Open
marcelo-gonzalez opened this issue Nov 2, 2022 · 0 comments

@marcelo-gonzalez

Hard to say what to do about this, but I think there is a fundamental problem with trying to capture backtraces from within the global allocator. I have observed a deadlock where one thread is stuck here:

#0  __lll_lock_wait (futex=futex@entry=0x7ff302290990 <_rtld_global+2352>, private=0) at lowlevellock.c:52
#1  0x00007ff302038131 in __GI___pthread_mutex_lock (mutex=0x7ff302290990 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007ff301e43291 in __GI___dl_iterate_phdr (callback=0x7ff30206c5f0, data=0x7ff05cdf9c50) at dl-iteratephdr.c:40
#3  0x00007ff30206d6c1 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#4  0x00007ff302069868 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#5  0x00007ff30206aa20 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#6  0x00007ff30206b76c in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00005557bb535f94 in <near_rust_allocator_proxy::allocator::ProxyAllocator<A> as core::alloc::global::GlobalAlloc>::alloc ()
#8  0x00005557bbe28275 in <actix::fut::future::FutureWrap<F,A> as actix::fut::future::ActorFuture<A>>::poll ()
...
...

and another is here:

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005557bc63bbf0 in std::sys::unix::futex::futex_wait ()
#2  0x00005557bb4c0958 in std::sys::unix::locks::futex_mutex::Mutex::lock_contended ()
#3  0x00005557bb64eaea in backtrace::lock::lock ()
#4  0x00005557bb535f59 in <near_rust_allocator_proxy::allocator::ProxyAllocator<A> as core::alloc::global::GlobalAlloc>::alloc ()
#5  0x00005557bc640930 in std::backtrace_rs::symbolize::gimli::libs_dl_iterate_phdr::callback ()
#6  0x00007ff301e433d5 in __GI___dl_iterate_phdr (callback=0x5557bc6408e0 <std::backtrace_rs::symbolize::gimli::libs_dl_iterate_phdr::callback>, data=0x7ff05cfface8) at dl-iteratephdr.c:75
#7  0x00005557bc66dea4 in std::sys_common::backtrace::_print_fmt::{{closure}} ()
#8  0x00005557bc65ba41 in std::backtrace_rs::backtrace::libunwind::trace::trace_fn ()
#9  0x00007ff30206b794 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#10 0x00005557bc66dac0 in <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt ()
#11 0x00005557bb6d646c in core::fmt::write ()
#12 0x00005557bc63add0 in std::io::Write::write_fmt ()
#13 0x00005557bc66e94e in std::panicking::default_hook::{{closure}} ()
#14 0x00005557bc66f6f7 in std::panicking::rust_panic_with_hook ()
#15 0x00005557bc66f292 in std::panicking::begin_panic_handler::{{closure}} ()
#16 0x00005557bc66f206 in std::sys_common::backtrace::__rust_end_short_backtrace ()
#17 0x00005557bc66f1c2 in rust_begin_unwind ()
#18 0x00005557bb40d292 in core::panicking::panic_fmt ()
...
...

I think what is happening is this: the first thread holds the backtrace crate's global lock and is trying to acquire the glibc lock that protects the list of dynamically loaded libraries inside dl_iterate_phdr(), while the second thread already holds that glibc lock but is trying to acquire the backtrace lock because it allocates memory inside its dl_iterate_phdr() callback. That's a lock-order inversion between the two threads, so they deadlock.
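
To make the ordering concrete, here is a minimal sketch (purely illustrative, not code from this crate or from std) that models the two locks as plain mutexes; with unlucky timing the two threads block on each other exactly as in the traces above:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Stand-ins: "bt" models the backtrace crate's global lock,
    // "ld" models glibc's loader lock taken inside dl_iterate_phdr().
    let bt = Arc::new(Mutex::new(()));
    let ld = Arc::new(Mutex::new(()));

    // Thread 1: an allocating thread. ProxyAllocator::alloc takes the
    // backtrace lock first, then unwinding (_Unwind_Find_FDE) needs the
    // loader lock.
    let t1 = {
        let (bt, ld) = (Arc::clone(&bt), Arc::clone(&ld));
        thread::spawn(move || {
            let _bt = bt.lock().unwrap();
            let _ld = ld.lock().unwrap();
        })
    };

    // Thread 2: a panicking thread. The default panic hook enters
    // dl_iterate_phdr (loader lock held) and its callback allocates,
    // which re-enters the proxy allocator and wants the backtrace lock.
    let t2 = {
        let (bt, ld) = (Arc::clone(&bt), Arc::clone(&ld));
        thread::spawn(move || {
            let _ld = ld.lock().unwrap();
            let _bt = bt.lock().unwrap();
        })
    };

    // With the wrong interleaving, neither join ever returns.
    t1.join().unwrap();
    t2.join().unwrap();
}
```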

If I understand correctly, this can happen in any program where at least two threads allocate memory from Rust and one of them calls std::sys_common::backtrace::print(), e.g. via panic!(), since the std::backtrace_rs::symbolize::gimli implementation calls glibc's dl_iterate_phdr() and allocates from Rust inside its callback. So maybe the whole prospect of printing backtraces from ProxyAllocator::alloc() is doomed?
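
For reference, this is roughly the shape of the pattern in question: a #[global_allocator] wrapper that walks the stack on every allocation. It's only a sketch of my understanding, not the actual near_rust_allocator_proxy code; it assumes the backtrace crate as a dependency, whose trace() (at least in the version visible in the stacks above) takes an internal global lock and then unwinds via libgcc, which in turn goes through glibc's dl_iterate_phdr():

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ffi::c_void;

struct TracingAllocator<A>(A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for TracingAllocator<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Collect raw instruction pointers into a fixed-size buffer so the
        // capture itself does not allocate (and recurse back into alloc).
        // backtrace::trace() grabs the crate's global lock, then calls
        // _Unwind_Backtrace, which can take glibc's loader lock.
        let mut frames = [std::ptr::null_mut::<c_void>(); 32];
        let mut n = 0;
        backtrace::trace(|frame| {
            if n < frames.len() {
                frames[n] = frame.ip();
                n += 1;
                true
            } else {
                false
            }
        });
        // ... record frames[..n] against this allocation somewhere ...
        self.0.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.0.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOC: TracingAllocator<System> = TracingAllocator(System);
```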

marcelo-gonzalez added a commit to marcelo-gonzalez/nearcore that referenced this issue Dec 22, 2022
Setting the global allocator to the one provided by
near-rust-allocator-proxy leads to different sorts of problems
(deadlocks and invalid thread-local accesses), and that won't change
until/unless it's rewritten somewhat. For now we can't really use it
correctly, so we'll just remove it. It shouldn't be too hard to re-add
in the future if it's fixed. See these issues:

near/near-memory-tracker#103
near/near-memory-tracker#102

As a result of this, perf builds no longer print memory stats in logs.
Also, we're just going to use std::thread::current().id() instead of
near_rust_allocator_proxy::get_tid() (and we won't sort them in the
logs), but this shouldn't be too much of a problem.
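
For the record, the replacement mentioned above is just the standard library's opaque thread id, along these lines (a hypothetical snippet, not the actual nearcore change); ThreadId is a process-internal identifier rather than an OS tid, which is presumably why the entries are no longer sorted in the logs:

```rust
// Hypothetical example of tagging a log line with the std thread id
// instead of near_rust_allocator_proxy::get_tid().
fn log_memory_stat(bytes_in_use: usize) {
    let tid = std::thread::current().id();
    println!("{:?}: {} bytes in use", tid, bytes_in_use);
}
```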
near-bulldozer bot pushed a commit to near/nearcore that referenced this issue Dec 22, 2022
…xy (#8268)
