
cargo test with musl hangs occasionally #2144

Closed

yihuaf opened this issue Jul 6, 2023 · 6 comments

@yihuaf
Collaborator

yihuaf commented Jul 6, 2023

We observed in the field that unit tests with musl hang from time to time. This happens fairly often, but not every time. Several CI runs now show this behavior, and I am able to reproduce it locally with:

[eng@stardust youki]$ cargo +nightly test fork -Zbuild-std --target $(uname -m)-unknown-linux-musl --package libcontainer --no-default-features -F v2
    Finished test [unoptimized + debuginfo] target(s) in 0.12s
     Running unittests src/lib.rs (target/x86_64-unknown-linux-musl/debug/deps/libcontainer-0019f012789e0a63)

running 3 tests
test process::fork::test::test_container_clone_sibling ... ok
test process::fork::test::test_container_fork ... ok
test process::fork::test::test_container_err_fork has been running for over 60 seconds

The test that hangs is a simple one: we fork/clone3 with a callback function that returns an error right away. The same test runs correctly in the normal unit tests with glibc. This seems to happen only with musl.
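For reference, the test is roughly shaped like this (a paraphrase with hypothetical signatures, not the exact source in crates/libcontainer/src/process/fork.rs):

    #[test]
    fn test_container_err_fork() {
        // Hypothetical shape: fork a child whose callback fails right away.
        let pid = container_fork("test:err_fork", Box::new(|| -1)).unwrap();
        // The child calls exit_group(-1), which the parent's wait4 reports
        // as exit status 255.
        match nix::sys::wait::waitpid(pid, None).unwrap() {
            nix::sys::wait::WaitStatus::Exited(_, status) => assert_eq!(status, 255),
            other => panic!("unexpected wait status: {:?}", other),
        }
    }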

I was able to reproduce this locally; it happens once in a while. At the time of the hang, I see three processes related to cargo test:

[eng@stardust youki]$ ps aux | grep libcontainer
eng      1325755  0.7  0.1 119340 44988 pts/5    S+   16:42   0:00 /home/eng/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo test fork -Zbuild-std --target x86_64-unknown-linux-musl --package libcontainer --no-default-features -F v2
eng      1325757  0.0  0.0  14016  6400 pts/5    Sl+  16:42   0:00 /home/eng/private/youki/target/x86_64-unknown-linux-musl/debug/deps/libcontainer-0019f012789e0a63 fork
eng      1325762  0.0  0.0  18128  1532 pts/5    S+   16:42   0:00 /home/eng/private/youki/target/x86_64-unknown-linux-musl/debug/deps/libcontainer-0019f012789e0a63 fork

Specifically, 1325757 seems to be the actual unit test process. It is correctly calling waitpid:

[eng@stardust youki]$ strace -f -p 1325757
strace: Process 1325757 attached with 2 threads
[pid 1325757] restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 1325759] wait4(1325762, ^Cstrace: Process 1325757 detached

Process 1325762 is the forked child, and interestingly it is stuck in a futex:

[eng@stardust youki]$ strace -f -p 1325762
strace: Process 1325762 attached
futex(0x7f2c41d69790, FUTEX_WAIT_PRIVATE, 2147483651, NULL^Cstrace: Process 1325762 detached
 <detached ...>

I don't have definitive proof, but I highly suspect this is related to how we use the clone3 kernel syscall directly to create processes. As mentioned in #2030 (comment), libc does extra bookkeeping around the fork and clone syscalls when creating processes. Using the kernel syscall directly bypasses this bookkeeping and may result in undefined behavior.
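For illustration, going around libc means invoking the kernel entry point directly, roughly like this (a minimal sketch based on the clone3(2) man page, not youki's exact code):

    // clone_args as defined by clone3(2); only exit_signal is set here,
    // matching the strace output below ({flags=0, exit_signal=SIGCHLD, ...}, 88).
    #[repr(C)]
    #[derive(Default)]
    struct CloneArgs {
        flags: u64,
        pidfd: u64,
        child_tid: u64,
        parent_tid: u64,
        exit_signal: u64,
        stack: u64,
        stack_size: u64,
        tls: u64,
        set_tid: u64,
        set_tid_size: u64,
        cgroup: u64,
    }

    fn raw_clone3() -> libc::c_long {
        let mut args = CloneArgs::default();
        args.exit_signal = libc::SIGCHLD as u64;
        // The kernel creates the child, but musl/glibc never observe it,
        // so none of their post-fork bookkeeping (resetting internal locks,
        // thread lists, etc.) runs in the new process.
        unsafe {
            libc::syscall(
                libc::SYS_clone3,
                &args as *const CloneArgs,
                std::mem::size_of::<CloneArgs>(),
            )
        }
    }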

I will do a bit more digging and see if I can find more proof.

In the meantime, restarting the musl test usually lets it pass. If it happens too often, we can turn the test off until this is resolved. If clone3 turns out to be the issue, then once #2121 merges I can force musl onto the fallback path with a feature flag.

@yihuaf yihuaf self-assigned this Jul 6, 2023
@yihuaf
Collaborator Author

yihuaf commented Jul 7, 2023

This is the strace for a run that doesn't hang:

...
[pid 1344477] clone3({flags=0, exit_signal=SIGCHLD, stack=NULL, stack_size=0}, 88) = 1344478
strace: Process 1344478 attached
[pid 1344477] wait4(1344478,  <unfinished ...>
[pid 1344478] exit_group(-1)            = ?
[pid 1344478] +++ exited with 255 +++
[pid 1344477] <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 255}], 0, NULL) = 1344478
[pid 1344477] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1344478, si_uid=1000, si_status=255, si_utime=0, si_stime=0} ---
[pid 1344477] futex(0x7fe00a882258, FUTEX_WAKE_PRIVATE, 1) = 1
...
test process::fork::test::test_container_err_fork ... ok

Note that 1344477 is the test process waiting on 1344478 to exit. 1344477 then receives a SIGCHLD signal and wakes up from the wait.

@yihuaf
Collaborator Author

yihuaf commented Jul 7, 2023

Reproduced one more time, but this run looks a bit different. I used gdb to see where the futex wait comes from. This time, two fork test cases are stuck.

(gdb) bt
#0  __futexwait () at ../src_musl/src/internal/pthread_impl.h:178
#1  __lock () at ../src_musl/src/thread/__lock.c:44
#2  0x00007f7ef056551e in __libc_malloc_impl () at ../src_musl/src/malloc/mallocng/malloc.c:337
#3  0x00007f7ef08e8772 in std::sys::unix::alloc::{impl#0}::alloc (self=0x7f7ef0b1f318, layout=...) at src/sys/unix/alloc.rs:14
#4  0x00007f7ef092517a in std::alloc::__default_lib_allocator::__rdl_alloc (size=11, align=1) at src/alloc.rs:381
#5  0x00007f7ef0a46f50 in alloc::alloc::alloc (layout=...) at src/alloc.rs:100
#6  0x00007f7ef0a470d9 in alloc::alloc::Global::alloc_impl (self=0x7f7eefffad30, layout=..., zeroed=false) at src/alloc.rs:183
#7  0x00007f7ef0a475f9 in alloc::alloc::{impl#1}::allocate (self=0x7f7eefffad30, layout=...) at src/alloc.rs:243
#8  0x00007f7ef0a3f414 in alloc::raw_vec::RawVec<u8, alloc::alloc::Global>::allocate_in<u8, alloc::alloc::Global> (capacity=11, init=alloc::raw_vec::AllocInit::Uninitialized, alloc=...)
    at src/raw_vec.rs:184
#9  0x00007f7ef0a3f00c in alloc::raw_vec::RawVec<u8, alloc::alloc::Global>::with_capacity_in<u8, alloc::alloc::Global> (capacity=11, alloc=...) at src/raw_vec.rs:130
#10 0x00007f7ef0a45614 in alloc::vec::Vec<u8, alloc::alloc::Global>::with_capacity_in<u8, alloc::alloc::Global> (capacity=11, alloc=...) at src/vec/mod.rs:672
#11 0x00007f7ef0a455d7 in alloc::vec::Vec<u8, alloc::alloc::Global>::with_capacity<u8> (capacity=11) at src/vec/mod.rs:479
#12 0x00007f7ef0a49718 in alloc::ffi::c_str::{impl#1}::new::spec_new_impl_bytes (bytes=...) at src/ffi/c_str.rs:287
#13 alloc::ffi::c_str::{impl#1}::new::{impl#2}::spec_new_impl (self=...) at src/ffi/c_str.rs:306
#14 0x00007f7ef099d28b in alloc::ffi::c_str::CString::new<&str> (t=...)
    at /home/eng/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/ffi/c_str.rs:316
#15 0x00007f7ef056374a in prctl::set_name (name=...) at src/lib.rs:356
#16 0x00007f7ef03345bd in libcontainer::process::fork::container_clone<libcontainer::process::fork::test::test_container_err_fork::{closure_env#0}> (child_name=..., cb=..., clone_cmd=...)
    at crates/libcontainer/src/process/fork.rs:69
#17 0x00007f7ef0334f65 in libcontainer::process::fork::container_fork<libcontainer::process::fork::test::test_container_err_fork::{closure_env#0}> (child_name=..., cb=...)
    at crates/libcontainer/src/process/fork.rs:95

Looks like prctl::set_name is blocked on a futex in both processes. Is there a race condition leading to a deadlock?
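For what it's worth, the allocation comes from CString::new inside prctl::set_name (frame #14). A sketch of an allocation-free alternative, assuming the name fits the kernel's 15-byte comm limit (not what youki currently does):

    fn set_name_no_alloc(name: &str) -> std::io::Result<()> {
        // PR_SET_NAME copies at most 15 bytes plus a NUL terminator, so a
        // fixed stack buffer avoids touching the heap (and its lock) entirely.
        let mut buf = [0u8; 16];
        let n = name.len().min(15);
        buf[..n].copy_from_slice(&name.as_bytes()[..n]);
        let ret = unsafe { libc::prctl(libc::PR_SET_NAME, buf.as_ptr()) };
        if ret == -1 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }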

@yihuaf
Collaborator Author

yihuaf commented Jul 7, 2023

After removing the prctl::set_name call inside the clone callback, we get:

(gdb) bt
#0  __futexwait () at ../src_musl/src/internal/pthread_impl.h:178
#1  __lock () at ../src_musl/src/thread/__lock.c:44
#2  0x00007f67425687ce in __ofl_lock () at ../src_musl/src/stdio/ofl.c:11
#3  0x00007f674256b2cd in __stdio_exit () at ../src_musl/src/stdio/__stdio_exit.c:19
#4  0x00007f67422ec049 in exit () at ../src_musl/src/exit/exit.c:31
#5  0x00007f67429ab1ae in std::sys::unix::os::exit (code=-1) at src/sys/unix/os.rs:660
#6  0x00007f674292cec5 in std::process::exit (code=-1) at src/process.rs:2124
#7  0x00007f6742334618 in libcontainer::process::fork::container_clone<libcontainer::process::fork::test::test_container_err_fork::{closure_env#0}> (child_name=..., cb=..., clone_cmd=...)
    at crates/libcontainer/src/process/fork.rs:78
#8  0x00007f6742334e75 in libcontainer::process::fork::container_fork<libcontainer::process::fork::test::test_container_err_fork::{closure_env#0}> (child_name=..., cb=...)
    at crates/libcontainer/src/process/fork.rs:95

Looks like std::sys::unix::os::exit causes the futex wait.
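A hedged note on why: std::process::exit runs libc's exit(3), which in musl takes the stdio open-file-list lock during __stdio_exit (frames #2 and #3 above). In the forked child, the raw _exit(2) path would skip that teardown, roughly (with ret standing in for the callback's return code):

    // Sketch: exit the child without running atexit/stdio teardown, so no
    // inherited (possibly locked) futex is ever touched.
    unsafe { libc::_exit(ret) };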

@yihuaf
Collaborator Author

yihuaf commented Jul 7, 2023

I think I have a better understanding of this now... There are a number of issues at play, and they do come down to differences between musl and glibc. Specifically, this is about the bookkeeping around process creation, what is safe/defined behavior versus unsafe/undefined behavior, and why forking from a multi-threaded parent process is not safe.

On current main, before #2121, we have a number of issues that I highly suspect come from using the clone3 kernel syscall directly without a proper libc wrapper. As the comments above mention, prctl::set_name can cause the process to hang. With that removed, the exit syscall hangs:

(gdb) bt
#0  __futexwait () at ../src_musl/src/internal/pthread_impl.h:178
#1  __lock () at ../src_musl/src/thread/__lock.c:44
#2  0x00007fedb508b04a in __ofl_lock () at ../src_musl/src/stdio/ofl.c:11
#3  0x00007fedb508defd in __stdio_exit () at ../src_musl/src/stdio/__stdio_exit.c:19
#4  0x00007fedb4eeb049 in exit () at ../src_musl/src/exit/exit.c:31
#5  0x00007fedb55a615e in std::sys::unix::os::exit (code=-1) at src/sys/unix/os.rs:660
#6  0x00007fedb5527e75 in std::process::exit (code=-1) at src/process.rs:2124
#7  0x00007fedb4f31427 in libcontainer::process::fork::clone3 (cb=0x7fedb4bfb268, flags=0, exit_signal=...) at crates/libcontainer/src/process/fork.rs:125
#8  0x00007fedb4f30b0d in libcontainer::process::fork::clone_internal (cb=..., flags=0, exit_signal=...) at crates/libcontainer/src/process/fork.rs:65
#9  0x00007fedb4f30a53 in libcontainer::process::fork::container_clone (cb=...) at crates/libcontainer/src/process/fork.rs:56
#10 0x00007fedb5073ec0 in libcontainer::process::fork::test::test_container_err_fork () at crates/libcontainer/src/process/fork.rs:273
#11 0x00007fedb4efb7b7 in libcontainer::process::fork::test::test_container_err_fork::{closure#0} () at crates/libcontainer/src/process/fork.rs:272

Now, with #2121 and forcing the same test to use the clone fallback, we get past all of these issues and hang in free:

(gdb) bt
#0  __futexwait () at ../src_musl/src/internal/pthread_impl.h:178
#1  __lock () at ../src_musl/src/thread/__lock.c:44
#2  0x00007fa125a886c8 in wrlock () at ../src_musl/src/malloc/mallocng/glue.h:72
#3  __libc_free () at ../src_musl/src/malloc/mallocng/free.c:143
#4  0x00007fa125ee2a09 in std::sys::unix::alloc::{impl#0}::dealloc (self=0x7fa126119b58, ptr=0x7fa1264d2c60, _layout=...) at src/sys/unix/alloc.rs:42
#5  0x00007fa125f1f369 in std::alloc::__default_lib_allocator::__rdl_dealloc (ptr=0x7fa1264d2c60, size=16, align=8) at src/alloc.rs:389
#6  0x00007fa12599a092 in alloc::alloc::dealloc (ptr=0x7fa1264d2c60, layout=...)
    at /home/eng/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:119
#7  0x00007fa12599a0eb in alloc::alloc::{impl#1}::deallocate (self=0x7fa1249f6f50, ptr=..., layout=...)
    at /home/eng/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:256
#8  0x00007fa125a5e045 in alloc::alloc::box_free<alloc::boxed::Box<dyn core::ops::function::FnMut<(), Output=i32>, alloc::alloc::Global>, alloc::alloc::Global> (ptr=..., alloc=...)
    at /home/eng/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:350
#9  0x00007fa1259a2774 in core::ptr::drop_in_place<alloc::boxed::Box<alloc::boxed::Box<dyn core::ops::function::FnMut<(), Output=i32>, alloc::alloc::Global>, alloc::alloc::Global>> ()
    at /home/eng/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497
#10 0x00007fa125931c89 in libcontainer::process::fork::clone::main (data=0x7fa1264d2c60) at crates/libcontainer/src/process/fork.rs:210
#11 0x00007fa125a87c29 in __clone () at ../src_musl/src/thread/x86_64/clone.s:22

After some more research, it turns out there is a whole list of things that are unsafe in the child process after forking from a multi-threaded parent. Based on my understanding, which may not be 100% complete: POSIX in theory only guarantees the exec family of calls to be safe after fork/clone in a multi-threaded process; calling anything else is undefined behavior. In practice, there is a list of functions that are safe (the async-signal-safe functions), and that list does not include malloc and free, which is what we have here. The rough reasoning: inside the new child process, only the thread that called fork or clone continues to execute; all other threads disappear. As a result, if any of those threads held a lock (futex) at the moment of the fork, it stays locked forever, with nobody left to unlock it. malloc and free take such locks and are therefore unsafe. Based on my search, glibc actively works around some of these issues (which is claimed to be not POSIX-compliant; I don't know the deeper implications of that), but musl exposes these behaviors directly.
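The pattern is easy to sketch outside youki (a hypothetical reproduction, not project code):

    use std::thread;

    fn main() {
        // Keep the allocator lock frequently held by other threads.
        for _ in 0..4 {
            thread::spawn(|| loop {
                drop(Vec::<u8>::with_capacity(64));
            });
        }
        let pid = unsafe { libc::fork() };
        if pid == 0 {
            // Child: only the forking thread survives. If another thread
            // held the malloc futex at fork time, this allocation blocks
            // forever (readily observable with musl).
            let _v = Vec::<u8>::with_capacity(64);
            unsafe { libc::_exit(0) };
        }
        let mut status = 0;
        unsafe { libc::waitpid(pid, &mut status, 0) };
    }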

The free comes from the function we pass into the clone call:

    extern "C" fn main(data: *mut libc::c_void) -> libc::c_int {
        unsafe { Box::from_raw(data as *mut CloneCb)() }
    }

The Box will call free at the end of the scope, as it should. However, this is the free that triggers the deadlock. A direct solution is to not free the closure memory at all: instead of letting the Box do the free, we let the system do it, by leaking the pointer to the closure. This is safe because we know the process terminates right afterwards, and the OS will clean up the leaked memory. Basically, change the function above to:

    extern "C" fn main(data: *mut libc::c_void) -> libc::c_int {
        let mut func = unsafe { Box::from_raw(data as *mut CloneCb) };
        let ret = func();
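        // Deliberately leak the closure: Box::into_raw gives up ownership,
        // so Drop never runs and free() is never called under the inherited
        // (possibly locked) allocator futex.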
        Box::into_raw(func);

        ret
    }

We use Box::into_raw to unbox the closure. At the end of the scope, the memory is leaked, and the OS cleans it up after the process terminates. Based on my cursory testing, this seems to fix the issue.
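(As an aside, std::mem::forget(func) would express the intentional leak a bit more explicitly than discarding the return value of Box::into_raw; the effect is the same.)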

Note that in real youki execution this issue does not arise, because we do not use threads: all of our processes are single-threaded. The problem only appears when a multi-threaded parent process (such as the cargo test harness, which runs tests on multiple threads) forks a child.

YJDoc2 added a commit that referenced this issue Jul 11, 2023
Temporarily disable musl tests until #2144 is resolved
@jprendes
Contributor

jprendes commented Oct 6, 2023

We are hitting this issue in runwasi with glibc.
Also, see https://sourceware.org/bugzilla/show_bug.cgi?id=26371

YJDoc2 added commits to YJDoc2/youki that referenced this issue Jan 4, 2024
@YJDoc2
Collaborator

YJDoc2 commented Feb 16, 2024

Closing for now, as #2685 seems like a good solution. Will re-open if the issue persists.

@YJDoc2 YJDoc2 closed this as completed Feb 16, 2024