Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calling libc::res_init from multiple threads is unsafe on at least OSX #43592

Closed
oconnor663 opened this issue Aug 1, 2017 · 10 comments
Closed
Labels
C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@oconnor663
Copy link
Contributor

We recently started calling res_init on DNS failure to fix a glibc bug, and there was some worry that we might be calling a function that isn't threadsafe. It looked like we might not have any thread safety issues with glibc specifically, but now I'm convinced that we do have them in libc on OSX. I haven't repro'd the issue in Rust yet, but I'm able to produce very scary crashes with the following Go program if I run it on OSX (but not on Linux):

package main

// #cgo LDFLAGS: -lresolv
// #include<sys/types.h>
// #include<netinet/in.h>
// #include<arpa/nameser.h>
// #include<resolv.h>
import "C"
import "fmt"

func main() {
	// Loop on res_init() in a background goroutine...
	go func() {
		for {
			fmt.Println("background res_init")
			C.res_init()
		}
	}()
	// ...and also loop on it in the main thread.
	for {
		fmt.Println("foreground res_init")
		C.res_init()
	}
}

The result is inconsistent, but here are a couple examples:

fatal error: unexpected signal during runtime execution
foreground res_init
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7fffddf6705d]
test(30623,0x70000b910000) malloc: *** error for object 0x4601510: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
foreground res_init
foreground res_init
SIGABRT: abort
PC=0x7fffde988d42 m=3 sigcode=0
signal arrived during cgo execution

It might be that the best workaround is to limit our calls to res_init to when we know we're linking against glibc? That way we could still fix the original bug (stale /etc/resolv.conf data in glibc specifically), take advantage of the thread safety that glibc seems to have here (we might want to audit it more carefully than I'm able to), and not worry about breaking any other platforms.

@jonhoo
Copy link
Contributor

jonhoo commented Aug 1, 2017

Your proposed workaround of only calling res_init when linking with glibc seems reasonable, also because only glibc has this problem. The other libc implementations will update the nameservers when the network changes. I don't exactly know how to implement that. Linux only seems too strict and also too weak? As mentioned in #41582 (comment), it should be possible to switch to res_ninit, but it would require figuring out the exact symbol use by the per-thread _res symbol, which I suspect will a) be tricky from within Rust, and b) is probably libc dependent :/

@oconnor663
Copy link
Contributor Author

oconnor663 commented Aug 1, 2017

Can we detect whether the symbol gnu_get_libc_version exists at runtime, and call it if so? I think we do something similar at https://github.com/rust-lang/rust/blob/1.19.0/src/libstd/sys/unix/weak.rs#L73-L79, though I don't know how happy we are about doing that :|

@oconnor663
Copy link
Contributor Author

This kinda sorta seems to work (in libstd we'd probably replace most of the machinery with the existing weak! macro):

extern crate libc;

use std::ffi::CStr;
use std::mem;

fn glibc_version() -> Option<&'static CStr> {
    unsafe {
        let funcptr = libc::dlsym(
            libc::RTLD_DEFAULT,
            "gnu_get_libc_version\0".as_ptr() as *const libc::c_char,
        );
        if !funcptr.is_null() {
            type CharstarFn = extern "C" fn() -> *const libc::c_char;
            let gnu_get_libc_version: CharstarFn = mem::transmute(funcptr);
            Some(CStr::from_ptr(gnu_get_libc_version()))
        } else {
            None
        }
    }
}

fn main() {
    println!("{:?}", glibc_version());
}

For me, cargo run prints Some("2.25"), and cargo run --target=x86_64-unknown-linux-musl prints None.

@Mark-Simulacrum Mark-Simulacrum added C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 3, 2017
@alex
Copy link
Member

alex commented Oct 1, 2017

FWIW, I just hit a crash on macOS that appears to possibly be this.

Edit: Here's the error message:

ct-tools(62565,0x70000a072000) malloc: *** error for object 0x100c021e8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Process 62565 stopped
* thread #2, stop reason = signal SIGABRT
    frame #0: 0x00007fff6eadffce libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff6eadffce <+10>: jae    0x7fff6eadffd8            ; <+20>
    0x7fff6eadffd0 <+12>: movq   %rax, %rdi
    0x7fff6eadffd3 <+15>: jmp    0x7fff6ead776c            ; cerror_nocancel
    0x7fff6eadffd8 <+20>: retq
Target 0: (ct-tools) stopped.
(lldb) bt
* thread #2, stop reason = signal SIGABRT
  * frame #0: 0x00007fff6eadffce libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff6ec1d150 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x00007fff6ea3c32a libsystem_c.dylib`abort + 127
    frame #3: 0x00007fff6eb44b28 libsystem_malloc.dylib`szone_error + 596
    frame #4: 0x00007fff6eb3a76b libsystem_malloc.dylib`tiny_free_list_remove_ptr + 298
    frame #5: 0x00007fff6eb4fac8 libsystem_malloc.dylib`tiny_free_no_lock + 1450
    frame #6: 0x00007fff6eb50254 libsystem_malloc.dylib`free_tiny + 628
    frame #7: 0x00007fff6ea8a6c2 libsystem_info.dylib`freeaddrinfo + 66
    frame #8: 0x000000010038181d ct-tools`std::net::addr::resolve_socket_addr [inlined] std::sys_common::net::{{impl}}::drop at net.rs:160 [opt]
    frame #9: 0x0000000100381811 ct-tools`std::net::addr::resolve_socket_addr [inlined] core::ptr::drop_in_place<std::sys_common::net::LookupHost> at ptr.rs:61 [opt]
    frame #10: 0x0000000100381811 ct-tools`std::net::addr::resolve_socket_addr [inlined] core::ptr::drop_in_place<std::net::LookupHost> at ptr.rs:61 [opt]
    frame #11: 0x0000000100381811 ct-tools`std::net::addr::resolve_socket_addr [inlined] core::ptr::drop_in_place<core::iter::Map<std::net::LookupHost, closure>> at ptr.rs:61 [opt]
    frame #12: 0x0000000100381811 ct-tools`std::net::addr::resolve_socket_addr [inlined] alloc::vec::{{impl}}::extend_desugared<std::net::addr::SocketAddr,core::iter::Map<std::net::LookupHost, closure>> at vec.rs:1915 [opt]
    frame #13: 0x00000001003815d0 ct-tools`std::net::addr::resolve_socket_addr [inlined] alloc::vec::{{impl}}::spec_extend<std::net::addr::SocketAddr,core::iter::Map<std::net::LookupHost, closure>> at vec.rs:1800 [opt]
    frame #14: 0x00000001003815d0 ct-tools`std::net::addr::resolve_socket_addr [inlined] alloc::vec::{{impl}}::from_iter<std::net::addr::SocketAddr,core::iter::Map<std::net::LookupHost, closure>> at vec.rs:1795 [opt]
    frame #15: 0x0000000100381400 ct-tools`std::net::addr::resolve_socket_addr [inlined] alloc::vec::{{impl}}::from_iter<std::net::addr::SocketAddr,core::iter::Map<std::net::LookupHost, closure>> at vec.rs:1696 [opt]
    frame #16: 0x0000000100381400 ct-tools`std::net::addr::resolve_socket_addr [inlined] core::iter::iterator::Iterator::collect<core::iter::Map<std::net::LookupHost, closure>,alloc::vec::Vec<std::net::addr::SocketAddr>> at iterator.rs:1298 [opt]
    frame #17: 0x0000000100381400 ct-tools`std::net::addr::resolve_socket_addr at addr.rs:850 [opt]
    frame #18: 0x0000000100381a9c ct-tools`std::net::addr::{{impl}}::to_socket_addrs at addr.rs:870 [opt]
    frame #19: 0x00000001002638ef ct-tools`std::panicking::try::do_call::h98e985df9219ceeb + 479
    frame #20: 0x000000010039c44d ct-tools`panic_unwind::__rust_maybe_catch_panic at lib.rs:99 [opt]
    frame #21: 0x0000000100261b32 ct-tools`_$LT$futures_cpupool..MySender$LT$F$C$$u20$core..result..Result$LT$$LT$F$u20$as$u20$futures..future..Future$GT$..Item$C$$u20$$LT$F$u20$as$u20$futures..future..Future$GT$..Error$GT$$GT$$u20$as$u20$futures..future..Future$GT$::poll::h5fdc3798a778b180 + 626
    frame #22: 0x0000000100298a90 ct-tools`futures::task_impl::std::Run::run::h94868cd32b108ec9 + 448
    frame #23: 0x000000010028f553 ct-tools`std::sys_common::backtrace::__rust_begin_short_backtrace::hf0ef3b99d4e69109 + 259
    frame #24: 0x000000010039c44d ct-tools`panic_unwind::__rust_maybe_catch_panic at lib.rs:99 [opt]
    frame #25: 0x0000000100293650 ct-tools`_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::ha45b02055eb8de54 + 144
    frame #26: 0x000000010038e55c ct-tools`std::sys::imp::thread::{{impl}}::new::thread_start [inlined] alloc::boxed::{{impl}}::call_once<(),()> at boxed.rs:738 [opt]
    frame #27: 0x000000010038e559 ct-tools`std::sys::imp::thread::{{impl}}::new::thread_start [inlined] std::sys_common::thread::start_thread at thread.rs:24 [opt]
    frame #28: 0x000000010038e4de ct-tools`std::sys::imp::thread::{{impl}}::new::thread_start at thread.rs:90 [opt]
    frame #29: 0x00007fff6ec1a6c1 libsystem_pthread.dylib`_pthread_body + 340
    frame #30: 0x00007fff6ec1a56d libsystem_pthread.dylib`_pthread_start + 377
    frame #31: 0x00007fff6ec19c5d libsystem_pthread.dylib`thread_start + 13

@oconnor663
Copy link
Contributor Author

oconnor663 commented Oct 2, 2017

I just put up a PR (#44965) with a possible workaround, similar to the glibc_version example above. There's a very good chance I did something wrong, so @jonhoo please let me know what you think.

oconnor663 added a commit to oconnor663/rust that referenced this issue Oct 5, 2017
The previous workaround for gibc's res_init bug is not thread-safe on
other implementations of libc, and it can cause crashes. Use a runtime
check to make sure we only call res_init when we need to, which is also
when it's safe. See rust-lang#43592.
bors added a commit that referenced this issue Oct 6, 2017
replace libc::res_init with res_init_if_glibc_before_2_26

The previous workaround for gibc's res_init bug is not thread-safe on
other implementations of libc, and it can cause crashes. Use a runtime
check to make sure we only call res_init when we need to, which is also
when it's safe. See #43592.

~This PR is returning an InvalidData IO error if the glibc version string fails to parse. We could also have treated that case as "not glibc", and gotten rid of the idea that these functions could return an error. (Though I'm not a huge fan of ignoring error returns from `res_init` in any case.) Do other folks agree with these design choices?~

I'm pretty new to hacking on libstd. Is there an easy way to build a toy rust program against my changes to test this, other than doing an entire `sudo make install` on my system? What's the usual workflow?
@oconnor663
Copy link
Contributor Author

The fix just landed in master. @alex, any chance you can test it after the next nighty comes out?

@alex
Copy link
Member

alex commented Oct 6, 2017

Will do. (Although I haven't seen this error in a few days, probably because race conditions :-/)

@oconnor663
Copy link
Contributor Author

Btw, n00b question for you @alex. How do you figure out from std::sys_common::net::{{impl}}::drop at net.rs:160 that it's actually line 181 or whatever that's to blame? Are there other tools that tell you who wrote to what memory?

@alex
Copy link
Member

alex commented Oct 6, 2017

I didn't, at least not conclusively. I hit this crash, @Manishearth pointed me to this bug, and since it was crashing in "the right place", it seemed that with high probability this was the right diagnosis :-)

If I hadn't been pointed at this issue, I probably would have compiled with ASAN and seen what I hit.

@oconnor663
Copy link
Contributor Author

Closing this since #44965 landed a while ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants