Unclear safety of std::thread::catch_panic
#25662
Comments
If |
/cc @alexcrichton |
@kballard I'm not entirely convinced that would be safe but I can't prove it unsafe either. In the multithreaded case, data protected by a lock must not violate any invariants any time the lock isn't held because one generally can't make statements about the state of other threads. However, in single threaded environments one can reasonably make scope-based invariants. For example, my horrorshow template macro emulates scoped TLS as follows:
```rust
pub fn __with_template_reentrant<F: FnMut(&mut Template)>(mut f: F) {
    // The scoped variant is unstable so we do this ourselves...
    __TEMPLATE.with(|template| {
        let mut local_template = None;
        ::std::mem::swap(&mut *template.borrow_mut(), &mut local_template);
        (f)(local_template.as_mut().unwrap());
        ::std::mem::swap(&mut *template.borrow_mut(), &mut local_template);
    });
}
```
If the last swap isn't run, the invariant is violated and my library will likely panic at some future point.
However, this example doesn't violate memory safety and I can't find a way to do so without using `unsafe`. |
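A drop guard is one way to make the restoring step run even if the closure panics. The following is only a minimal sketch under stated assumptions, not horrorshow's actual code: `SLOT`, `Restore`, `with_slot` and `slot_is_filled` are hypothetical names, and the panic is caught with `std::panic::catch_unwind`, the API that later superseded `thread::catch_panic`.

```rust
use std::cell::RefCell;
use std::panic;

thread_local! {
    // Hypothetical stand-in for the library's `__TEMPLATE` slot.
    static SLOT: RefCell<Option<String>> = RefCell::new(Some(String::from("template")));
}

// Holds the value taken out of the slot and puts it back when dropped,
// which also happens while the stack is unwinding.
struct Restore(Option<String>);

impl Drop for Restore {
    fn drop(&mut self) {
        let value = self.0.take();
        SLOT.with(|slot| *slot.borrow_mut() = value);
    }
}

fn with_slot<F: FnOnce(&mut String)>(f: F) {
    // Take the value out so that re-entrant calls see an empty slot.
    let mut guard = Restore(SLOT.with(|slot| slot.borrow_mut().take()));
    f(guard.0.as_mut().expect("slot is already borrowed"));
    // `guard` is dropped here, or during unwinding if `f` panicked.
}

fn slot_is_filled() -> bool {
    SLOT.with(|slot| slot.borrow().is_some())
}

fn main() {
    // The closure panics, but the guard still restores the slot.
    let result = panic::catch_unwind(|| with_slot(|_| panic!("boom")));
    assert!(result.is_err());
    assert!(slot_is_filled());
}
```

The guard only restores the slot itself; whatever the closure did to the borrowed value before panicking is still visible afterwards, so this narrows the problem rather than removing it.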
@Stebalien You're effectively poisoning the TLS value while that code is being run. If the thread panics and avoids the last In the case of |
Can't you use the BinaryHeap again? It was used to show the unsoundness of thread::scoped (the first time) |
Ok, so I put myself to work since it was my suggestion. Do you remember the soundness issue the first time with std::thread::scoped (issue #20807) and why we made it stop catching panics? The following is (possibly?) a reason: It allows violating a type's invariants and then going back to using that same value. The following code example segfaults in safe Rust. I'm using a mutex and ignoring poison (this is safe Rust). I think we have other ways to share memory that would avoid the mutex altogether, so I think we can probably ignore the mutex in the discussion(?). Note: This segfault works just as well with thread::spawn! rustc version: rustc 1.2.0-nightly (0cc99f9 2015-05-17) (built 2015-05-18)
```rust
#![feature(catch_panic, collections, std_misc)]
use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::collections::BTreeSet;
use std::thread;
use std::sync::{Arc, Mutex};

#[derive(PartialEq, Eq, Ord, Debug, Clone)]
struct Panicker<T>(T);

impl<T: PartialEq> PartialOrd for Panicker<T>
{
    fn partial_cmp(&self, _: &Self) -> Option<Ordering>
    {
        panic!()
    }
}

fn main() {
    let heap = Arc::new(Mutex::new(BinaryHeap::new()));
    let heap_ref = heap.clone();
    let guard = thread::catch_panic(move || {
        let mut local_heap = heap.lock().unwrap();
        local_heap.push(Panicker(BTreeSet::<i32>::new()));
        local_heap.push(Panicker(BTreeSet::<i32>::new()));
    });
    println!("Thread result: {:?}", guard);
    let mutex_guard = match heap_ref.lock() {
        Ok(inner) => inner,
        Err(e) => e.into_inner(),
    };
    let vector_from_binary_heap = mutex_guard.clone().into_vec();
    println!("{:?}", vector_from_binary_heap);
}
```
Q: Why use BTreeSet as the element type? You have to ask the question: Is it catching panics or BinaryHeap that is broken? If we fix BinaryHeap, what other type invariants can we break for fun & unsoundness? Edit: as tomaka says below, is it ignoring poisoning that is unsafe? |
Time to call in the cavalry, cc @nikomatsakis @aturon |
oh and cc @tomaka |
|
I want to be sure we solve the fundamental issues and don't just patch the holes in an unsound bag. I don't know which of the two categories ignoring poisoning belongs to. |
The golden rule is that when a panic occurs, any object that is mutably borrowed must no longer be used in the future (except for its destructor) because its internal state is possibly messed up. Rust cleans up local variables inside the closure passed to
References are explicitly forbidden because the closure is required to be Only |
One thing that I neglected in the previous post are destructors. Maybe with a lot of hacky (but safe) code it could be possible to trigger a crash by writing a destructor that is run during a panic (that's just a supposition, I have absolutely no idea how). Destructors are some kind of gray area when it comes to exception-safety. |
This problem is really specific to TLS, because That's probably the reason why |
For completeness, the TLS + RefCell version of the same segfault. |
@tbu- can you clarify what you're looking for in this issue? There's discussion about referencing
It's currently an explicit design decision that
Looks like the implementation of BinaryHeap is the one at fault here, it is not panic-safe.
It's a fundamental part of poisoning that this is not unsafe. It is only possible to break memory safety as a part of panic safety with
This is only true for unsafe code; safe code can only have logical invariants violated.
The precise bounds for this API are a little squishy (hence the instability), but the goal was to prevent unnecessary leakage of accidental vectors of panic safety. It was thought that the bounds can be dropped possibly if good rationale comes up in the future.
Yes, TLS did not come up much in the initial discussions of |
On one hand several changes have been made to help unsafe code writers: forbidding transmuting On the other hand, failure to write panic-safe code should be blamed on the writer? |
@alexcrichton BinaryHeap hasn't even been updated from zeroing to filling drop.. This is super broken! If we choose to blame this particular location in libstd
|
Note that like the abstractions of
Whatever the outcome of this bug is, however, this isn't super relevant as they all need to be fixed regardless.
We're all quite busy and things fall under the radar from time to time, it's not like we all sat down at a table and decided that a memory safety bug in BinaryHeap was OK to make it into 1.0.
I mean only to clarify what the outcome of the issue was. We definitely realize that exception safety is a problem for |
I was apparently not on the same page. I think it sounds good that we tackle exception safety head on, but it seems to be a different kind of direction than what #20807 concluded. |
I pretty much agree with what @alexcrichton said, and I think that was my takeaway from #20807 as well. That is, exception safety is "still a thing" -- but we do our best to minimize its impact on your life, particularly if you are not developing unsafe code. We opted to change the API for #20807 not because we believed it would render exception safety unnecessary, but because the current API was effectively adding an (inefficient, but otherwise relatively usable) "catch" keyword to the language, which wasn't really the intention. (If we want catch, we should just add catch.) This is also the intention of poisoning mutexes (but letting you bypass it). It is not intended to absolutely prevent you from obtaining bad data, but it does make you think about it at least. |
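To make the point about poisoning concrete, here is a small sketch in safe Rust, assuming a made-up `Ledger` type whose logical invariant is that `total` equals the sum of `items`. The names `Ledger`, `invariant_holds` and `poison_and_recover` are hypothetical, and the panic is caught with `std::panic::catch_unwind` rather than the `thread::catch_panic` discussed in this thread. The lock reports poisoning, the caller bypasses it with `into_inner`, and the broken invariant becomes observable without any memory unsafety.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

// Hypothetical type with a logical invariant: `total` equals the sum of `items`.
struct Ledger {
    items: Vec<u32>,
    total: u32,
}

impl Ledger {
    fn invariant_holds(&self) -> bool {
        self.items.iter().sum::<u32>() == self.total
    }
}

// Panics halfway through an update while holding the lock, then reads the
// data anyway. Returns (was the mutex poisoned, does the invariant hold).
fn poison_and_recover() -> (bool, bool) {
    let ledger = Arc::new(Mutex::new(Ledger { items: vec![1, 2], total: 3 }));
    let shared = Arc::clone(&ledger);

    let _ = panic::catch_unwind(move || {
        let mut guard = shared.lock().unwrap();
        guard.items.push(10);
        panic!("interrupted before `total` was updated");
    });

    let was_poisoned = ledger.is_poisoned();
    // Deliberately ignore the poison flag, as the BinaryHeap example does.
    let guard = match ledger.lock() {
        Ok(guard) => guard,
        Err(poisoned) => poisoned.into_inner(),
    };
    (was_poisoned, guard.invariant_holds())
}

fn main() {
    let (was_poisoned, invariant_holds) = poison_and_recover();
    println!("poisoned: {was_poisoned}, invariant holds: {invariant_holds}");
}
```

The explicit `Err(poisoned) => poisoned.into_inner()` arm is the part that "makes you think about it": the bad data is reachable, but only after opting in.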
Ok. I'll just say, total days during which binaryheap was known to be broken: 1 single day*. *Known defined as having bug reported |
This function has remained in std for quite some time now without modifications, and it's a core building block for robust FFI, so this commit stabilizes the signature as-is. It is possible to relax the `Send` or `'static` bounds in the future additionally. Closes rust-lang#25662
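As an illustration of the FFI use case mentioned in this commit message, the sketch below stops a panic at a C-callable boundary and turns it into an error code. It is written against `std::panic::catch_unwind`, the later API that replaced `thread::catch_panic`, and the function names and the `-1` error convention are assumptions made up for the example.

```rust
use std::panic;

// Hypothetical fallible work that panics on bad input.
fn parse_positive(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input");
    }
    input * 2
}

// Hypothetical C-callable entry point: reports -1 instead of letting the
// panic unwind into the foreign caller.
pub extern "C" fn parse_positive_ffi(input: i32) -> i32 {
    match panic::catch_unwind(|| parse_positive(input)) {
        Ok(value) => value,
        Err(_) => -1,
    }
}

fn main() {
    println!("{}", parse_positive_ffi(21));
    println!("{}", parse_positive_ffi(-1));
}
```

A real library would also export the symbol under a stable name and would need a richer error channel than a sentinel value; both are left out to keep the sketch short.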
@alexcrichton We (@zonyitoo and me) noticed another issue with the current behaviour of |
There is one easy solution for this: Don't allow panic in |
Preventing panics in The only alternative I see right now is to introduce some additional API which allows us to "register a stack". We could pass a callback to that register method which could be invoked whenever the panic & unwind modules need the panic count. What is at the moment the thread-local IMHO this would be the least intrusive and most flexible approach right now. Closing down Rust to only support threads and nothing more seems to me like an unnecessary limitation though. Apart from that I think there are only 2 solutions: One of which would be to adopt the C++ model and the other would be to have "stack-local" instead of "thread-local" data. |
@lhecker The concern is a userland context switch in the middle of unwinding? Things should work if you |
@sfackler Nope. Not in the middle of unwinding. As you can see in here, the |
This is unfortunately one of the major drawbacks of coroutines in a systems language right now, which is that they don't support native TLS easily. It sounds like this interaction with I don't think this problem is necessarily contained to Also note that TLS has bad interactions with optimizations in LLVM right now, specifically the usage of TLS across a function call that can change threads may cause segfaults. (some info about this is in 12c5fc5) |
@alexcrichton Hmm, unfortunately, switching TLS tables could not solve this problem, because after switching, the current executing context still holds pointers to values in the original TLS (for example, the It is not easy to make coroutine compatible with |
@alexcrichton The only thing with Rust where TLS is used in a way which does not work with coroutines is For instance the poison flag you mentioned? That's not inherently a problem! The poison flag only calls But what happens if you catch that panic? The AFAIK this is the only major problem with TLS & Coroutines in Rust right now. And as I said earlier this could either be solved by:
Edit: You're right though about it being dangerous... For instance a |
Thinking more on this, the bug pointed out in 12c5fc5 essentially means that any use of TLS is subject to segfaults with coroutines currently. That's unfortunately not the only problem that needs to be solved (e.g. also this business with The gist of that bug is that this code can segfault:
```rust
fn foo() -> bool {
    let a = thread::panicking();
    bar();
    a && thread::panicking()
}
```
As a side note, I just realized that this issue is basically now long since closed. We've discussed the whole |
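For readers unfamiliar with `thread::panicking()`, the following sketch shows the per-thread state it reports in the ordinary, coroutine-free case. The `Probe` type, the `SAW_PANIC` flag and the `observe` function are hypothetical, and the panic is caught with `std::panic::catch_unwind`. A destructor that runs during unwinding sees `true`, and the flag is cleared again once the panic has been caught.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Records what `thread::panicking()` returned inside the destructor.
static SAW_PANIC: AtomicBool = AtomicBool::new(false);

struct Probe;

impl Drop for Probe {
    fn drop(&mut self) {
        SAW_PANIC.store(thread::panicking(), Ordering::SeqCst);
    }
}

// Runs a closure that optionally panics and reports what the destructor saw.
fn observe(should_panic: bool) -> bool {
    let _ = panic::catch_unwind(|| {
        let _probe = Probe;
        if should_panic {
            panic!("unwinding");
        }
    });
    SAW_PANIC.load(Ordering::SeqCst)
}

fn main() {
    println!("during unwinding: {}", observe(true));
    println!("normal drop: {}", observe(false));
    println!("after catching: {}", thread::panicking());
}
```

Because this state belongs to the OS thread, a context switch onto the same thread between the two reads in `foo` is where the coroutine concerns in this thread come from.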
If I understand it correctly, this function cannot be safe as is, because everyone in a thread implicitly has a reference to TLS.
Created this issue because I didn't want discussion about this to be lost in IRC.