Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for setjmp / longjmp #2625

Open
gnzlbg opened this issue Jan 20, 2019 · 54 comments
Open

Support for setjmp / longjmp #2625

gnzlbg opened this issue Jan 20, 2019 · 54 comments
Labels
A-ffi FFI related proposals. A-machine Proposals relating to Rust's abstract machine. T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Jan 20, 2019

Motivation

rust-lang/libc#1216 proposes adding setjmp and longjmp to libc. These functions are required to interface with some C libraries, but sadly these functions are currently impossible to use correctly. Users deserve a solution that works and warns or prevents common pitfalls.

Issues with current solution

The first issue is that libc cannot add LLVM returns_twice attribute to setjmp:

This attribute indicates that this function can return twice. The C setjmp is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.

Basically, LLVM assumes that setjmp can return only once without this attribute, potentially leading to miscompilations (playground):

unsafe fn foo() -> i32 {
    let mut buf: jmp_buf = [0; 8];
    let mut x = 42;
    if setjmp(&mut buf) != 0 {  // Step 0: setjmp returns 0
        // Step 3: when setjmp returns 1 x has always been
        // modified to be  == 13 so this should always return 13:
        return x;
    }
    x = 13; // Step 1: x is modified
    longjmp(&mut buf, 1); // Step 2: jumps to Step 0 returning 1
    x // this will never be reached
}

// In debug builds foo returns 13 correctly, but 
// in release builds foo returns 42
assert_eq!(unsafe { foo() }, 13); // FAILs in release

Because setjmp is not returns_twice, LLVM assumes that the return x; will only be reached before x = 13, so it will always return 42. However, setjmp returns 0 the first time, and returns 1 when jumped into it from the longjmp.

Using a volatile load instead works around this issue (e.g. playground). Basically, if stack variables are modified after a setjmp, all reads and writes until all possible longjmps will probably need to be volatile.

One way to improve this could be to add an stable attribute #[returns_twice] that users can use to mark that extern "C" { ... } functions can return multiple times and for Rust to handle these functions specially (e.g. by emitting the corresponding LLVM attribute). An alternative would be for Rust to provide these functions as part of the language.


Modulo LLVM-level misoptimizations, C only allows setjmp in specific "contexts" [0], and these features interact badly with languages with destructors. Rust does not guarantee that destructors will run (e.g. mem::forget is safe), so skipping destructors using longjmp is not unsound [1]. However, unsafe code will need to take into account that code outside it can use longjmp to skip destructors when creating safe abstractions (e.g. see Observational equivalency and unsafe code).

More worrying is how longjmp subverts the borrow checker to, e.g., produce undefined behavior of the form use-after-move without triggering a type error (playground):

fn bar(_a: A) { println!("a moved") }
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } != 0 {  // Step 0: setjmp returns 0
        bar(a);  // Step 3: a is moved _again_ (UB: use-after-move)
        return;
    }
    bar(a); // Step 1: a is moved here
    unsafe { longjmp(&mut buf, 1) };  // Step 2: jumps to Step 0 returning 1
}

This prints "a moved" twice, which means that the variable a was moved from twice, so the second time an use-after-move happened which type-checked. Obviously, longjmp is not the only way to achieve this in Rust, e.g. it is trivial to use the pointer methods to do so as well, but longjmp combined with Drop types makes this happen with no effort (playground):

struct A; impl Drop for A { ... }
fn bar(_a: A) { 
   // _a is dropped here
}
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } != 0 {
        // use-after-move: a has been moved (and dropped) below 
        // but a is used (and dropped) in this branch again 
        // => double-drop => UB 
        return; 
    }
    bar(a); // moves a 
    unsafe { longjmp(&mut buf, 1) };
}

That is, using setjmp+longjmp to create double-drops (undefined behavior) is trivial.

Finally, there are problems with creating wrappers around these functions (playground):

fn foo(buf: &mut jmp_buf) {
    let mut a: i32 = 42;
    if unsafe { setjmp(buf) } != 0 {
        dbg!(a);  // use-after-free
        panic!("done");
    }
    a = 13;
}

fn main() {
    let mut buf: jmp_buf = [0; 8];
    foo(&mut buf);
    let b: i32 = 666;
    dbg!(b);
    unsafe { longjmp(&mut buf, 1); }
}

Prints b = 666 and a = 0. The problem here is that this code saves the stack pointer inside foo, but then foo returns, and afterwards the longjmp jumps to a stack frame that is no longer live, so dbg!(a) reads a after it has been free'd (use-after-free).

There are probably many other problems with these two functions, that does not mean that they are impossible to use correctly. Still, it would be good to have a solution here that at least warns about potentially incorrect usages since reasoning about these and the surrounding unsafe code is often very tricky.

It would also be good to have a way to soundly model these in miri and detect when they are used incorrectly.

At a minimum we should be able to write down documentation for these functions in Rust. Where exactly can they be used, what does the unsafe code surrounding them need to uphold to be correct, etc.


cc @nikomatsakis (wrote blog post about observational equivalence and unsafe code), @rkruppe , @RalfJung @ubsan - the unsafe code guidelines should probably say whether extern "C" functions are allowed to modify the stack pointer etc. like setjmp/longjmp do.


An invocation of the setjmp macro shall appear only in one of the following contexts:

  • the entire controlling expression of a selection or iteration statement;
  • one operand of a relational or equality operator with the other operand an integer constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement;
  • the operand of a unary ! operator with the resulting expression being the entire controlling expression of a selection or iteration statement; or
  • the entire expression of an expression statement (possibly cast to void).

While C11 7.13.1.1p5 states:

If the invocation appears in any other context, the behavior is undefined.

That is, let x = setjmp(...); would be UB in C. In Rust having a result of a function be usable only in some contexts would be weird.

  • [1] In C++, skipping destructors with longjmp is undefined behavior, e.g., [support.runtime] states:

If any automatic objects would be destroyed by a thrown exception transferring control to another (destination) point in the program, then a call to longjmp(jbuf, val) at the throw point that transfers control to the same (destination) point has undefined behavior.

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the RFC. A-ffi FFI related proposals. A-machine Proposals relating to Rust's abstract machine. labels Jan 20, 2019
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jan 20, 2019

Something like this would add zero checks over what C offers, moving the responsibility for solving all these issues to the user. Particularly, Drop values moved in one branch need to be manually mem::forget'ed in all others to prevent them from being used in a double-drop.

MVP

Add setjmp and longjmp with the following semantics:

pub type JmpBuf = <implementation-defined>;

/// Saves the current execution context into `env`. If `longjmp(env, val)` is used 
/// to restore execution at this call site, this function returns `val`. Otherwise, 
/// it returns `0`. 
#[rustc_returns_twice]
pub unsafe fn setjmp(env: &mut JmpBuf) -> c_int;

/// Restores execution at the `setjmp` call-site that stored `env`, returning 
/// there the `value` (if `value == 0`, then `setjmp` returns `1`).
///
/// If the function that called `setjmp` exits before the call to `longjmp`, the behavior is 
/// undefined.
///
/// All variables for which `drop` would be called if `longjmp` was to be replaced with 
/// `panic!` and `setjmp` with `catch_unwind` are not dropped.
pub unsafe fn longjmp(env: &JmpBuf, value: c_int);

The invocation of setjmp can appear only in the following contexts:

  • the entire controlling expression of match, e.g. match setjmp(env) { ... }.
  • if setjmp(env) $integer_relational_operator $integer_constant_expression { ... }
  • the entire expression of an expression statement: setjmp(env);

If setjmp appears in any other context, the behavior is undefined.

Upon return to the scope of setjmp, all accessible objects, floating-point status flags, and other components of the abstract machine have the same values as they had when longjmp was executed, except for the local variables in setjmp's scope whose values have been changed since the setjmp invocation - the behavior of reading / writing to these variables via non-volatile reads/write is undefined.

@jeff-davis
Copy link

Ultimately, what I am hoping to achieve is something like: longjmp_panic!(my_ffi_function(x,y,z))

Which would call my_ffi_function() and return normally, or panic if my_ffi_function() longjmps.

It would also be nice to have: longjmp_catch!(my_ffi_function(x,y,z)) which would return a Result (Ok if normal return, Err if longjmp).

The MVP suggested above allows such a macro to be written, so I am in favor of that approach (at least until we find something better).

There may be opportunity for something higher-level than the MVP that can be more safely exposed to the user and more easily documented (and leave rust more free to change the internals). The challenge is that, even if this is the only use case to support, we need some knowledge about how the FFI function might call longjmp().

My use case is postgres, to do something like PG_TRY():

#define PG_TRY()  \
    do { \
        sigjmp_buf *save_exception_stack = PG_exception_stack; \
        ErrorContextCallback *save_context_stack = error_context_stack; \
        sigjmp_buf local_sigjmp_buf; \
        if (sigsetjmp(local_sigjmp_buf, 0) == 0) \
        { \
            PG_exception_stack = &local_sigjmp_buf

#define PG_CATCH()  \
        } \
        else \
        { \
            PG_exception_stack = save_exception_stack; \
            error_context_stack = save_context_stack

#define PG_END_TRY()  \
        } \
        PG_exception_stack = save_exception_stack; \
        error_context_stack = save_context_stack; \
    } while (0)

All of this depends on longjmp() being called with the argument *PG_exception_stack. If it's called with anything else, our previous setjmp() is useless.

So, if we want to make something higher-level in rust, it would need a way to specify what this magic global variable is.

@RalfJung
Copy link
Member

Modulo LLVM-level misoptimizations, C only allows setjmp in specific "contexts" [0], and these features interact badly with languages with destructors. Rust does not guarantee that destructors will run (e.g. mem::forget is safe), so skipping destructors using longjmp is not unsound [1].

It's not that easy. For example, setjmp/longjmp are incompatible with closure-based APIs that expect "well-bracketed control flow", such as crossbeam::scope: you could use longjmp to skip the part of the code that is otherwise guaranteed to run. This has nothing to do with dropping.

Moreover, mem::forget leaks memory and does not call the destructor. setjmp/longjmp deallocates memory without calling its destructor, and that's unsound. For pinned memory we guarantee that once some memory is pinned, if this memory every gets invalidated (e.g., deallocated), then the destructor will first get called. Any mechanism that pops stack frames without calling destructors breaks this guarantee (well, at least it makes stack-pinning impossible, and that is intended to be allowed).

In other words, the "observational equivalence" argument does not apply in this case, setjmp/longjmp increases the observational power of the context. (This is a well-known result about continuations in PL theory, it is not a particular quirk of how they got implemented in C.) I don't think there is any way setjmp/longjmp can ever be used to "jump across" unknown code. This can only be sound if you can carefully control the stack frames that get popped by the longjmp, and make sure that they contain no types where skipping drop is a problem.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jan 22, 2019

is not unsound

To clarify, what I meant by "is not unsound" is that performing a longjmp does not instantaneously invoke undefined behavior because it could make a lot of live objects instantaneously invalid - AFAICT longjmp does not do that.

I did not meant to say that it is not unsound in the sense that safe abstractions around unsafe code written without considering setjmp/longjmp remain safe - as you mention, they don't.

Moreover, mem::forget leaks memory and does not call the destructor. setjmp/longjmp deallocates memory without calling its destructor, and that's unsound.

That's a good point.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jan 24, 2019

@RalfJung I think C++'s semantic might be what we need:

If replacing of longjmp with panic!() and setjmp with panic::catch_unwind would execute a non-trivial destructor for any automatic object when panic=unwind, the behavior of such longjmp is undefined.

We might just need to replace "non-trivial destructor" with something else, and also setjmp cannot just be replaced with catch_unwind, but basically all the code from the setjmp to the longjmp would need to be inside a catch_unwind.

@eaglgenes101
Copy link

eaglgenes101 commented Jan 24, 2019

C's setjmp/longjmp is essentially a C-styled implementation of continuations, and this C flavor definitely shows in how it works. More refined continuation mechanisms are present in higher-level langauges, especially functional languages, and the same #[returns_twice] needed to support setjmp without causing undefined behavior would also help make it possible to implement continuations reasonably safely in Rust.

As for setjmp/longjmp itself, setjmp can be wrapped up to be safe in the same sense that a raw pointer that is never dereferenced is safe. longjmp, on the other hand, is crazy unsafe. That thing can punch holes straight down the call stack, which not only allows for memory leaks as large as dam breaks, it completely subverts any mechanism that a library could use to ensure that code which must run after user code to hold up partial correctness of the API actually does run. Even in C, I never use it, because I find that keeping straight what happened between a landing point I might want to use for setjmp and wherever it might be useful to longjmp back to it is nearly impossible without garbage collection or dedicated bookkeeping, neither of which I am inclined to take the time to write and debug. However, others use setjmp/longjmp, so we may as well figure out how to make it happen for C FFI purposes.

From what I can tell, this is a (probably horribly incomplete) list of what we must make undefined to permit all of Rust's existing safety invariants to continue to hold up:

  • longjmping to a jmp_buf which does not correspond to a stack frame, to an uninitialized jmp_buf, to a jmp_buf corresponding to the stack frame of a function which has already returned, or to a jmp_buf corresponding to a stack frame on a stack which is still being used by another thread
  • longjmping ahead over initialization of memory, and subsequently using this uninitialized memory
  • longjmping such that deallocated or leaked memory on the heap, or memory in stack frames above the jmp_buf stack frame, is accessible by reference, and subsequently dereferencing such a reference
  • longjmping to a jump_buf point between the initialization of an !Unpin type value and its pinning, and subsequently moving the value
  • In general, using longjmp to jump back to a point before a safety invariant needs to hold while something else expects the safety invariant to still hold and subsequently violating the safety invariant, or performing an operation which would violate a safety invariant at the point setjmp but is permitted because it is relaxed at a later point, then longjmping back to the jmp_buf without restoring the safety invariant
  • longjmping over the destructor of a pinned value that has live references to it
  • longjmping after modification of values held at the point of setjmp (either by borrowing them or owning them without letting them be mutably borrowed), unless these values had interior mutability at the time
  • longjmping over catch_unwind

Presumably a #[returns_twice] attribute would also come with borrow checker additions that account for the multiple return property of functions annotated with it, so I didn't include it in the above list. I'm still looking for a concrete example where longjmping over the destructor of a stack-pinned object can cause undefined behavior in otherwise safe code, but so far, I'm stuck. Turns out the requirement that stack destructors of Pinned values run before deallocation is needed to ensure you can safely have intrusive collections.

@jeff-davis
Copy link

FWIW, here is the macro I am currently using, which seems like a valid use of sigsetjmp(), and the MVP above would satisfy my requirements.

#[macro_export]
macro_rules! longjmp_panic {
    ($e:expr) => {
        let retval;
        unsafe {
            use postgres_extension::utils::elog
                ::{PG_exception_stack,
                   error_context_stack,
                   PgError};
            use postgres_extension::setjmp::{sigsetjmp,sigjmp_buf};
            let save_exception_stack: *mut sigjmp_buf = PG_exception_stack;
            let save_context_stack: *mut ErrorContextCallback = error_context_stack;
            let mut local_sigjmp_buf: sigjmp_buf = std::mem::uninitialized();
            if sigsetjmp(&mut local_sigjmp_buf, 0) == 0 {
                PG_exception_stack = &mut local_sigjmp_buf;
                retval = $e;
            } else {
                PG_exception_stack = save_exception_stack;
                error_context_stack = save_context_stack;
                panic!(PgError);
            }
            PG_exception_stack = save_exception_stack;
            error_context_stack = save_context_stack;
        }
        retval
    }
}

It is intended to wrap a FFI function call which, when evaluated, may call siglongjmp() in accordance with the postgresql convention for throwing an exception. It can be used in a function like this:

#[inline(never)]
fn wrap_elog() {
    longjmp_panic! { elog_internal(file!(), line!(), ERROR, "test error") }
}

(In the above example, elog_internal() will always throw an exception and longjmp(). It's not a direct FFI call in this particular example, but it's close enough for illustration.)

I looked at the optimized IR, and it looks fine to me, but someone who understands this better might see a problem:

; udf_error::wrap_elog
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc void @_ZN9udf_error9wrap_elog17he7efb2354edb7c3eE() unnamed_addr #\
8 {
start:
  %local_sigjmp_buf = alloca %"postgres_extension::setjmp::sigjmp_buf", align 8
  %0 = load i64, i64* bitcast (%"postgres_extension::setjmp::sigjmp_buf"** @PG_exception_\
stack to i64*), align 8
  %1 = load i64, i64* bitcast (%"postgres_extension::utils::elog::ErrorContextCallback"**\
 @error_context_stack to i64*), align 8
  %2 = bitcast %"postgres_extension::setjmp::sigjmp_buf"* %local_sigjmp_buf to i8*
  call void @llvm.lifetime.start.p0i8(i64 208, i8* nonnull %2)
  %3 = call i32 @__sigsetjmp(%"postgres_extension::setjmp::sigjmp_buf"* nonnull %local_si\
gjmp_buf, i32 0)
  %4 = icmp eq i32 %3, 0
  br i1 %4, label %bb3, label %bb4

bb3:                                              ; preds = %start
  store %"postgres_extension::setjmp::sigjmp_buf"* %local_sigjmp_buf, %"postgres_extensio\
n::setjmp::sigjmp_buf"** @PG_exception_stack, align 8
; call postgres_extension::utils::elog::elog_internal
  call void @_ZN18postgres_extension5utils4elog13elog_internal17h0f179afe9978ab59E([0 x i\
8]* noalias nonnull readonly bitcast (<{ [10 x i8] }>* @1 to [0 x i8]*), i64 10, i32 14, \
i32 20, [0 x i8]* noalias nonnull readonly bitcast (<{ [10 x i8] }>* @2 to [0 x i8]*), i6\
4 10)
  store i64 %0, i64* bitcast (%"postgres_extension::setjmp::sigjmp_buf"** @PG_exception_s\
tack to i64*), align 8
  store i64 %1, i64* bitcast (%"postgres_extension::utils::elog::ErrorContextCallback"** \
@error_context_stack to i64*), align 8
  call void @llvm.lifetime.end.p0i8(i64 208, i8* nonnull %2)
  ret void

bb4:                                              ; preds = %start
  store i64 %0, i64* bitcast (%"postgres_extension::setjmp::sigjmp_buf"** @PG_exception_s\
tack to i64*), align 8
  store i64 %1, i64* bitcast (%"postgres_extension::utils::elog::ErrorContextCallback"** \
@error_context_stack to i64*), align 8
; call std::panicking::begin_panic
  call fastcc void @_ZN3std9panicking11begin_panic17hc642c894ac6ffd69E()
  unreachable
}

This usage is quite constrained, but it serves my purposes. Because the wrapper function is always almost identical (just the FFI call expression changes), and because it's marked #[inline(never)], there are unlikely to be any more optimizations applied that could cause a problem.

Note that sigsetjmp() is declared as:

; Function Attrs: nounwind nonlazybind uwtable
declare i32 @__sigsetjmp(%"postgres_extension::setjmp::sigjmp_buf"*, i32) unnamed_addr #4

but it should have the returns_twice attribute as well. This means that it may be undefined behavior, but in such a constrained use I don't see an obvious way that it would misoptimize it.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 5, 2019

This usage is quite constrained, but it serves my purposes.

I think this should work as long as we use the returns_twice attribute for sigsetjmp, and as long as the longjmp from the C code does not introduce undefined behavior in the C code (e.g. because it skips C++ or Rust destructors).

@comex
Copy link

comex commented Feb 5, 2019

In practice, returns_twice in LLVM only affects a few things, but the things it does affect are important:

  • Disables stack slot coloring, aka reusing the same stack slot for unrelated variables with disjoint lifetimes (and ditto for WebAssembly locals);
  • When AddressSanitizer is on, disables allocating locals on the heap (for somewhat dubious reasons);
  • Disables tail call optimization;
  • Ensures that functions that call returns_twice functions cannot themselves be inlined into other functions (i.e. what @jeff-davis replicated manually with #[inline(never)]);
  • On SPARC, prevents assuming that the register window mechanism will automatically restore registers on return;
  • When X86SpeculativeLoadHardening is enabled (Spectre paranoia mode), change a gadget added after roughly every call, which normally tries to read from the callee's return address stack slot to ensure the return to the current IP wasn't mispredicted.

At least the AddressSanitizer, X86SpeculativeLoadHardening, and SPARC ones seem potentially relevant even to a simple function like wrap_elog, so I would recommend ensuring that the returns_twice attribute is added.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 6, 2019

At least the AddressSanitizer, X86SpeculativeLoadHardening, and SPARC ones seem potentially relevant even to a simple function like wrap_elog, so I would recommend ensuring that the returns_twice attribute is added.

The path of least resistance would be to add a #[rustc_returns_twice] attribute to rustc, and add these functions to libc, exposing them only for libc's that are built as part of the Rust standard library, using that attribute in that case. We then would need to expose those functions through standard somehow.

@eaglgenes101
Copy link

eaglgenes101 commented Feb 6, 2019

#[returns_twice] would also alert the borrow checker to the multiple-return nature of functions annotated with it, so it should be able to figure out that the example given is invalid:

fn bar(_a: A) { println!("a moved") }
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } != 0 {  
        bar(a);  // error[EO382]: use of moved value: `a`
        return;
    }
    bar(a); // Note: `a` was previously moved here
            // Note: `setjmp` may return multiple times
    unsafe { longjmp(&mut buf, 1) };  
}

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 6, 2019

@eaglgenes101 I don't think it can work like that, at least if the lint is only based on #[returns_twice]. This is a different way of writing that example:

fn bar(_a: A) { println!("a moved") }
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } == 1 {  
        bar(a);
        return;
    } else {
        bar(a);
    }
}

At best we could error stating that different branches depending on a #[return_twice] "value" move the same value (if the value is not Copy), but I doubt the error message can get better than that without actually specifying how setjmp and friends work (e.g. returning 0 on first call, and non-zero when being jumped into). We'd probably need a #[setjmp_like] attribute for that.

@briansmith
Copy link

Why is it important for Rust to support setjmp/longjmp at all? The motivation says "Users deserve a solution that works and warns or prevents common pitfalls." However, that is not motivation as far as an RFC is concerned; it is just a statement of opinion with no supporting evidence.

Strawman: Document in the libc crate that they are intentionally omitted. Document that using them at all is undefined behavior. Motivation: Avoiding this functionality saves a lot of complexity and time, and it is unlikely that any attempt to implement it would be correct any time soon.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 6, 2019

@briansmith some C libraries emulate exceptions using setjmp/longjmp and use that as their error reporting mechanism (e.g. Postgres).

@briansmith
Copy link

some C libraries emulate exceptions using setjmp/longjmp and use that as their error reporting mechanism (e.g. Postgres).

Some C++ libraries use C++ exceptions too, but Rust doesn't support them either, even though they're much more reasonable than setjmp/longjmp.

Of course it's nice to support Postgres but I think we should try to find ways of doing it that don't involve adding setjmp/longjmp to Rust.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 6, 2019

Some C++ libraries use C++ exceptions too,

Rust has C FFI, AFAIK it does not have C++ FFI (yet).

These are C functions available on all systems with a C standard library and are easily callable via C FFI. If we want Rust to be able to interoperate with all correct and potentially legacy C code out there via its C FFI, we need to support them in some form. There is precedence of supporting bad and dangerous C APIs in Rust C FFI for the purpose of being able to interface with C code that uses them (e.g. va_args).

Of course it's nice to support Postgres but I think we should try to find ways of doing it that don't involve adding setjmp/longjmp to Rust.

I would be interested in better ways of interfacing with Postgres, and I think @jeff-davis would be interested too. My first suggestion was for @jeff-davis to write a C wrapper that performs the setjmp/longjmp without propagating it to Rust, so that Rust code calling the wrapper does not have to use these functions (in a similar way how one would wrap C++ code with a C api, catching all exceptions, before interfacing with Rust code).

@briansmith
Copy link

If we want Rust to be able to interoperate with all correct and potentially legacy C code out there via its C FFI, we need to support them in some form.

I don't think that is a goal, and I don't think it should be, especially in cases where one can work around the issues by writing a little bit of C that wraps the other C code and resolves the issue. I don't know about va_args issue and the decision making in it. I think setjmp/longjmp has a lot broader negative impact on Rust than va_args. (I am no fan of va_args either, BTW.)

@strega-nil
Copy link

strega-nil commented Feb 7, 2019

I would argue it really doesn't have a lot of impact. One might add some extra cases to the borrow checker, but other than that it's fairly similar to simple exception handling, something that Rust already has.

On the other hand, would it be possible to make it #[rustc::returns_twice]? That seems like the best way to do namespacing here. Although it seem better to call it #[returns_twice] or something along those lines, for FFI?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 7, 2019

On the other hand, would it be possible to make it #[rustc::returns_twice]?

So currently all rustc internal attributes start with #[rustc_...], if this were to be changed, it would make sense to do that for all internal attributes at once.

Although it seem better to call it #[returns_twice] or something along those lines, for FFI?

That's an open question. Is #[returns_twice] as an attribute useful enough to warrant going through the stabilization process ? Also, I worry that doing this might be stabilizing an LLVM-ism. Would this attribute be useful for Cranelift (cc @sunfishcode ) ?

@briansmith
Copy link

briansmith commented Feb 7, 2019

Here's what compcert's documentation has to say about setjmp/longjmp:
http://compcert.inria.fr/man/manual005.html:

§7.13 Non-local jumps <setjmp.h>
The CompCert C compiler has no special knowledge of the setjmp and longjmp functions, treating them as ordinary functions that respect control flow. It is therefore not advised to use these two functions in CompCert-compiled code. To prevent misoptimisation, it is crucial that all local variables that are live across an invocation of setjmp be declared with volatile modifier.

@comex
Copy link

comex commented Feb 7, 2019

GCC, for one, supports a returns_twice attribute.

@jeff-davis
Copy link

@briansmith A lot of languages work well with C in the easy cases. I love rust because it works well with C in the hard cases.

I would really like to allow pure rust extensions to be written for postgres. Introducing C creates a lot of headaches that I'd rather not deal with.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 8, 2019

@comex rust-lang/libc does not expose vfork, but that comment suggest that #[returns_twice] would also be required there. The blog post vfork considered dangerous by Rich Felker (main author/developer of musl), is a great read, it explains:

[vfork] works like fork, but without creating a new virtual address space; the parent and child run in the same address space. Unlike with pthread_create, where the new thread runs on its own stack, vfork behaves like fork and “returns twice”, once in the child and once in the parent. This seems impossible, since the parent and child would clobber one another’s stacks, but a clever trick saves the day: the parent process is suspended until the child performs exec or _exit, breaking the shared-memory-space relation between the two processes.

vfork was omitted from POSIX and modern standards because it’s difficult to use [...] However, many systems (including Linux) still provide a similar or identical interface at the kernel level, and new interest in its use has arisen again due to the fact that huge processes cannot fork on systems with strict commit charge accounting due to lack of memory, and the fact that copying the virtual memory layout of a process can be expensive if the process has a huge number of maps created by mmap. As such, both musl and glibc use vfork to implement posix_spawn, the modern interface for executing external programs as a new process.

If it were only setjmp, then a hack in the form of a #[rustc_returns_twice] private attribute might have been ok, but it appears that there are multiple low-level platform APIs that can invalidate the optimizers assumptions by being able to return multiple times and people might want to create bindings to these in their own code instead of only accessing them via the standard library or libc.So i'm leaning towards having a real #[returns_twice] attribute for this.

Does anybody know why it is called returns_twice in GCC? AFAICT vfork can only return twice, but setjmp can return multiple times. I don't know if this distinction matters. GCC describes returns_twice as:

The returns_twice attribute tells the compiler that a function may return more than one time.

But we probably want to keep the same name as C since that is what people using these interfaces will probably search for.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 8, 2019

Unrelated question, the GCC docs for returns_twice also mention:

The longjmp-like counterpart of such function, if any, might need to be marked with the noreturn attribute.

Since C++17, the signature of longjmp is:

[[noreturn]] void longjmp( std::jmp_buf env, int status );

so in Rust longjmp should return -> !, right?

@comex
Copy link

comex commented Feb 8, 2019

Does anybody know why it is called returns_twice in GCC? AFAICT vfork can only return twice, but setjmp can return multiple times.

Yes, it's legal in C to longjmp to the same jmp_buf multiple times, which would cause setjmp to return more than twice. I think "twice" is used just because it's shorter than, say, "multiple_times" would be.

so in Rust longjmp should return -> !, right?

I believe so.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 13, 2019

What that commit is doing is the safe bet, because right now the situation is unclear. If I were to do this right now, I would just write a tool that auto-generates a C shims that automatically insert setjmp/longjmp on the C side, only communicating error codes back to Rust.

I think that the best way to attack this problem would be an RFC to add setjmp/longjmp APIs to the language, guaranteeing the interaction of these APIs with the rest of the language. This would probably be a massive language change that's not on the roadmap.

In the meantime, one can play with returns_twice on nightly, but that's just the bare minimum for this to be able to work at all. There are no guarantees that such code is correct.

@jeff-davis
Copy link

I don't see why we'd want to make setjmp defined in all cases -- we just need it to be defined in some cases, just like other unsafe code. So for me, the bare minimum of something like returns_twice is fine, but without some kind of progress toward stable rust, I don't see the point.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 13, 2019

I don't see why we'd want to make setjmp defined in all cases -- we just need it to be defined in some cases, just like other unsafe code.

Currently the behavior of jumping over Rust code is undefined in all cases, and there is no RFC that changes that in any way. There is an RFC for a returns_twice attribute that allows passing an attribute to LLVM, but that RFC does not allow jumping over Rust code (without setting such an attribute, code that does that cannot work, but the attribute is not sufficient for the code to work properly).

@jeff-davis
Copy link

I don't need setjmp to jump over rust code. I just need it to catch jumps out of C code into rust code.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 13, 2019

Still, something like this:

let b = setjmp(&mut buf);
c_ffi_fn(&mut buf); 

has a jump in Rust code if c_ffi_fn is allowed to jump back to setjmp.

The jump is minimal, but it is still there =/

@jeff-davis
Copy link

What if we just defined very tight restrictions for the use of a returns_twice function that are unlikely to be subtly broken later. Something like:

For any function that calls a returns_twice FFI function:

  • Don't inline it into another function
  • Don't apply any optimizations to the rust code for that function (though LLVM may optimize it)
  • Undefined if that function refers to any data types that are allocated on the heap or have a destructor
  • Whatever other restrictions are necessary to make this work.

In other words, I could live with even the most sweeping restrictions if they help avoid the problems that you are worried about. The pattern for something like setjmp is very narrow.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 13, 2019

I'm just pointing out that somebody would need to write an RFC explaining what it means to jump over at least some subset of Rust code, however minimal that is, and push it through the process.

Like even if we were only to allow this pattern:

setjmp(&mut buf);
c_ffi_fn(&mut buf, some_arg);

if some_arg is copy you end up with something like this:

setjmp(&mut buf);
{ 
    let _tmp = copy some_arg;
    c_ffi_fn(&mut buf, _tmp);
}

such that c_ffi_fn can jump over let _tmp = copy some_arg; statement. If we do not define anywhere that doing so is ok, the behavior is not defined.

We have explored here a lot of things that would be ok, and a lot of things that would not. So there is definitely enough material for someone to try to write down such an RFC. We don't have a WG FFI yet, but such work could be part of it, which could help push it through the process (cc @joshtriplett ).

@jeff-davis
Copy link

Thank you. That sounds like a good way to move forward.

@jamessewell
Copy link

Just checking in to see if any off-thread progress has been made?

I'm happy to help - just not sure what that help would be at this stage ...

@amluto
Copy link

amluto commented Aug 24, 2020

I don't know how useful this feedback is, but I use setjmp() regularly in nasty kernel selftests that I've written, and I don't actually want the returns-twice behavior. For example, I use it to exit a block of code from a signal handler. The fact that sigsetjmp returns twice just makes the code more convoluted than it deserves to be.

The feature I would actually want is a lot closer to panic!/catch_unwind. Here's an example that might be semantically sensible even though it has atrocious syntax:

sig_try(|jmpbuf| {
    some code;
    if (condition) { jmpbuf.jump_out() }
})

The sematics are that, if no one calls jmpbuf.jump_out(), then sig_try just runs the code in the closure normally. Otherwise, if the closure calls jump_out, it jumps to the end of the sig_try with similar semantics to panicking and unwinding. except that it's guaranteed to hit the correct landing pad and that it is explicitly permitted to jump out from a signal handler. (The latter bit probably gets tricky if the signal happens at a bad time, so maybe that needs more care.) This could conceivably even be usable from safe Rust.

Some benefits over sigsetjmp: nothing actually returns twice. jump_out doesn't return at all. Each bit of Rust code executes zero or one times. I think that jmpbuf could be a plain (&) reference to a non-Send non-Sync type, so it's impossible for the closure to stash it away anywhere or otherwise cause jump_out to be invoked more than once or to be invoked after the closure returns.

And I think this would give me all of the functionality I have ever wanted from setjmp or sigsetjmp.

@programmerjake
Copy link
Member

IIRC, that's all that's needed when trying to use libpng, as long as you can choose where the jumpbuf is stored and call jump_out on a new reference to that. My guess is the vast majority of setjmp/longjmp uses also fit that pattern.

@Amanieu
Copy link
Member

Amanieu commented Aug 25, 2020

You can implement sig_try using inline assembly, you don't even need naked functions. It's very similar to coroutine stack switching (e.g. libfringe) except you stay on the same stack.

@RalfJung
Copy link
Member

I assume jmpbuf is not allowed to leak from the closure? If so, we'd call this a "delimited continuation" in PL research, which indeed is less powerful and better behaved than general continuations.

yvt added a commit to r3-os/r3 that referenced this issue Sep 8, 2020
…ngjmp`

`pthread_exit` can't unwind through `catch_unwind` anymore because of
[1].

Because the actual `longjmp` isn't supported by Rust at this time and
[3] is tricky to set up, this commit implements something similar using
inline assembler.

[1]: rust-lang/rust#70212
[2]: rust-lang/rfcs#2625
[3]: https://github.com/jeff-davis/setjmp.rs
yvt added a commit to r3-os/r3 that referenced this issue Sep 8, 2020
…ngjmp`

`pthread_exit` can't unwind through `catch_unwind` anymore because of
[1].

Because the actual `longjmp` [2] isn't supported by Rust at this time
and [3] is tricky to set up, this commit implements something similar
using inline assembler.

[1]: rust-lang/rust#70212
[2]: rust-lang/rfcs#2625
[3]: https://github.com/jeff-davis/setjmp.rs
@SchrodingerZhu
Copy link

just some questions (perhaps rust internal is a better place? But here is already an open issue.)
I am investigating to create bindings forkoka-lang/libmprompt. Currently, the work is at https://github.com/SchrodingerZhu/libmprompt-sys

The example translated from C is very promising and it works:
https://github.com/SchrodingerZhu/libmprompt-sys/blob/main/src/lib.rs

But considering that the library relies heavily on the operations in the manner of longjmp, I am afraid that such operations may easily break up rust runtime.
So is there any previous discussion on the potential danger of such operations?

@andy-thomason
Copy link

Was there ever any resolution to this.

In R, for example, errors are thrown using setjmp/longjmp and we would like a way to catch
errors that come from C code in this way.

Currently, I am using a panic as a poor cousin alternative but panics from C-called functions
are very much not guaranteed and fail completely to be caught in many cases (such as Win32).

@amluto, @Amanieu I like the sig_try solution to the problem and would vote in its favour.

@Amanieu
Copy link
Member

Amanieu commented Jul 14, 2021

The question is, where should sig_try (or whatever we chose to name this) actually reside? It seems like a poor fit for the standard library since this is only useful when interacting with C code.

Ideally it should be in libc as some sort of setjmp_wrapper as the only way setjmp can be called from Rust, but this would require that libc use unstable features (either asm! or #[ffi_returns_twice]) which is not currently possible.

A final possibility would be for this to belong in a third party crate which wraps the setjmp call in a C function that is built from build.rs.

@programmerjake
Copy link
Member

The question is, where should sig_try (or whatever we chose to name this) actually reside? It seems like a poor fit for the standard library since this is only useful when interacting with C code.

Well, CStr is basically only used when interacting with C code, yet it's in the standard library -- I don't see how setjmp should be any different.

@programmerjake
Copy link
Member

The question is, where should sig_try (or whatever we chose to name this) actually reside? It seems like a poor fit for the standard library since this is only useful when interacting with C code.

Well, CStr is basically only used when interacting with C code, yet it's in the standard library -- I don't see how setjmp should be any different.

Maybe VaList is a better example, it's in the same category as setjmp: things which require special compiler support.

@Amanieu
Copy link
Member

Amanieu commented Jul 14, 2021

Not exactly: VaList and CStr don't rely on external support in libc, they are entirely implemented within the standard library. This isn't the case for setjmp which is tied to the particular implementation in the libc of the target.

Then there's also vfork, which suffers from the exact same issues as setjmp: the only way to call it safely would be to wrap it so that the child process only executes within a closure, just like sig_try. Should a vfork wrapper also be provided by the standard library?

@programmerjake
Copy link
Member

programmerjake commented Jul 14, 2021

Not exactly: VaList and CStr don't rely on external support in libc, they are entirely implemented within the standard library. This isn't the case for setjmp which is tied to the particular implementation in the libc of the target.

Then there's also vfork, which suffers from the exact same issues as setjmp: the only way to call it safely would be to wrap it so that the child process only executes within a closure, just like sig_try. Should a vfork wrapper also be provided by the standard library?

hmm, maybe instead there could be a macro kinda like this:

// in core/std
macro_rules! returns_twice_wrapper {
    ($vis:vis extern $abi:literal unsafe fn $id:ident($($args:ident: $arg_tys:ty),*) -> $ret:ty;) => {
        $vis unsafe fn $id<R>(f: impl Fn($ret) -> R, $($args: $arg_tys),*) -> R {
            extern $abi {
                #[returns_twice]
                fn $id($($args: $arg_tys),*) -> $ret;
            }
            let v = $id($($args),*);
            f(v)
        }
    };
}

use like:

// in libc
returns_twice_wrapper! {
    pub extern "C-unwind" unsafe fn setjmp(v: *mut jmp_buf) -> c_int;
}

// in user code
fn test_setjmp() {
    unsafe {
        let mut jb: MaybeUninit<jmp_buf> = MaybeUninit::uninit();
        let jb = &mut jb as *mut _;
        setjmp(|v: c_int| {
            if c == 0 {
                println!("between setjmp and longjmp");
                longjmp(jb, 1);
            } else {
                println!("after longjmp");
            }
        }, jb)
    }
}

facebook-github-bot pushed a commit to facebook/sapling that referenced this issue Nov 30, 2023
Summary:
We have seen edenfs crashing with SIGBUS in production. They are caused by
btrfs checksum errors when reading mmap buffers.

This diff adds a SIGBUS handler that zero-fills the page causing SIGBUS on
demand as an attempt to avoid SIGBUS crash. The idea is that the zero-filled
page will fail the indexedlog checksum somehow so it will become a (seemingly
regular) error, and might trigger re-tries (ex. re-fetch via network) to
transparently fix the issues without user intervention.

Considerations and discoveries:
- `setjmp` error handling: hard to verify soundness in Rust.
  See rust-lang/rfcs#2625
- Simply `return` from the signal handler: The SIGBUS will happen again and the
  program enters an infinite loop trying to read the bad data.
- `mmap` support to return zero pages without SIGBUS: There were some interests
  and attempts in the Linux kernel community. However, nothing usable is merged
  yet.

This signal handler assumes the SIGBUS behavior stay consistent during the
process lifetime. Namely, for previously returned `Ok` data, if read again
later from the same process, it should remain `Ok` not `SIGBUS`. It's okay to
get `Ok` consistently, and after a hard reboot, only get `SIGBUS`. While
practically this assumption seems okay to make (and the use of `mmap` relies on
the assumption that the returned `Ok` data remain unchanged), it is hard to
predicate all corner cases, and the code is optimized for easy-to-understand
rather than "async signal safety". Therefore, the feature is default off for
now.

Reviewed By: zzl0

Differential Revision: D51623839

fbshipit-source-id: 506b676ee1c6d4075a02417622adb0d3477c7a7a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi FFI related proposals. A-machine Proposals relating to Rust's abstract machine. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging a pull request may close this issue.