Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

illegal hardware instruction when using ffi+luajit on beta and nightly >=2021-03-11 #83541

Closed
khvzak opened this issue Mar 27, 2021 · 9 comments
Labels
A-FFI Area: Foreign function interface (FFI) regression-from-stable-to-beta Performance or correctness regression from stable to beta.

Comments

@khvzak
Copy link

khvzak commented Mar 27, 2021

Hello, I'm mlua maintainer and found a regression when running tests for the library on the latest beta and nightly.
I narrowed it down to a minimal example:

Code

main.rs

mod ffi {
    use std::os::raw::{c_int, c_void};

    type LuaCFn = unsafe extern "C" fn(L: *mut c_void) -> c_int;

    extern "C" {
        pub fn luaL_newstate() -> *mut c_void;
        pub fn lua_pcall(L: *mut c_void, nargs: c_int, nresults: c_int, errfunc: c_int) -> c_int;
        pub fn lua_gettable(L: *mut c_void, idx: c_int);
        pub fn lua_pushcclosure(L: *mut c_void, f: LuaCFn, n: c_int);
        pub fn lua_pushnil(L: *mut c_void);
    }

    // #[inline(always)]
    pub unsafe fn lua_gettable2(state: *mut c_void, idx: c_int) {
        lua_gettable(state, idx);
    }
}

pub unsafe fn test_me() {
    let state = ffi::luaL_newstate();
    assert!(!state.is_null());

    unsafe extern "C" fn run_me(state: *mut std::os::raw::c_void) -> std::os::raw::c_int {
        ffi::lua_pushnil(state);
        ffi::lua_pushnil(state);
        ffi::lua_gettable2(state, -2); // Changing this to `ffi::lua_gettable` solves the problem
        1
    }

    ffi::lua_pushcclosure(state, run_me, 0);
    assert!(ffi::lua_pcall(state, 0, -1, 0) != 0);
}

fn main() {
    unsafe { test_me() }
}

Cargo.toml

[package]
name = "luajit_test"
version = "0.1.0"
edition = "2018"
build = "build.rs"

[build-dependencies]
luajit-src = "*"

build.rs

fn main() {
    let artifacts = luajit_src::Build::new().build();
    artifacts.print_cargo_metadata();
    println!("cargo:rerun-if-changed=build.rs");
}

Repo: https://github.com/khvzak/luajit_bug

It works on stable rust, but fails on beta and nightly with illegal hardware instruction on macos 10.15 and linux (at least).

It works if uncomment #[inline(always)] OR replace ffi::lua_gettable2 call with ffi::lua_gettable.
Reproducible only using LuaJIT, but not on Lua 5.1.

Version it worked on

It most recently worked on: nightly-2021-03-10. Does not work on nightly-2021-03-11.

Version with regression

rustc --version --verbose:

rustc 1.52.0-nightly (f98721f88 2021-03-10)
binary: rustc
commit-hash: f98721f886ab52d32d622ad0a46216ad03f3e525
commit-date: 2021-03-10
host: x86_64-unknown-linux-gnu
release: 1.52.0-nightly
LLVM version: 12.0.0
@sfackler
Copy link
Member

You have declared run_me as a C ABI function, which Rust currently does not allow to unwind. However, LuaJIT is throwing an exception. The change in beta/nightly is to inject some code to cause the process to abort if the thread unwinds through a C ABI function: #52652.

There is a "C-unwind" ABI (currently nightly only), which is the C calling convention but allowed to unwind.

@sfackler
Copy link
Member

For reference, this diff fixes the UB:

diff --git a/src/main.rs b/src/main.rs
index c66bd3d..58d322f 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,7 +1,9 @@
+#![feature(c_unwind)]
+
 mod ffi {
     use std::os::raw::{c_int, c_void};

-    type LuaCFn = unsafe extern "C" fn(L: *mut c_void) -> c_int;
+    type LuaCFn = unsafe extern "C-unwind" fn(L: *mut c_void) -> c_int;

     extern "C" {
         pub fn luaL_newstate() -> *mut c_void;
@@ -21,7 +23,7 @@ pub unsafe fn test_me() {
     let state = ffi::luaL_newstate();
     assert!(!state.is_null());

-    unsafe extern "C" fn run_me(state: *mut std::os::raw::c_void) -> std::os::raw::c_int {
+    unsafe extern "C-unwind" fn run_me(state: *mut std::os::raw::c_void) -> std::os::raw::c_int {
         ffi::lua_pushnil(state);
         ffi::lua_pushnil(state);
         ffi::lua_gettable2(state, -2); // Changing this to `ffi::lua_gettable` solves the problem

@khvzak
Copy link
Author

khvzak commented Mar 27, 2021

Thanks Steven for the explanation!

What is confuses me that it's reproducible only when using LuaJIT. I ran a full test suit on Linux/Windows/MacOS and nightly rust (excluding LuaJIT) and it passed (https://github.com/khvzak/mlua/actions/runs/692598095)

I expected the same behaviour regardless of Lua version, they all should use setjmp/longjmp for exception handling.
I see the PR #76570 that changed the behaviour but haven't yet went through the details.

UPD: I'm definitely wrong, LuaJIT uses it's own frame unwinding implementation not based on setjmp/longjmp.

@sfackler
Copy link
Member

Yeah, it is using the C++ unwinding API:

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00007fff2a7cd631 libunwind.dylib`_Unwind_RaiseException
    frame #1: 0x0000000100009e48 luajit_test`lj_err_throw [inlined] err_raise_ext(errcode=<unavailable>) at lj_err.c:300:3 [opt]
    frame #2: 0x0000000100009e1c luajit_test`lj_err_throw(L=0x000000010022e380, errcode=2) at lj_err.c:514 [opt]
    frame #3: 0x0000000100009fbc luajit_test`lj_err_run(L=0x000000010022e380) at lj_err.c:623:3 [opt]
    frame #4: 0x000000010000a159 luajit_test`err_msgv(L=<unavailable>, em=<unavailable>) at lj_err.c:636:3 [opt]
    frame #5: 0x000000010000a1ff luajit_test`lj_err_optype(L=0x000000010022e380, o=<unavailable>, opm=LJ_ERR_OPINDEX) at lj_err.c:672:3 [opt]
    frame #6: 0x000000010000de8f luajit_test`lj_meta_tget(L=0x000000010022e380, o=0x000000010022fc38, k=0x000000010022fc40) at lj_meta.c:147:7 [opt]
    frame #7: 0x000000010001647f luajit_test`lua_gettable(L=0x000000010022e380, idx=<unavailable>) at lj_api.c:826:16 [opt]
    frame #8: 0x0000000100004094 luajit_test`luajit_test::ffi::lua_gettable2::h4d1817184f7e1edc(state=0x000000010022e380, idx=-2) at main.rs:16:9
    frame #9: 0x0000000100003f5c luajit_test`luajit_test::test_me::run_me::h21bdc8b42612ef14(state=0x000000010022e380) at main.rs:27:9
    frame #10: 0x0000000100005e05 luajit_test`lj_BC_FUNCC + 68
    frame #11: 0x0000000100016552 luajit_test`lua_pcall(L=<unavailable>, nargs=<unavailable>, nresults=<unavailable>, errfunc=<unavailable>) at lj_api.c:1169:12 [opt]
    frame #12: 0x0000000100003efc luajit_test`luajit_test::test_me::hea87b0187a61e91a at main.rs:32:13
    frame #13: 0x0000000100003f79 luajit_test`luajit_test::main::h82de456d29ced77f at main.rs:36:14
    frame #14: 0x000000010000410e luajit_test`core::ops::function::FnOnce::call_once::hbde686e067bb3c66((null)=(luajit_test`luajit_test::main::h82de456d29ced77f at main.rs:35), (null)=<unavailable>) at function.rs:227:5
    frame #15: 0x0000000100004021 luajit_test`std::sys_common::backtrace::__rust_begin_short_backtrace::h69ec8b0c6c660907(f=(luajit_test`luajit_test::main::h82de456d29ced77f at main.rs:35)) at backtrace.rs:125:18
    frame #16: 0x0000000100003e24 luajit_test`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hdd4e1056a49aed1e at rt.rs:66:18
    frame #17: 0x0000000100071d44 luajit_test`std::rt::lang_start_internal::h86f505dc7de50d93 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h0e377e204feaadc3 at function.rs:259:13 [opt]
    frame #18: 0x0000000100071d3d luajit_test`std::rt::lang_start_internal::h86f505dc7de50d93 [inlined] std::panicking::try::do_call::h9c5b8eda90bbe0b7 at panicking.rs:379 [opt]
    frame #19: 0x0000000100071d3d luajit_test`std::rt::lang_start_internal::h86f505dc7de50d93 [inlined] std::panicking::try::he150bdff9d5b30f2 at panicking.rs:343 [opt]
    frame #20: 0x0000000100071d3d luajit_test`std::rt::lang_start_internal::h86f505dc7de50d93 [inlined] std::panic::catch_unwind::h04e9c415c0907892 at panic.rs:431 [opt]
    frame #21: 0x0000000100071d3d luajit_test`std::rt::lang_start_internal::h86f505dc7de50d93 at rt.rs:51 [opt]
    frame #22: 0x0000000100003e01 luajit_test`std::rt::lang_start::h27e980249989fdcb(main=(luajit_test`luajit_test::main::h82de456d29ced77f at main.rs:35), argc=1, argv=0x00007ffeefbff788) at rt.rs:65:5
    frame #23: 0x0000000100003fa2 luajit_test`main + 34
    frame #24: 0x00007fff2061b621 libdyld.dylib`start + 1
    frame #25: 0x00007fff2061b621 libdyld.dylib`start + 1

@khvzak
Copy link
Author

khvzak commented Mar 27, 2021

Seems I have time only till the 6th of May to solve it (when Rust 1.52 will be released with changed "C" unwinding compatibility).
LuaJIT support is an important mlua feature and it would be sad to require nightly compiler (to activate the c_unwind feature).
I see two ways to solve it:

  1. Quick and dirty by activating c_unwind feature and setting RUSTC_BOOTSTRAP=1 env from the build script. I don't really want to use this way.
  2. Solve it fundamentally by creating a C shim and wrap each Lua API function that can panic in a pcall inside C code. Plus a C callback between Lua and Rust to call lua_error on Rust behalf in case of an error.

@cratelyn
Copy link

👋 Hi! I've opened #84158, which aims to keep this change in behavior behind the c_unwind feature gate. Thank you for flagging this!

@BatmanAoD
Copy link
Member

To be clear, once c_unwind is no longer feature-gated, these functions will need their ABI strings changed from "C" to "C-unwind", and compiling with panic=abort will break this feature entirely.

@khvzak
Copy link
Author

khvzak commented Apr 13, 2021

I already addressed this issue in commit mlua-rs/mlua@ba375fa
Any C calls that can trigger exceptions (c++ or via longjmp) were moved to a C shim with call convention C->Rust.
It wasn't too scary but required many hours to implement :(

@camelid camelid added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label Apr 29, 2021
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Apr 29, 2021
@camelid camelid added the A-FFI Area: Foreign function interface (FFI) label Apr 29, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 30, 2021
…t-of-84158, r=Mark-Simulacrum

backport: move new c abi abort behavior behind feature gate

This is a backport of PR rust-lang#84158 to the beta branch.

The original T-compiler plan was to revert PR rust-lang#76570 in its entirety, as was attempted in PR rust-lang#84672. But the revert did not go smoothly (details below).

Therefore, we are backporting PR rust-lang#84158 instead, which was our established backup plan if a revert did not go smoothly.

I have manually confirmed that this backport fixes the luajit issue described on issue rust-lang#83541

<details>

<summary>Click for details as to why revert of PR rust-lang#76570 did not go smoothly.</summary>

It turns out that Miri had been subsequently updated to reflect changes to `rustc_target` that landed in PR rust-lang#76570. This meant that the attempt to land PR rust-lang#84672 broke Miri builds.

Normally we allow tools to break when landing PR's (and just expect follow-up PR's to fix the tools), but we don't allow it for tools in the run-up to a release.

(We shouldn't be using that uniform policy for all tools. Miri should be allow to break during the week before a release; but currently we cannot express that, due to issue rust-lang#74709.)

Therefore, its a lot of pain to try to revert PR rust-lang#76570. And we're going with the backup plan.

</details>

Original commit message follows:

----

 *Background*

In rust-lang#76570, new ABI strings including `C-unwind` were introduced. Their behavior is specified in RFC 2945 <sup>[1]</sup>.

However, it was reported in the #ffi-unwind stream of the Rust community Zulip that this had altered the way that `extern "C"` functions behaved even when the `c_unwind` feature gate was not active. <sup>[2]</sup>

 *Overview*

 This makes a small patch to `rustc_mir_build::build::should_abort_on_panic`, so that the same  behavior from before is in place when the `c_unwind` gate is not active.

`rustc_middle::ty::layout::fn_can_unwind` is not touched, as the visible behavior should not differ before/after rust-lang#76570. <sup>[3]</sup>

 ### Footnotes

 1.: https://github.com/rust-lang/rfcs/blob/master/text/2945-c-unwind-abi.md
 2.: https://rust-lang.zulipchat.com/#narrow/stream/210922-project-ffi-unwind/topic/Is.20unwinding.20through.20extern.20C.20UB.3F/near/230112325
 3.: https://github.com/rust-lang/rust/pull/76570/files#diff-b0320c2b8868f325d83c027fc5d71732636e9763551e35895488f30fe057c6e9L2599-R2617

 [1]: https://github.com/rust-lang/rfcs/blob/master/text/2945-c-unwind-abi.md
 [2]: https://rust-lang.zulipchat.com/#narrow/stream/210922-project-ffi-unwind/topic/Is.20unwinding.20through.20extern.20C.20UB.3F/near/230112325
 [3]: https://github.com/rust-lang/rust/pull/76570/files#diff-b0320c2b8868f325d83c027fc5d71732636e9763551e35895488f30fe057c6e9L2599-R2617
@pietroalbini
Copy link
Member

Fixed on beta by #84722. Closing this.

@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-FFI Area: Foreign function interface (FFI) regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Projects
None yet
Development

No branches or pull requests

8 participants