[LLVM backend regression] RuntimeError: unreachable #9443

buu700 · 2019-09-17T18:06:21Z

I just had to rebuild sidh.js with latest-fastcomp to work around an error caused by latest-upstream.

With the following test case:

git clone https://github.com/cyph/sidh.js
cd sidh.js
make
node -e "require('./dist/sidh').keyPair().then(kp => console.log(kp))"

The make succeeds, but then I get the following unexpected run-time error:

RuntimeError: unreachable
RuntimeError: unreachable

/cyph/dist/sidh.js:1
var sidh=function(){var A,I,g,B,C={};function Q(A,I){if(0===A)return I;throw new Error("SIDH error: "+A)}function i(A,I){return new Uint8Array(new Uint8Array(C.HEAPU8.buffer,A,I))}function E(A){try{C._free(A)}catch(A){setTimeout((function(){throw A}),0)}}C.ready=new Promise((function(A,I){(B={}).onAbort=I,B.onRuntimeInitialized=function(){try{B._sidhjs_public_key_bytes(),A(B)}catch(A){I(A)}};var g,B=void 0!==B?B:{},C={};for(g in B)B.hasOwnProperty(g)&&(C[g]=B[g]);var Q=[],i=!1,E=!1,o=!1,a=!1,G=!1;i="object"==typeof window,E="function"==typeof importScripts,a="object"==typeof process&&"object"==typeof process.versions&&"string"==typeof process.versions.node,o=a&&!i&&!E,G=!i&&!o&&!E;var n,r,c,t,D="";function e(A){return B.locateFile?B.locateFile(A,D):D+A}o?(D=__dirname+"/",n=function(A,I){var g;return(g=YA(A))||(c||(c=eval("require")("fs")),t||(t=eval("require")("path")),A=t.normalize(A),g=c.readFileSync(A)),I?g:g.toString()},r=function(A){var I=n(A,!0);return I.buffer||(I=new Uint8Array(
abort(RuntimeError: unreachable). Build with -s ASSERTIONS=1 for more info.

This problem goes away after running emsdk install latest-fastcomp ; emsdk activate latest-fastcomp ; make.

The text was updated successfully, but these errors were encountered:

…n#9443

kripken · 2019-09-17T20:09:38Z

A full stack trace (in a build with names, so e.g. --profiling) might help. Also good would be to bisect to find where this broke.

TrevorSundberg · 2020-02-15T01:10:12Z

This is potentially unrelated, however I also received the RuntimeError: unreachable and it turned out it was because a missing return statement when the function returned a bool. Hope this helps some poor soul in the future. Also don't ignore warnings!

davlhd · 2020-03-03T15:52:41Z

I am experiencing the same problem. I wanted to switch to the new backend but it tells me

011245ce-331:8 Uncaught (in promise) RuntimeError: unreachable.
at wasm-function[331]:0x13538
at wasm-function[11511]:0x2356e3
at n.dynCall_iiiiiii (https://xxx35:156437)
at invoke_iiiiiii (https://xxs:35:102325)
at wasm-function[14939]:0x3ae4f3
at wasm-function[11508]:0x235501
at n.dynCall_iiiiiiiii (https://xxxs:35:156969)
at invoke_iiiiiiiii (https://xxx:35:102612)
at wasm-function[14865]:0x3ab14e

kripken · 2020-03-04T21:40:23Z

@davlhd building with --profiling will make the stack trace have function names.

Once you find the function this happens in, it might be the same issue @TrevorSundberg said - LLVM will emit unreachables in places it thinks control flow can't reach, so if you forget a return, stuff like that.

davlhd · 2020-06-04T09:35:34Z

That did no really help:

00ceb43a:formatted:403 Uncaught (in promise) RuntimeError: unreachable
    at <anonymous>:wasm-function[285]:0xf851
    at dynCall_iiiiiii (<anonymous>:wasm-function[7519]:0x170aa2)
    at i.dynCall_iiiiiii (https://xxxx:35:140187)
    at invoke_iiiiiii (https://xxxx35:116645)
    at <anonymous>:wasm-function[9936]:0x2be4fd
    at dynCall_iiiiiiiii (<anonymous>:wasm-function[7517]:0x170a78)
    at i.dynCall_iiiiiiiii (https://xxxx:35:140719)
    at invoke_iiiiiiiii (https://xxxx:35:116932)
    at <anonymous>:wasm-function[9087]:0x24ae07
    at dynCall_iiiiii (<anonymous>:wasm-function[7520]:0x170ab4)

kripken · 2020-06-04T14:28:03Z

@davlhd is there a chance the --profiling flag was not applied at the link stage? Or that that's an old build? (It would be a very serious unknown bug if that didn't work.)

davlhd · 2020-06-24T08:39:05Z

With "latest" this error disappeared. It now compiles and runs fine

DoDoENT · 2021-03-19T14:04:55Z

It appears that this has reappeared with emscripten 2.0.15. With adding --profiling to the link flags, I got the following stack trace in my project (the stack trace points to the Normalize function within google test:

SweatShopTest.wasm:0x187f9 Uncaught RuntimeError: unreachable
    at testing::internal::FilePath::Normalize() (http://localhost:6931/SweatShopTest.wasm:wasm-function[683]:0x187f9)
    at testing::internal::FilePath::FilePath(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> > const&) (http://localhost:6931/SweatShopTest.wasm:wasm-function[86]:0x25b3)
    at testing::internal::FilePath::GetCurrentDir() (http://localhost:6931/SweatShopTest.wasm:wasm-function[684]:0x18843)
    at testing::internal::UnitTestImpl::AddTestInfo(void (*)(), void (*)(), testing::TestInfo*) (http://localhost:6931/SweatShopTest.wasm:wasm-function[619]:0x1653f)
    at testing::internal::MakeAndRegisterTestInfo(char const*, char const*, char const*, char const*, testing::internal::CodeLocation, void const*, void (*)(), void (*)(), testing::internal::TestFactoryBase*) (http://localhost:6931/SweatShopTest.wasm:wasm-function[103]:0x2cad)
    at __cxx_global_var_init (http://localhost:6931/SweatShopTest.wasm:wasm-function[1158]:0x2b58f)
    at __wasm_call_ctors (http://localhost:6931/SweatShopTest.wasm:wasm-function[1117]:0x29973)
    at Module.___wasm_call_ctors (http://localhost:6931/SweatShopTest.js:3868:82)
    at func (http://localhost:6931/SweatShopTest.js:378:3)
    at callRuntimeCallbacks (http://localhost:6931/SweatShopTest.js:622:4)

Downgrading to emscripten 2.0.14 resolves the issue. Also, if I compile with -g (to emit debug symbols) or with threads enabled, the problem also goes away 🤔.

Any clue what could possibly go wrong?

kripken · 2021-03-19T15:51:53Z

Hard to say. Quickest diagnosis is probably bisection, https://emscripten.org/docs/contributing/developers_guide.html#bisecting

DoDoENT · 2021-03-20T10:49:42Z

I can't do the bisection as my machines are too slow for compiling the LLVM, however, I've managed to create a reproducible case with only open-source components - it's available in this repository.

To reproduce the issue:

clone this repository
initialize all its submodules with git submodule update --init
initialize your shell with emscripten 2.0.15
run emcmake cmake -GNinja /path/to/the/repository
build with ninja
run emrun --browser chrome ./SweatShop.html
application crashes with Uncaught RuntimeError: unreachable

If you do the same with emscripten 2.0.14, the application will work correctly.

Also, if you replace -flto with -flto=thin in the CMakeLists.txt, the application will also work with 2.0.15 (but I need the full LTO for my project).

Unlike in my project, adding -g here does not make the issue go away. Instead, it prints a nice stack trace:

Uncaught RuntimeError: unreachable
    at operator delete(void*) (http://localhost:6931/SweatShopTest.wasm:wasm-function[83]:0x46d1)
    at std::__2::_DeallocateCaller::__do_call(void*) (http://localhost:6931/SweatShopTest.wasm:wasm-function[140]:0x5565)
    at std::__2::_DeallocateCaller::__do_deallocate_handle_size(void*, unsigned long) (http://localhost:6931/SweatShopTest.wasm:wasm-function[138]:0x5516)
    at std::__2::_DeallocateCaller::__do_deallocate_handle_size_align(void*, unsigned long, unsigned long) (http://localhost:6931/SweatShopTest.wasm:wasm-function[135]:0x5497)
    at std::__2::__libcpp_deallocate(void*, unsigned long, unsigned long) (http://localhost:6931/SweatShopTest.wasm:wasm-function[134]:0x542f)
    at std::__2::allocator<char>::deallocate(char*, unsigned long) (http://localhost:6931/SweatShopTest.wasm:wasm-function[1167]:0x15b56)
    at std::__2::allocator_traits<std::__2::allocator<char> >::deallocate(std::__2::allocator<char>&, char*, unsigned long) (http://localhost:6931/SweatShopTest.wasm:wasm-function[1165]:0x15aea)
    at std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::~basic_string() (http://localhost:6931/SweatShopTest.wasm:wasm-function[72]:0x44c0)
    at testing::internal::CodeLocation::~CodeLocation() (http://localhost:6931/SweatShopTest.wasm:wasm-function[71]:0x4498)
    at testing::internal::MakeAndRegisterTestInfo(char const*, char const*, char const*, char const*, testing::internal::CodeLocation, void const*, void (*)(), void (*)(), testing::internal::TestFactoryBase*) (http://localhost:6931/SweatShopTest.wasm:wasm-function[70]:0x4442)

I hope this helps you with tracking down this bug.

kripken · 2021-03-20T15:41:44Z

@DoDoENT The instructions in that link do not require compiling. It downloads precompiled builds using the emsdk.

Sorry if they are not clear enough about that - we should improve them. @sbc100 also has had some recent improvements to the process which we meant to document (the emsdk now has a flag to install a build by the hash, which is slightly easier than editing the text file) but I don't think we have yet.

sbc100 · 2021-03-22T17:19:55Z

You might also want to try running with asan and/or ubsan.

DirtyHairy · 2021-03-22T20:48:26Z

Count me in. If I compile https://github.com/cloudpilot-emu/cloudpilot with 2.0.15 I get the follow trace at runtime:

    at std::__2::__tree<std::__2::__value_type<ChunkType, Chunk>, std::__2::__map_value_compare<ChunkType, std::__2::__value_type<ChunkType, Chunk>, std::__2::less<ChunkType>, true>, std::__2::allocator<std::__2::__value_type<ChunkType, Chunk> > >::destroy(std::__2::__tree_node<std::__2::__value_type<ChunkType, Chunk>, void*>*) (:4200/cloudpilot_web.wasm:wasm-function[2176]:0x611d1)
    at std::__2::__tree<std::__2::__value_type<ChunkType, Chunk>, std::__2::__map_value_compare<ChunkType, std::__2::__value_type<ChunkType, Chunk>, std::__2::less<ChunkType>, true>, std::__2::allocator<std::__2::__value_type<ChunkType, Chunk> > >::destroy(std::__2::__tree_node<std::__2::__value_type<ChunkType, Chunk>, void*>*) (:4200/cloudpilot_web.wasm:wasm-function[2176]:0x611c6)
    at emscripten_bind_Cloudpilot_LoadState_2 (:4200/cloudpilot_web.wasm:wasm-function[2365]:0x75cbc)
    at push.O5fm.Module._emscripten_bind_Cloudpilot_LoadState_2 (cloudpilot_web.js:4654)
    at Cloudpilot.push.O5fm.Cloudpilot.LoadState.Cloudpilot.LoadState (cloudpilot_web.js:5394)
    at Cloudpilot.loadState (Cloudpilot.ts:265)

The same code runs fine with 2.0.14 and also natively with ASAN and UBSAN enabled. It is worth mentioning that issue only appears if I build with -flto, without LTO the code runs fine. Seems like LTO erroneously removes code generated for std::map and replaces it with unreachable. Unfortunately the readme is not up to date, but I can provide instructions for reproducing the issue.

Coincidentally (?) I also get link time errors on 2.0.15 with LLD_REPORT_UNDEFINED:

wasm-ld: error: symbol exported via --export not found: __start_em_asm
wasm-ld: error: symbol exported via --export not found: __stop_em_asm

I don't have time for bisecting now, but I can do so later this week.

sbc100 · 2021-03-22T21:29:52Z

Thanks for the report @DirtyHairy. The LTO issue does look concerning, and we will probably need to bisect it down. I imagine its an llvm-side change.

The symbol exported via --export not found: __start_em_asm issue with LLD_REPORT_UNDEFINED I am aware of and thinking about how to work around that.

sbc100 · 2021-03-22T21:33:18Z

opened #13736 for the LLD_REPORT_UNDEFINED issue.

DirtyHairy · 2021-03-23T07:54:33Z

Thanks alot @sbc100 ! I'll try to find time to bisect the issue over the next few days

DirtyHairy · 2021-03-27T12:13:08Z

I have bisected the issue; the problematic commit to emscripten-releases is 7882b2e11c66da69b832a6ad7baa57c8ccbf9656

Indeed this is a LLVM change; unfortunately, the commit pulls in sizable number of changesets. I hope this helps nevertheless.

kripken · 2021-03-29T17:00:40Z

Thanks! Yes, looks like there are 94 LLVM commits there:

https://chromium.googlesource.com/emscripten-releases/+/7882b2e11c66da69b832a6ad7baa57c8ccbf9656

That will require someone to bisect locally on LLVM I think, unless one of the commits there looks relevant to the LLVM devs here.

DirtyHairy · 2021-03-29T17:58:00Z

Well, this might take a few days, but I am not using my Macbook much during the day atm, so I think I can take the ring to mount doom 😏

sbc100 · 2021-03-29T18:09:28Z

As well as bisecting to find the culprit commit, it might also be useful to reduce the repro case to something smaller that might be give us more clues.

DirtyHairy · 2021-03-29T18:19:50Z

As well as bisecting to find the culprit commit, it might also be useful to reduce the repro case to something smaller that might be give us more clues.

The affected code is pretty much isolated, so I think this might not be too hard. I'll take a stab at it after bisecting.

DirtyHairy · 2021-03-29T22:02:35Z

Well, that went down a lot faster than I thought. The faulty commit is:

llvm/llvm-project@8666463

Seems like we indeed need a reduced testcase. I'll see what I can do.

DirtyHairy · 2021-04-02T22:14:45Z

I've cooked up a reduced testcase. You can find the code here: https://github.com/DirtyHairy/emcc-lto-testcase

Simply build with make (with 2.0.15) and run node testcase.js in order to trigger the bug:

pestix@bladehenge-2:~/git/emcc-lto-testcase$ node testcase.js
exception thrown: RuntimeError: unreachable,RuntimeError: unreachable
    at std::__2::__tree<std::__2::__value_type<ChunkType, ChunkProbe>, std::__2::__map_value_compare<ChunkType, std::__2::__value_type<ChunkType, ChunkProbe>, std::__2::less<ChunkType>, true>, std::__2::allocator<std::__2::__value_type<ChunkType, ChunkProbe> > >::destroy(std::__2::__tree_node<std::__2::__value_type<ChunkType, ChunkProbe>, void*>*) (<anonymous>:wasm-function[10]:0x8df)
    at __original_main (<anonymous>:wasm-function[8]:0x864)
    at main (<anonymous>:wasm-function[20]:0x278b)
    at Module._main (/Users/pestix/git/emcc-lto-testcase/testcase.js:1555:60)
    at callMain (/Users/pestix/git/emcc-lto-testcase/testcase.js:1615:15)
    at doRun (/Users/pestix/git/emcc-lto-testcase/testcase.js:1675:23)
    at run (/Users/pestix/git/emcc-lto-testcase/testcase.js:1690:5)
    at runCaller (/Users/pestix/git/emcc-lto-testcase/testcase.js:1602:19)
    at removeRunDependency (/Users/pestix/git/emcc-lto-testcase/testcase.js:1225:7)
    at receiveInstance (/Users/pestix/git/emcc-lto-testcase/testcase.js:1365:5)

The code runs fine with 2.0.14 or if -flto is omitted during link.

From the trace I suspect that the deleted code is in the destructor of the std::map<ChunkType, ChunkProbe> allocated in SavestateProbe.cxx (an instance of which is allocated on the stack in Savestate::AllocateBuffer.

DirtyHairy · 2021-04-08T19:15:55Z

Is there anything else that I can do to help addressing this issue?

kripken · 2021-04-09T14:54:34Z

@DirtyHairy Please file an LLVM bug (and link to it here). As the commit you reduced to says, it sounds like they expected fallout from it. Attaching your testcase will hopefully be enough for them to investigate and/or decide whether to revert it.

kripken · 2021-04-09T14:58:00Z

(Actually maybe it's faster to cc the author of that commit directly?)

@preames , the last few comments have a reduced testcase and some investigation of a regression that was bisected down to your commit llvm/llvm-project@8666463

preames · 2021-04-09T23:33:18Z

Happy to try and help here, but I really don't have much context on emscripten or node. So working from a test case which involves those pieces would be a bit challenging. (For instance, it's not clear to me whether your reduced test case was using ToT clang for the make step, or some other compiler?)

If you want to help narrow this down, what would be really helpful is knowing what the IR looks like at the point the nofree transform actually happens. A simple way to get that would be to do a local build of clang w/the transform replaced with a bit of code which simply printed the current module and then crashed. (If it triggers multiple times, you might need to log and continue.)

Once we had that, we could manually inspect the IR to see if the transform is legal. If it isn't, we revert my patch. If it is, we then have something to trace back through the print-after-all output to figure out where the wrong inference fact came from.

As an aside, I just ran across an issue today (see the llvm-dev thread "Ambiguity in the nofree function attribute") around the semantics of the nofree function attribute. If you were using the allocsize attribute, then maybe that might explain a miscompile. That's really a long shot guess though, nothing more.

kripken · 2021-04-09T23:54:27Z

Thanks for the quick response @preames !

Maybe another possibility is to convert the testcase to one that runs in ToT clang? It might be as simple as replacing emcc with ToT clang, and removing the .js suffix from the output.

I tried to do that quickly now, and I'm hitting an unrelated LLVM configuration issue I can't immediately figure out,

/usr/bin/ld: ../lib/LLVMgold.so: error loading plugin: ../lib/LLVMgold.so: cannot open shared object file: No such file or directory

But if you have a working local build of LLVM ToT then it might just work for you.

DirtyHairy · 2021-04-10T09:12:39Z

Thanks for the response, @preames / @kripken !

@preames I am not sure whether I understand top-of-tree correctly. Fwiw, I can reproduce the issue with the latest revision of LLVM main, including clang.

I have just checked and cannot reproduce the bug if I build native binaries. However, as even with emscripten the bug only appears if I enable LTO at link I think that the bug may be tied to a transformation that happens during LTO when linking the emscripten libc.

It seems that emscripten links to a libc that has been built with LTO enabled if I enable LTO during link? If this is the case, then this might be the decisive difference between emscripten and the native build (I don't think this happens with libc from the XCode SDK). Is there a way to disable this (use -flto objects, but not -flto libc) for testing purposes?

@preames If you can give me a few pointers of where to instrument which file I can try to produce the IR dumps you described.

DirtyHairy · 2021-04-15T18:44:34Z

Yeah, sorry, my bad, I just noticed that the sample above runs fine with -O2 and -O3 (and crashes only with -O1). It seems that the allocation and deallocation is optimized out. The following crashes at all levels > 0:

#include <iostream>

using namespace std;

struct Foo {
    virtual ~Foo() { cout << "destructor called" << endl << flush; }
};

int main() {
    auto foo = new Foo();

    cout << &foo << endl << flush;

    delete foo;
}

kripken · 2021-04-15T18:47:40Z

I think the simplest testcase here that I can reproduce locally is this one:

// emcc -flto -O1
#include <map>
int main() {
  std::map<int, int> foo;
  foo[0] = 2;
}

It does seem like the cause is LTO with libc++, as that is shared in all the testcases.

I'd suggest making a native example of this for @preames by making a git repo with the cpp file plus a copy of libc++ from the emscripten tree (it's under system/lib/libcxx). Then all the libc++ inputs will be "in userspace", like for emcc, and no system c++ libraries will be linked. I'd expect the bug to reproduce that way.

DirtyHairy · 2021-04-15T19:10:26Z

Sorry, now I don't fully understand: how would I do a native build from this? I suppose I have to build libc++ first?

kripken · 2021-04-15T19:36:13Z

Yes, but it's very simple: you will likely need very few of the libc++ .cpp files (since it's almost all in headers). You may even be able to get away with not running any configure/cmake step, and just adding them.

In more detail, what you need to do is build with -nostdinc++, so that the system C++ standard library is not used, and add an -I for the place with the copy of libc++ so that it is used. And find the right .cpp files from that copy of libc++ and just include them as more inputs, alongside your .cpp file. That way all the C++ standard library support arrives "from userspace", and that means the compiler can LTO it. (Also, doing it this way will ensure you have the same C++ standard library as emcc does.)

Another option could be to copy code from the libc++ sources into your testcase (edit: and keep copying more until the bug reproduces).

DirtyHairy · 2021-04-15T19:42:58Z

Thanks 😏 I have put up a version of the testcase with a Makefile that does a native build in my previous testcase repo https://github.com/DirtyHairy/emcc-lto-testcase (I've moved the old testcase to a branch). Unfortunately, I cannot reproduce the crash with my LLVM build (the ToT build from last week that triggers the crash when used with emscripten).

Can you spot any error in my build process? To my understanding it should not include the system libc++ headers or link agains the system libc++

preames · 2021-04-16T18:36:54Z

Given how much this has been reduced, I might be able to make progress with something as simple as a -print-after-all log. If you want to capture that from your reproducer - I can even work with one not from clean upstream - I can see if anything obvious falls out.

DirtyHairy · 2021-04-16T20:02:53Z

Hi @preames !

I have attached a -print-after-all log of the failing std::map testcase (ir.bad.log) produced with emscripten 2.0.16 and a reference log(ir.good.log) with emscripten 2.0.14.

I can do builds from specific LLVM revisions if that helps you.

Thanks again for your support.

ir.bad.log
ir.good.log

If we have a nobuiltin function, we can't assume we know anything about the implementation. I noticed this when tracing through a log from an in the wild miscompile (emscripten-core/emscripten#9443) triggered after 8666463. We were incorrectly assuming that a custom allocator could not free. (It's not clear yet this is the only problem in said issue.) I also noticed something similiar mentioned in the commit message of ab243e when scrolling back through history. Through, from what I can tell, that commit fixed symptom not root cause. The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner. I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.

preames · 2021-04-17T02:21:39Z

Digging through the log, I noticed one case where we'd incorrectly annotated a nobuiltin function. This isn't directly related to the triggering change, but it's easy to believe that change exposed the existing issue. There might also be another latent bug here.

Can I ask for a rerun with commit 11707435ccb44a9377bfed407453e0646a159636 applied? If it still reproduces, another -print-after-all with the additional flag -print-module-scope added (primarily for the bad run) would be helpful.

DirtyHairy · 2021-04-17T12:20:54Z

Partial success 😏 With LLVM built from 11707435ccb44a9377bfed407453e0646a159636 the testcase now builds and runs fine with -O1. However, it still crashes at -O2 and -O3. I have attached two logs at -O2, one with -print-module-scope, and one without it. Beware though, the -print-module-scope one decompresses to ~800MB.

Sorry for the double compression, but the log is too large with gzip, and github will not let me upload bzip2 directly.

ir.bad.log.bz2.gz
ir.bad.print-module-scope.log.bz2.gz

preames · 2021-04-19T18:27:15Z

Haven't had a chance to dig through the second set of logs yet, but I think I have a good guess as to the problem. It turns out to be a bug in my original patch w.r.t. interaction between null and the nofree specification. I've got a candidate fix up for review at https://reviews.llvm.org/D100779, and would appreciate knowing if that resolved the remaining problems here. (It may very well be we have a third issue to track down.)

DirtyHairy · 2021-04-20T07:05:10Z

I have checked, but 2218f5998b5b with the above patch applied still crashes (on the std::map). Fwiw there is some variance in behaviour between emscripten (with LLVM built from source) 2.0.16 and 2.0.14 --- with 2.0.16, the std::map testcase dies for all optimization levels while it runs fine with -O1. With the virtual destructor testcase, the crash reproduces with -O1 on 2.0.16 (and LLVM from source) --- 2.0.14 works fine, as does 2.0.16 with -O2 and -O3 😛

@preames Ping me if you need a more recent log.

preames · 2021-04-21T18:13:18Z

I think I'm going to need a more current log. I've dug through this one, and don't see any direct evidence of the transform triggering or any further obvious miscompiles.

Can I ask you to take two additional steps before sending a log? First, can you comment out the problematic transform entirely and just make sure everything passes? I'm getting suspicious that there might be multiple interacting issues here. Second, could you add code to print the function, callsite, and argument when the transform triggers? If needed, I can post a patch that does that. I think I need to see what this is actually triggering on to make progress here.

DirtyHairy · 2021-04-21T21:18:42Z

Hi @preames !

Sure; however, I think I need a few additional pointers to produce what you want. Do you have a specific transform in mind for me to disable, or do you want to go through them until we have identified the problematic one(s)? Where in LLVM should I look for the place to disable them?

This change effectively reverts 8666463, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert. Why revert this now? Two main reasons: 1) There are continuing discussion around what the semantics of nofree. I am getting increasing uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes. 2) There was a reported miscompile triggered by this change (emscripten-core/emscripten#9443). At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs. Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree. In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.

preames · 2021-04-22T17:55:44Z

I have gone ahead and reverted the triggering change in commit 15e19a25 since we don't appear to be making rapid progress any more. At the current moment, I don't have any evidence the (fixed) change was actually buggy any more, but I also am out of time to chase down the remaining miscompile. I may return to this in the future, but wanted to not leave things broken in the meantime.

DirtyHairy · 2021-04-22T19:01:03Z

Thanks alot @preames . To be 100% sure I just did a test build; the testcases run fine again. Sorry we couldn't drill down into the underlying issue. If want to revisit this at some later point feel free to ping me.

If we have a nobuiltin function, we can't assume we know anything about the implementation. I noticed this when tracing through a log from an in the wild miscompile (emscripten-core/emscripten#9443) triggered after 8666463. We were incorrectly assuming that a custom allocator could not free. (It's not clear yet this is the only problem in said issue.) I also noticed something similiar mentioned in the commit message of ab243e when scrolling back through history. Through, from what I can tell, that commit fixed symptom not root cause. The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner. I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.

This change effectively reverts 8666463, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert. Why revert this now? Two main reasons: 1) There are continuing discussion around what the semantics of nofree. I am getting increasing uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes. 2) There was a reported miscompile triggered by this change (emscripten-core/emscripten#9443). At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs. Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree. In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.

paulocoutinhox · 2021-11-06T00:43:24Z

Hi,

I have the same problem here:
paulocoutinhox/pdfium-lib#54

There is any solution?

paulocoutinhox · 2021-11-06T00:50:11Z

My stacktrace:

index.js:77 Uncaught RuntimeError: unreachable
    at pdfium::base::AllocPages(void*, unsigned long, unsigned long, pdfium::base::PageAccessibilityConfiguration, pdfium::base::PageTag, bool) (index.wasm:0x71085)
    at pdfium::base::internal::PartitionBucket::SlowPathAlloc(pdfium::base::internal::PartitionRootBase*, int, unsigned long, bool*) (index.wasm:0x72900)
    at fxcrt::StringDataTemplate<char>::Create(unsigned long) (index.wasm:0x5628)
    at fxcrt::StringDataTemplate<char>::Create(char const*, unsigned long) (index.wasm:0x5695)
    at fxcrt::ByteString::ByteString(char const*) (index.wasm:0x886e)
    at CLinuxPlatform::CreateDefaultSystemFontInfo() (index.wasm:0xf501)
    at CFX_GEModule::Create(char const**) (index.wasm:0x13666)
    at FPDF_InitLibraryWithConfig (index.wasm:0x70a9e)
    at main (index.wasm:0x367b)
    at index.js:1591
$pdfium::base::AllocPages(void*, unsigned long, unsigned long, pdfium::base::PageAccessibilityConfiguration, pdfium::base::PageTag, bool) @ index.wasm:0x71085
$pdfium::base::internal::PartitionBucket::SlowPathAlloc(pdfium::base::internal::PartitionRootBase*, int, unsigned long, bool*) @ index.wasm:0x72900
$fxcrt::StringDataTemplate<char>::Create(unsigned long) @ index.wasm:0x5628
$fxcrt::StringDataTemplate<char>::Create(char const*, unsigned long) @ index.wasm:0x5695
$fxcrt::ByteString::ByteString(char const*) @ index.wasm:0x886e
$CLinuxPlatform::CreateDefaultSystemFontInfo() @ index.wasm:0xf501
$CFX_GEModule::Create(char const**) @ index.wasm:0x13666
$FPDF_InitLibraryWithConfig @ index.wasm:0x70a9e
$main @ index.wasm:0x367b
(anonymous) @ index.js:1591
callMain @ index.js:5933
doRun @ index.js:5990
(anonymous) @ index.js:6001
setTimeout (async)
run @ index.js:5997
runCaller @ index.js:5911
removeRunDependency @ index.js:1520
receiveInstance @ index.js:1680
receiveInstantiationResult @ index.js:1697
Promise.then (async)
(anonymous) @ index.js:1726
Promise.then (async)
instantiateAsync @ index.js:1723
createWasm @ index.js:1754
(anonymous) @ index.js:5465

paulocoutinhox · 2021-11-28T17:50:01Z

I tried version 3.0.0 and the problem still :(

buu700 · 2022-07-11T20:44:47Z

I ended up solving the problem in my project. It turned out that two different C files were defining slightly different versions of a particular function (one was void while the other returned an int status code).

The original fastcomp implementation apparently worked around the problem, while the LLVM backend for whatever reason handles it less gracefully. This may or may not have been the case at the time I opened the issue, but I do see now that there's a wasm-ld: warning: function signature mismatch message that gets printed when the problem is present.

Would it make sense for cases like this to be treated as errors rather than warnings? (Although I do have ERROR_ON_UNDEFINED_SYMBOLS enabled, which may or may not affect the behavior here.)

@kripken feel free to close this ticket unless it's still helpful to keep open.

kripken · 2022-07-12T15:58:42Z

@buu700 Thanks for the update. Interesting, sounds like this might be undefined behavior then, which the LLVM backend optimizes in more dangerous ways than fastcomp did.

Yes, let's close this as there isn't anything specific to fix at this time. If we can get specific reproducers for the other problems mentioned throughout this issue let's file those separately.

About wasm-ld: warning: function signature mismatch, I hope that would be an error if you link with -Werror to turn warnings to errors. @sbc100 does that flag apply to wasm-ld warnings too?

sbc100 · 2022-07-12T16:03:40Z

Regarding wasm-ld warnings, yes you can do -Wl,--fatal-warnings.

If we have a nobuiltin function, we can't assume we know anything about the implementation. I noticed this when tracing through a log from an in the wild miscompile (emscripten-core/emscripten#9443) triggered after 8666463. We were incorrectly assuming that a custom allocator could not free. (It's not clear yet this is the only problem in said issue.) I also noticed something similiar mentioned in the commit message of ab243e when scrolling back through history. Through, from what I can tell, that commit fixed symptom not root cause. The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner. I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.

This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert. Why revert this now? Two main reasons: 1) There are continuing discussion around what the semantics of nofree. I am getting increasing uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes. 2) There was a reported miscompile triggered by this change (emscripten-core/emscripten#9443). At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs. Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree. In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.

buu700 added a commit to cyph/sidh.js that referenced this issue Sep 17, 2019

rebuild with latest-fastcomp to work around emscripten-core/emscripte…

c7f9af4

…n#9443

CetinSert mentioned this issue Nov 6, 2021

Get error "Uncaught RuntimeError: unreachable" when initialize paulocoutinhox/pdfium-lib#54

Closed

DoDoENT mentioned this issue Nov 6, 2021

Unable to use ThinLTO #15427

Open

kripken closed this as completed Jul 12, 2022

[LLVM backend regression] RuntimeError: unreachable #9443

[LLVM backend regression] RuntimeError: unreachable #9443

Comments

buu700 commented Sep 17, 2019

kripken commented Sep 17, 2019

TrevorSundberg commented Feb 15, 2020 • edited Loading

davlhd commented Mar 3, 2020

kripken commented Mar 4, 2020

davlhd commented Jun 4, 2020

kripken commented Jun 4, 2020

davlhd commented Jun 24, 2020

DoDoENT commented Mar 19, 2021

kripken commented Mar 19, 2021

DoDoENT commented Mar 20, 2021

kripken commented Mar 20, 2021

sbc100 commented Mar 22, 2021

DirtyHairy commented Mar 22, 2021 • edited Loading

sbc100 commented Mar 22, 2021

sbc100 commented Mar 22, 2021

DirtyHairy commented Mar 23, 2021

DirtyHairy commented Mar 27, 2021

kripken commented Mar 29, 2021

DirtyHairy commented Mar 29, 2021 • edited Loading

sbc100 commented Mar 29, 2021

DirtyHairy commented Mar 29, 2021

DirtyHairy commented Mar 29, 2021

DirtyHairy commented Apr 2, 2021 • edited Loading

DirtyHairy commented Apr 8, 2021

kripken commented Apr 9, 2021 • edited Loading

kripken commented Apr 9, 2021

preames commented Apr 9, 2021

kripken commented Apr 9, 2021

DirtyHairy commented Apr 10, 2021

DirtyHairy commented Apr 15, 2021 • edited Loading

kripken commented Apr 15, 2021

DirtyHairy commented Apr 15, 2021

kripken commented Apr 15, 2021 • edited Loading

DirtyHairy commented Apr 15, 2021 • edited Loading

preames commented Apr 16, 2021

DirtyHairy commented Apr 16, 2021

preames commented Apr 17, 2021

DirtyHairy commented Apr 17, 2021

preames commented Apr 19, 2021

DirtyHairy commented Apr 20, 2021 • edited Loading

preames commented Apr 21, 2021

DirtyHairy commented Apr 21, 2021

preames commented Apr 22, 2021

DirtyHairy commented Apr 22, 2021

paulocoutinhox commented Nov 6, 2021

paulocoutinhox commented Nov 6, 2021

paulocoutinhox commented Nov 28, 2021

buu700 commented Jul 11, 2022

kripken commented Jul 12, 2022

sbc100 commented Jul 12, 2022

TrevorSundberg commented Feb 15, 2020 •

edited

Loading

DirtyHairy commented Mar 22, 2021 •

edited

Loading

DirtyHairy commented Mar 29, 2021 •

edited

Loading

DirtyHairy commented Apr 2, 2021 •

edited

Loading

kripken commented Apr 9, 2021 •

edited

Loading

DirtyHairy commented Apr 15, 2021 •

edited

Loading

kripken commented Apr 15, 2021 •

edited

Loading

DirtyHairy commented Apr 15, 2021 •

edited

Loading

DirtyHairy commented Apr 20, 2021 •

edited

Loading