Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emscripten calls null GOT entry when side module loaded with allowUndefined: true #22052

Open
hoodmane opened this issue Jun 4, 2024 · 16 comments · May be fixed by #22053
Open

Emscripten calls null GOT entry when side module loaded with allowUndefined: true #22052

hoodmane opened this issue Jun 4, 2024 · 16 comments · May be fixed by #22053

Comments

@hoodmane
Copy link
Collaborator

hoodmane commented Jun 4, 2024

The following code example looks up GOT.__cxa_throw which is 0 and then calls it which throws TypeError: getWasmTableEntry(...) is not a function. Calling reportUndefinedSymbols or removing allowUndefined: true resolves the crash.

module.cpp

#include <stdexcept>

extern "C"
int cpp_func() { 
    try { 
        throw std::runtime_error(""); // crashes trying to throw this
    } catch (const std::exception &e) {
        return 0;
    }
    return 1;
} 

main.c

#include <dlfcn.h>
#include "stdio.h"
#include "emscripten.h"

typedef int (*InitFunc)(void);

int main(void) {
    EM_ASM({
        loadDynamicLibrary("module.so", { allowUndefined: true });
    });

    void* handle = dlopen("module.so", RTLD_NOW);
    printf("handle: %p\n", handle);
    if (!handle) {
        return 1;
    }
    InitFunc init_func = (InitFunc)dlsym(handle, "cpp_func");
    printf("init_func: %p\n", init_func);
    if (!init_func) {
        return 1;
    }
    int res = init_func(); // Crashes in init_func
    printf("res: %d\n", res);
    return 0;
}

build.sh

em++ -c module.cpp -fPIC -fexceptions -g2 -O2 
em++ module.o -o module.so \
    -sWASM_BIGINT -sSIDE_MODULE=1 -fexceptions -g2 -O2

emcc -c main.c -fPIC -fexceptions -g2 -O2 
emcc main.o -o main.js \
    -sWASM_BIGINT -sMAIN_MODULE=1 -fexceptions -g2 -O2

Then run with node main.js.

Version of emscripten/emsdk:
Checked on 3.1.52, 3.1.58 and tip of tree:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.62-git (0a0fa668c63cdaf14cfccb2ce62bc5afa46b4170)
clang version 19.0.0git (https:/github.com/llvm/llvm-project 0e62d5cf55479981da5e05e406bbca4afb3cdc4f)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/rchatham/Documents/programming/emsdk/upstream/bin
Build config: +assertions
hoodmane added a commit to hoodmane/pyodide that referenced this issue Jun 4, 2024
After many hours of debugging, I minimized the problem down to this:
emscripten-core/emscripten#22052

Thanks to @henryiii for reporting. See thread here for my comments while I was
debugging:
scikit-hep/boost-histogram#938
@hoodmane
Copy link
Collaborator Author

hoodmane commented Jun 4, 2024

I think this patch to GOTHandler.get fixes the problem:

--- a/src/library_dylink.js
+++ b/src/library_dylink.js
@@ -176,6 +176,16 @@ var LibraryDylink = {
         // correctly.
         rtn.required = true;
       }
+      var value = resolveGlobalSymbol(symName, true).sym;
+      if (!value) {
+        return rtn;
+      }
+      if (typeof value == 'function') {
+        /** @suppress {checkTypes} */
+        rtn.value = addFunction(value, value.sig);
+      } else if (typeof value == 'number') {
+        rtn.value = value;
+      }
       return rtn;
     }
   },

hoodmane added a commit to hoodmane/emscripten that referenced this issue Jun 4, 2024
hoodmane added a commit to hoodmane/emscripten that referenced this issue Jun 4, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
…fined: true`

Resolves emscripten-core#22052
@sbc100
Copy link
Collaborator

sbc100 commented Jun 4, 2024

We should probably add a test for this? I wonder if there is some way to build a test that doesn't depend on using EM_JS?

Do you know exactly which types of symbols are effected by this? Is __cxa_throw effected because its a JS symbol? Or would any symbol be similarly effected?

@hoodmane
Copy link
Collaborator Author

hoodmane commented Jun 5, 2024

We should probably add a test for this?

+1 I'll add a test to #22053.

I wonder if there is some way to build a test that doesn't depend on using EM_JS?

loadDylibs sets allowUndefined: true and doesn't ever seem to call reportUndefinedSymbols(). I bet we can use that. I assume loadDylibs() is the code path we go through if we pass libsomething.so on the command line?

Do you know exactly which types of symbols are effected by this? would any symbol be similarly effected?

I am not sure. Many symbols don't seem to have this problem. I will investigate further and get back to you.

@hoodmane
Copy link
Collaborator Author

hoodmane commented Jun 6, 2024

Nevermind, loadDylibs() does call reportUndefinedSymbols() so that doesn't work.

hoodmane added a commit to pyodide/pyodide that referenced this issue Jun 7, 2024
After many hours of debugging, I minimized the problem down to this: emscripten-core/emscripten#22052

Thanks to @henryiii for reporting. See thread here for my comments while I was debugging:
scikit-hep/boost-histogram#938

Resolves #2964.
hoodmane added a commit to hoodmane/pyodide that referenced this issue Jun 7, 2024
After many hours of debugging, I minimized the problem down to this: emscripten-core/emscripten#22052

Thanks to @henryiii for reporting. See thread here for my comments while I was debugging:
scikit-hep/boost-histogram#938

Resolves pyodide#2964.
@hoodmane
Copy link
Collaborator Author

hoodmane commented Jun 9, 2024

I think the first bad commit is emscripten-releases e4ea05676d38c899d486aeb1553af4a0b5432317

Roll llvm-project from b31414bf4f98 to a7f4576ff4e2 (58 revisions)

@hoodmane
Copy link
Collaborator Author

hoodmane commented Jun 9, 2024

No wait that one is e4ea05676d38c899d486aeb1553af4a0b5432317 good first bad is the next commit 9063e25ecf39470a96689f3896ca5d4462928eb6:

Roll emscripten from f07067df82dd to 37055c7b57d0 (1 revision)

which is:

Update libcxx and libcxxabi to LLVM 18.1.2 (#21638)

This updates libcxx and libcxxabi to LLVM 18.1.2. Due to
llvm/llvm-project#65534 we need to update both
at the same time.

@hoodmane
Copy link
Collaborator Author

Reverting #21638 on top of the main branch has no conflicts and makes the problem go away. Any ideas why that might be @aheejin?

@sbc100
Copy link
Collaborator

sbc100 commented Jun 21, 2024

It seems like the issue was likely always there but somehow the C++ std library changes the set of undefined symbols present in the shared libraries. This does sounds like something really should be fixed.

On the other hand, only uses who load code directly through the loadDynamicLibrary will be effected. Since this is essentially a helper function for implementing dlopen we don't have as much testing for the use of this function directly like this. @hoodmane is there any way we could get you to transition to more traditional dlopen? I'm guessing that answer is no.. otherwise you would already have done it :)

@hoodmane
Copy link
Collaborator Author

hoodmane commented Jun 21, 2024

We'd like to transition to dlopen, I think the main barrier is that it is a sync API and there are quite a few reasons to prefer asynchronous loading.

cc @ryanking13 who might have a better idea than I do about this.

@sbc100
Copy link
Collaborator

sbc100 commented Jun 21, 2024

Do you know about that async version of dlopen? We have dlopen_emscripten (which take callbacks) and emscripten_dlopen_promise (which returns a promise)

@hoodmane
Copy link
Collaborator Author

No I was not aware of those. We're trying to call into this from JavaScript not from C so there would be some extra code in order to use these C apis, but if they work better that could be a reason to switch. I think there are some other subtle differences between the shared library dependency search paths in native dlopen vs emscripten's dlopen that we're currently working around. Perhaps if we can figure out a few more bugfixes we can switch.

It would be nice to see typedef __externref_t em_promise_t; some day so we wouldn't need this HandleAllocator stuff.

@ryanking13
Copy link
Contributor

Do you know about that async version of dlopen? We have dlopen_emscripten (which take callbacks) and emscripten_dlopen_promise (which returns a promise)

This would be worth taking a look at, thanks. Another reason we use the JS function directly is to locate shared libraries. I opened a separate issue about it #22126.

@sbc100
Copy link
Collaborator

sbc100 commented Jun 21, 2024

No I was not aware of those. We're trying to call into this from JavaScript not from C so there would be some extra code in order to use these C apis, but if they work better that could be a reason to switch. I think there are some other subtle differences between the shared library dependency search paths in native dlopen vs emscripten's dlopen that we're currently working around. Perhaps if we can figure out a few more bugfixes we can switch.

It would be nice to see typedef __externref_t em_promise_t; some day so we wouldn't need this HandleAllocator stuff.

That is very unlikely to happen sadly, since __externref_t cannot be stored to memory, and cannot live within a struct, so it would make the API way less useful for normal C/C++ programmers. Sadly C/C++ is just not good at holding onto non-store-able things.

@hoodmane
Copy link
Collaborator Author

I made this library to deal with adapting between __externref_t and addressable memory:
https://github.com/hoodmane/hiwire

I think it's beneficial to split up the steps of

  1. call function that returns an __externref_t and
  2. making an addressable handle to that object

A lot of the time people just use return values immediately and don't need to store them, so in all of those cases this saves work. The JavaScript functions don't need to do extra steps to lookup the handle value; if the handle lookup needs to happen it occurs in C. And then we don't need special case destroy functions since they are all shared. When I switched to this approach it eliminated a lot of boilerplate in all the EM_JS functions in Pyodide.

This is a bit of a sidetrack though =)

@sbc100
Copy link
Collaborator

sbc100 commented Jun 21, 2024

I made this library to deal with adapting between __externref_t and addressable memory: https://github.com/hoodmane/hiwire

I think it's beneficial to split up the steps of

  1. call function that returns an __externref_t and
  2. making an addressable handle to that object

A lot of the time people just use return values immediately and don't need to store them, so in all of those cases this saves work. The JavaScript functions don't need to do extra steps to lookup the handle value; if the handle lookup needs to happen it occurs in C. And then we don't need special case destroy functions since they are all shared. When I switched to this approach it eliminated a lot of boilerplate in all the EM_JS functions in Pyodide.

This is a bit of a sidetrack though =)

Sure that sounds like a good way of layering it. I would still probably want a simple variant all such function that just return a handle though, and perhaps a lower/level alternative one that returns the raw externref (which, as I say, I imagine very few folks would want in practice).

IIRC, we have looked, for example, and moving the handle allocator to native code, but IIRC is was strictly worse since the JS side is extremely good and doing fast hashmaps and doing that in C has some overhead. I guess there are somethings that JS really is better at.

@hoodmane
Copy link
Collaborator Author

a lower/level alternative one that returns the raw externref
I imagine very few folks would want in practice

Of course different applications are different, but in Pyodide there are only 8 places we explicitly create a handle now, whereas there were 87 places before we switched to externref, so we only need to make a handle under 10% of the time. This made a pretty large difference in performance for us.

FWIW, I think when I changed the implementation from JavaScript to C I didn't see any change in performance. We don't implement a hash map in C, we just have the table and an array of metadata. You could also do it with a free list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants