Stop building MAIN_MODULE with RELOCATABLE (-fPIC) #12682
I spent a while looking into this yesterday and hit a bit of a roadblock. The goal here is to build the main module without … The problem is that as of today, when you build without … Basically, we have conflated two concepts: PIC code and dynamically linked code (GOT usage for external symbols). I am now leaning towards using GOT relocations conservatively in more cases (not just …). Another minor positive is that, since we can now name globals in the name section, it should make disassembling unoptimized executables nicer.
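To make the two concepts concrete, here is a toy Python model. It is a sketch under stated assumptions: `Module`, `address_of`, and the field names are hypothetical and are not LLVM or Emscripten APIs. Position independence only means local data is addressed relative to a base chosen at load time; dynamic linking means the addresses of external symbols are read from GOT slots that the loader fills in. The two are independent, so a non-relocatable main module can still use GOT slots for its external symbols.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy model of address resolution (hypothetical, not an LLVM/Emscripten API)."""
    relocatable: bool               # PIC: local data is addressed relative to memory_base
    local_offsets: dict[str, int]   # symbol -> offset (PIC) or link-time address (static)
    memory_base: int = 0            # chosen by the loader when the module is relocatable
    got: dict[str, int] = field(default_factory=dict)  # external symbol -> address, filled at load time

    def address_of(self, symbol: str) -> int:
        if symbol in self.local_offsets:
            offset = self.local_offsets[symbol]
            # PIC code adds the runtime base; static code already has the final address.
            return self.memory_base + offset if self.relocatable else offset
        # External symbols go through the GOT whether or not the module is PIC.
        return self.got[symbol]


# A static main module that still imports one symbol's address via the GOT.
main = Module(relocatable=False, local_offsets={"counter": 1024}, got={"side_var": 70000})
# A relocatable side module placed at memory_base 65536 by the loader.
side = Module(relocatable=True, local_offsets={"x": 16}, memory_base=65536)
print(main.address_of("counter"), main.address_of("side_var"), side.address_of("x"))
```

In this model only `relocatable` controls how local data is addressed, while `got` alone handles dynamic linking, which is the separation the comment above argues for.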
Interesting! How does native code handle this situation? Can the relocation model be controlled from the command line? Perhaps Emscripten could then opt into the more conservative mode rather than making it the LLVM default. Or we could make that conservative mode the LLVM default, and folks who want the old behavior would be able to opt out.
Interesting... How big is the extra overhead here, and how important is it to reduce?
Are you talking about the extra overhead introduced by building the MAIN_MODULE as RELOCATABLE today? Or the extra overhead of my proposal to enable …? I believe all the overhead introduced by … The overhead of using …
I see, thanks @sbc100. That does sound like a strong motivation to do it. I didn't realize there was a non-perf aspect here.
That is a good question. I would need to do some measurements, but roughly speaking the cost would be:
The latter shouldn't affect code size, and since the global is immutable it should probably not affect the runtime either, since presumably any engine is going to treat …
I can't imagine the impact of …
Good question. I will investigate. I'm guessing that if you build a native binary without …
Yes
Indeed. I'm thinking that anything we do should probably be behind wasm32-unknown-emscripten, at least to start with.
Well, a baseline compiler would possibly not optimize this. Even an optimizing compiler might not if it optimizes functions in parallel first before looking at global state, but as the globals arrive first, it does seem like they could do this. In the worst case this would replace a constant with a load from memory. That doesn't seem too bad.
Yes, …
Hi @sbc100 @kripken, I think one big downside of building the MAIN_MODULE as RELOCATABLE is the extra instructions introduced to calculate each memory address by adding the memory base, like below:
This is not needed for the MAIN_MODULE, since its memory base is always 0. It increases code size and slows down execution. Function pointers have a similar problem.
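Since the original snippet did not survive, here is a rough Python sketch of the instruction shapes being compared. The sequences are WAT-like strings chosen for illustration and are an assumption: the exact code LLVM emits may differ (for example, it can fold a constant into the load's `offset=` immediate). The helper `load_sequence` is hypothetical.

```python
def load_sequence(offset: int, relocatable: bool) -> list[str]:
    """Return a WAT-like sequence that loads an i32 from a data symbol (illustrative only)."""
    if relocatable:
        # PIC: the final address depends on __memory_base, which is only known at load time.
        return ["global.get $__memory_base", f"i32.const {offset}", "i32.add", "i32.load"]
    # Static: the linker already knows the absolute address.
    return [f"i32.const {offset}", "i32.load"]


# Assuming the main module's memory base is 0, as stated in the comment above,
# the static address equals the offset.
pic = load_sequence(16, relocatable=True)
static = load_sequence(16, relocatable=False)
print(len(pic) - len(static))  # extra instructions per data access in this model
```

The difference is paid at every access to a data symbol, which is why it shows up as both code size and runtime overhead.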
I wrote up a short design doc for how to move forward with this: https://docs.google.com/document/d/1viN3qTS5QzeDP7NR0pGg9D5JuuOrQzsORumMswYGpsA/edit?usp=sharing&resourcekey=0-2Rnysxch2EuXNT3cvoD1MA
Indeed, this is one of the primary motivators for this change. Thanks for pointing this out explicitly.
@sbc100, under the solution you wrote up, would there be a way to have some symbols directly imported and other symbols go through the GOT? I'm thinking of a situation in which some symbols are dynamically loaded from shared libraries but some are meant to be normal JS imports.
The GOT should only be used when symbol addresses are imported/exported. First-class functions are imported the same way as in the static build. This is already true for MAIN_MODULE/SIDE_MODULE builds. However, if the address of a JS function is required, it will indeed be imported as …
After more experimenting, I'm leaning towards not using any new flags or relocation models and just sticking to compiling with … This means that the resulting binary will contain accessors that look like the above, but against a constant base:
I'm assuming that binaryen can take care of the relaxation of all of these to just:
Does that seem reasonable? (@kripken?) Obviously it would be better for wasm-ld to perform this relaxation one day, but we don't currently do any linker relaxation in wasm-ld, so that would be a much bigger change.
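The relaxation being asked about amounts to constant folding once `__memory_base` is a known immutable value. Below is a toy Python peephole pass over WAT-like strings that shows the idea; it is not Binaryen's implementation (its real optimization passes are far more general), and `relax` is a hypothetical name.

```python
def relax(instrs: list[str], memory_base: int) -> list[str]:
    """Fold `global.get $__memory_base; i32.const N; i32.add` into `i32.const base+N`.

    Toy peephole pass: assumes __memory_base is immutable with a known value
    (as in a non-relocatable main module) and ignores i32 wraparound.
    """
    out: list[str] = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]
        if (len(window) == 3
                and window[0] == "global.get $__memory_base"
                and window[1].startswith("i32.const ")
                and window[2] == "i32.add"):
            offset = int(window[1].split()[1])
            out.append(f"i32.const {memory_base + offset}")
            i += 3  # consume the whole matched pattern
        else:
            out.append(instrs[i])
            i += 1
    return out


accessor = ["global.get $__memory_base", "i32.const 16", "i32.add", "i32.load"]
print(relax(accessor, 1024))  # ['i32.const 1040', 'i32.load']
```

Doing this in the linker instead would mean rewriting instruction bytes at link time, which is the larger change mentioned above.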
@sbc100 Yes, Binaryen can do such optimizations.
…4467) There is no reason the `__stack_pointer` global can't be exported from the module, and in fact I'm experimenting with a non-relocatable main module that requires this. See emscripten-core/emscripten#12682 This heuristic still kind of sucks but should always be good enough for llvm output that always puts the stack pointer first.
This is a new mode for handling unresolved symbols that allows all symbols to be imported in the same way that they would be in the case of `-fpie` or `-shared`, but generating an otherwise fixed/non-relocatable binary. Code linked in this way should still be compiled with `-fPIC` so that data symbols can be resolved via imports. This essentially allows the building of static binaries that have dynamic imports. See: emscripten-core/emscripten#12682. As with other uses of the experimental dynamic linking ABI, this behaviour will produce a warning unless run with `--experimental-pic`. Differential Revision: https://reviews.llvm.org/D91577
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.
Are you going to work on that? It'd help me a lot.
@sbc100 We have a large project: porting a large codebase from native code to WASM. We can port our codebase by loading dynamic libraries at startup, but that gives us a performance hit on startup, so it's logical to link those libraries statically instead. In that case, however, we get an enormous __wasm_apply_data_relocs.
@Stuonts, can you help me understand what causes the explosion of data relocations? Perhaps you can answer a few questions:
Hello @sbc100, I am Stuonts's colleague.
Regarding …
Last time we tried it, we had some kind of memory corruption issue that only went away when we switched back to MAIN_MODULE=1. I need to try again to give exact details.
Hello @sbc100, here are my tests for a debug build (but with light optimization, -O1):
I wonder if we can figure out why you have so many data relocations in your program. I fear that the size of the …
@sbc100 I need to make sure I am allowed to attach the .wasm file, because of legal issues. I could give you the sizes of the imports/exports, or the count of exported/imported functions. I can confirm that we have many exported classes, where every method is exported and the classes have lots of methods.
@sbc100 So I ran … Why does the main binary have so many exports? I was assuming it is supposed to export only the main function and that's pretty much it. And I certainly didn't expect the main binary to have 4-5 times more exports than imports.
Hello @sbc100. I am trying to change __wasm_apply_data_relocs to generate a loop, so that it doesn't grow with the number of relocations, as you wrote in your TODO here, because it is a major blocker for our big WebAssembly port.
Sadly, I don't think it's possible to write this as a loop. This is because each of the symbol addresses comes in as an imported global, and the …
@sbc100 So here is a snippet of what I assume is code to apply one relocation:
I see global.get being used with $__memory_base, which is an "internal immutable global set to 1024". But what about the i32.const here? Could we, for example, use local.get with a local variable to read the offset for a specific relocation, add that to the memory base, and then store it? Is it not possible to create an array of offsets known at compile time, go over that array reading each value into a local variable, and run the snippet above in a loop, with the hard-coded constants replaced by the local variable holding the offset for each relocation?
Why do you say …? BTW, if a global really is internal and immutable, then Binaryen will convert … Also, the whole point of relocations is the case where the memory base is not known. If the memory base is known at link time, you don't need any relocations.
The real problem for turning relocations into a loop is not …
One thing that came up at the CG meeting last week was the idea of imported data segments. If relocations could be stored in data segments rather than globals, they could be copied into memory and iterated over. That idea is nowhere near close to being reality, though. Maybe this would be a useful follow-on proposal after extended const lands?
@sbc100 About __memory_base being an internal const global: I just copied that quote from this discussion mindlessly. Now I see how it doesn't make any sense, because an unknown memory base is why we need relocations, as you pointed out. I looked at the WAT of various functions some more, and what I wanted to try is something like this: imagine we have 2 generated arrays, each with size equal to the number of relocations, which we would fill with the respective constants used by the i32.store in the snippet I wrote above.
Now __wasm_apply_data_relocs would look something like this:
This function would have a constant size of 2 hard-coded relocations plus a loop, but we would have 2 extra int arrays with size equal to the relocation count, which I think is well worth it?
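A rough Python simulation of the proposal, comparing today's unrolled form with a table-driven loop. It is a sketch under assumptions: memory is a `bytearray`, stores are little-endian 32-bit words (as in wasm), and `SITE_OFFSETS`/`TARGET_OFFSETS` are hypothetical link-time tables. It only covers relocations whose target is a fixed offset from `__memory_base`; relocations against imported symbols cannot be expressed this way.

```python
import struct


def apply_unrolled(memory: bytearray, memory_base: int) -> None:
    # Roughly today's shape: one hard-coded store per relocation, so code grows linearly.
    struct.pack_into("<I", memory, memory_base + 8, memory_base + 100)
    struct.pack_into("<I", memory, memory_base + 12, memory_base + 200)
    struct.pack_into("<I", memory, memory_base + 16, memory_base + 100)


# Hypothetical tables emitted at link time: where to store, and which offset to store.
# They hold offsets relative to __memory_base, so they need no relocation themselves.
SITE_OFFSETS = [8, 12, 16]
TARGET_OFFSETS = [100, 200, 100]


def apply_looped(memory: bytearray, memory_base: int) -> None:
    # Constant-size code: the per-relocation data lives in the two tables instead.
    for site, target in zip(SITE_OFFSETS, TARGET_OFFSETS):
        struct.pack_into("<I", memory, memory_base + site, memory_base + target)


unrolled, looped = bytearray(512), bytearray(512)
apply_unrolled(unrolled, 64)
apply_looped(looped, 64)
print(unrolled == looped)  # True: both produce identical memory
```

The trade-off is the one described above: 8 bytes of table data per relocation in exchange for a function whose size no longer depends on the relocation count.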
The problem here is that the … There is no way for the dynamic linker to fill in an array of symbol locations.
But my __wasm_apply_data_relocs is all like this:
I don't see any GOT.func.printf; it is just a bunch of arbitrary constants that this code seems to know at compile time already.
The problem is that the way we have defined a symbol in the dynamic linker is as a wasm global which stores its address (e.g. …). There is no relocation format that the dynamic linker understands; the relocations are all embedded/hidden in the binary and based on global imports.
Relevant code: https://codebrowser.dev/llvm/lld/wasm/InputChunks.cpp.html#363. Am I missing something?
I see. Yes, for such internal relocations we could use a loop. I believe this is because the dynamic linker knows that those symbols are defined in the main module. For a side module, I think you would see more GOT-based relocations (i.e. hasGOTIndex() would be true in the above code). Given that, I agree we could split the relocations into two types, internal and external. For internal relocations, we could then use a loop rather than repeating the code sequence.
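The internal/external split can be sketched in Python as follows. `Reloc`, `split`, and `apply` are hypothetical names, and the model is an assumption, not lld's code: internal relocations (no GOT entry) are driven from a table by one loop, while external ones keep one step per relocation. In wasm, each GOT entry is a distinct imported global that can only be named statically by `global.get`, so the external part cannot be turned into a loop; the dictionary lookup below merely stands in for those individual `global.get` instructions.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Reloc:
    site: int                         # offset from __memory_base of the word to patch
    target_offset: int = 0            # internal: target's offset from __memory_base
    got_symbol: Optional[str] = None  # external: value arrives via an imported GOT global


def split(relocs: List[Reloc]) -> Tuple[List[Reloc], List[Reloc]]:
    internal = [r for r in relocs if r.got_symbol is None]
    external = [r for r in relocs if r.got_symbol is not None]
    return internal, external


def apply(memory: bytearray, memory_base: int, relocs: List[Reloc], got: Dict[str, int]) -> None:
    internal, external = split(relocs)
    # Internal relocations: purely data-driven, so this part could become a wasm loop.
    for r in internal:
        struct.pack_into("<I", memory, memory_base + r.site, memory_base + r.target_offset)
    # External relocations: in wasm these stay straight-line code, one global.get each.
    for r in external:
        struct.pack_into("<I", memory, memory_base + r.site, got[r.got_symbol])


mem = bytearray(256)
relocs = [Reloc(site=0, target_offset=40), Reloc(site=4, got_symbol="printf")]
apply(mem, 32, relocs, {"printf": 7})
```

If most of a main module's relocations are internal, as reported above, the loop would absorb most of the function's size while the GOT-based remainder stays unrolled.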
OK, I lied, sorry: I didn't scroll far enough, as my apply_data_reloc is ginormous. I do have GOT.func bits. I was going to suggest at least applying this loop approach to the parts that only need constants and do not need GOT.func. Thanks!
If this is only applicable to the main module, that is fine; we only have size issues with apply_data_reloc in MAIN_MODULE. Side modules do not have this issue, as they are much smaller.
There is no need for the main module itself to be relocatable. We should be able to get some code size and performance wins by making it static.