Convert Imports to Indirect Calls for Dynamic Linking #2523
Conversation
@sbc100, this is the corresponding work in Binaryen.
Is this showing improvements for you?
Are you running it both on the side module and the main module?
In general I think this looks a lot like what we talked about. I'd like to see a test or two added in the test/lld directory to see the effect of this on the code.
Thanks for your work on this, Kevin. I don't think we are ready to land this yet, but maybe with a few iterations and a little more investigation on our side. @kripken suggested we come up with a design doc, for example.
*/

//
// Turn indirect calls into direct calls. This will improve the runtime
This comment seems backwards, no?
I think this needs some more documentation. IIRC the cost we are trying to avoid is being forced to go through JS, even for calls to other wasm modules. In this case going through the table is an alternative form of indirection that should be faster?
Oops, sorry, will update the incorrect comment.
src/passes/pass.cpp
Outdated
createUnteePass);
registerPass("vacuum", "removes obviously unneeded code", createVacuumPass);
registerPass("ImportsToIndirectCalls",
             "turns imports into indirect calls",
How about "turn calls to imported functions into indirect calls"?
Updated it to "convert calls to imported functions into indirect calls for the main module"
#include "wasm-io.h"
#include "wasm-printing.h"
#include "wasm-validator.h"
#include "C:\git\binaryen\binaryen\src\passes/passes.h"
oops
Oops, I forgot about this. Will switch to relative paths.
  }
}

// Convert the imports to indirects before the Legalization pass to
Convert calls to imported functions into indirect calls..
Is it the legalization that is the problem or the fact that we have to go through JS at all?
Both of them slow down performance.
std::istreambuf_iterator<char>());

// index.wasm.libfile will have the following format:
// ['_glClearStencil', '_glUniformMatrix2fv', '_eglGetCurrentSurface']
This seems like it would be much simpler if the format was just one symbol per line:
_glClearStencil
_glUniformMatrix2fv
_eglGetCurrentSurface
And maybe call this a "symbols file": index.wasm.symbols
How about 'index.wasm.js_symbols' since the file contains JS symbols?
if (!module->table.exists) {
  Fatal() << "Table does not exist in module!!!";
  return;
Maybe we can just create one in this case?
Ok sure.
void visitCall(Call* curr) {
  auto name = curr->target;

  if (!name.isNull()) {
Use early return here:
if (name.isNull()) {
return;
}
ok
auto* func = module->getFunction(name);

if (func) {
  if (func->imported()) {
Use early return here:
if (!func || !func->imported()) {
return;
}
ok
src/asm2wasm.h
Outdated
passRunner.add("merge-blocks");
passRunner.add("optimize-instructions");
passRunner.add("post-emscripten");
passRunner.add("ImportsToIndirectCalls");
You probably don't want to add this here.
This was a mistake. Will remove it.
// Convert the imports to indirects before the Legalization pass to
// avoid unnecessary legalization
if (!options.passOptions.arguments["library-file"].empty()) {
Can you give this argument a better name too? How about "js-symbols"?
Sure, sounds good.
I have updated the Emscripten PR with the benchmark results. On Chrome, we see a 13x improvement while on FF, we see a 7x improvement.
We only need to run this on the main module because the previous PR emscripten-core/emscripten#9917 already ensures that we directly call an exported Wasm function rather than the JS stub. This is already faster than what this PR provides, which is to make an indirect call to an exported Wasm function.
Sure, I can always add my benchmark as a testcase to the repo.
Sounds good, can you point me to an example design doc in case you have a specific format in mind?
Thinking about this some more, I still wish we could simplify everything (make all calls indirect, for example), but I agree that adds some risk and downsides. So I am mostly ok with this approach, if @sbc100 you think this is maintainable and does not add any problems for migrating to a long-term better model for dynamic linking. I'd suggest restructuring this as follows: not running the pass from finalize; instead, it will be run from emscripten. A pass can take arguments (see e.g. …). Then on the Emscripten side, we can run this at the appropriate time. The main …
@kripken, running this pass in finalize has the added benefit of doing the imports-to-indirects conversion before legalization, which will help both reduce the size of the binary and improve runtime performance. Should we move the legalization pass out of finalize as well?
@awtcode Interesting question about the legalize pass. Yes, it seems like this should happen before legalize. In fact, it seems like moving legalize to the very end is the most efficient thing in general. I think that's worth trying, although there may be issues I don't remember right now. @sbc100 what do you think?
I did some investigation of moving legalization all the way to the end of the emcc.py pipeline. The first serious issue I hit was that if we legalize in finalize, which is early, then we find out if we need …
If we always emit getTempRet0/setTempRet0, can't meta-dce then remove them?
Yes, metadce would, so this is not going to increase fully optimized builds. It's still annoying, though, I think.
Track the beginning and end of each function, both when reading and writing.

We track expressions and functions separately, instead of having a single big map of (oldAddr) => (newAddr), because of the potentially ambiguous case of the final expression in a function: its end might be identical in offset to the end of the function. So we have two different things that map to the same offset. However, if the context is "the end of the function", then the updated address is the new end of the function, even if the function ends with a different instruction now, as the old last instruction might have moved or been optimized out. Concretely, we have getNewExprAddr and getNewFuncAddr, so we can ask to update the location of either an expression or a function, and use that contextual information.

This checks for the DIE tag in order to know what we are looking for. To be safe, if we hit an unknown tag, we halt, so that we don't silently miss things.

As the test updates show, the new things we can do thanks to this PR are to update compile unit and subprogram low_pc locations. Note btw that in the first test (dwarfdump_roundtrip_dwarfdump.bin.txt) we change 5 to 0: that is correct since that test does not write out DWARF (it intentionally has no -g), so we do not track binary locations while writing, and so we have nothing to update to (the other tests show actual updating).

Also fix the order in the python test runner code to show a diff of expected to encountered, and not the reverse, which confused me.
This only touches test code. The files are compiled with latest LLVM + https://reviews.llvm.org/D71681 in order to get more realistic DWARF content.
This adds EH instruction support for `CFGWalker`. This also implements `call` instruction handling within a try-catch; every call can possibly throw and unwind to the innermost catch block. This adds tests for the RedundantSetElimination pass, which uses `CFGWalker`.
Update high_pc values. These are interesting as they may be a relative offset compared to the low_pc. For functions we already had both a start and an end. Add such tracking for instructions as well.
This will make it easier to switch to something else for offsets in wasm binaries if we get >4GB files.
Instead of reinventing the wheel on our side, this adds ExpressionAnalyzer bindings to the C- and JS-APIs, which can be useful for generators. For example, a generator may decide to simplify a compilation step if a subexpression doesn't have any side effects, or simply skip emitting something that is likely to compile to a drop or an empty block right away.
LLVM points to the start of the function in some debug line entries - right after the size LEB of the function, which is where the locals are declared, and before any instructions.
It is convenient to have the full command when debugging fuzzing errors. The fuzzer sometimes fails before running `wasm-reduce` and being able to reproduce the command right away from the log is very handy in that case.
Instead of hackishly advancing the read position in the binary buffer, call readExpression which will do that, and also do all the debug info handling for us.
Binaryen.js now uses offset instead of byteOffset when inspecting
a memory segment, matching the arguments on memory segment
creation. Also adds inspection of the passive property.
Previously, one would specify { offset, data, passive } on creation
and get back { byteOffset, data } upon inspection. This PR unifies
both to the keys on creation while also adding the respective C-API
to retrieve passive status, which was missing.
Fixes the testcase in WebAssembly#2343 (comment). Looks like that's from Rust. Not sure why it would have an invalid abbreviation code, but perhaps the LLVM there emits DWARF differently than we've tested on so far. May be worth investigating further, but for now emit a warning, skip that element, and don't crash. Also fix valgrind warnings about Span values not being initialized, which was invalid and bad as well (wasted memory in our maps, and might have overlapped with real values), and interfered with figuring this out.
…WebAssembly#2603) Control flow structures have those in addition to the normal span of (start, end), and we need to track them too. Tracking them during reading requires us to track control flow structures while parsing, so that we can know which structure an end/else/catch refers to. We track these locations using a map on the side of instruction to its "extra" locations. That avoids increasing the size of the tracking info for the much more common non-control flow instructions. Note that there is one more 'end' location, that of the function (not referring to any instruction). I left that to a later PR to not increase this one too much.
DWARF from LLVM can refer to the first byte belonging to the function, where the size LEB is, or to the first byte after that, where the local declarations are, or the end opcode, or to one byte past that which is one byte past the bytes that belong to the function. We aren't sure why LLVM does this, but track it all for now. After this all debug line positions are identified. However, in some cases a debug line refers to one past the end of the function, which may be an LLVM bug. That location is ambiguous as it could also be the first byte of the next function (what made this discovery possible was when this happened to the last function, after which there is another section).
While line and address values of 0 should be skipped, it seems like column 0 is valid in lines emitted by LLVM.
We need to track end_sequence directly, and use either end_sequence or copy (copy emits a line without marking it as ending a sequence). After this, fib2 debug line output looks perfect.
Just some trivial fixes:
* Properly reset prologue after each line (unlike others, this flag should be reset immediately).
* Test for a function's end address first, as LLVM output appears to use 1-past-the-end-of-the-function as a location in that function, and not the next (note the first byte of the next function, which is ambiguously identical to that value, is used at least in low_pc; I'm not sure if it's used in debug lines too).
* Ignore the same address if LLVM emitted it more than once, which it does sometimes.
Pretty straightforward given all we have so far. Note that fannkuch3_manyopts has an example of a sequence of ranges of which some must be skipped while others must not, showing we handle that by skipping the bad ones and updating the remaining. That is, suppose we have a sequence of two (begin, end) spans: [(10, 20), (30, 40)]. It's possible (10, 20) maps in the new binary to (110, 120), while (30, 40) was eliminated by the optimizer and we have nothing valid to map it to. In that case we emit [(110, 120)].
…WebAssembly#2613) Chrome is currently decoding the segment indices as signed numbers, so some ranges of indices greater than 63 do not work. As a temporary workaround, limit the number of segments produced by MemoryPacking to 63 when bulk-memory is enabled.
# Conflicts:
#	src/wasm/wasm-debug.cpp
#	test/passes/fannkuch3.bin.txt
#	test/passes/fannkuch3_manyopts.bin.txt
Not required anymore.
Inter-module calls in dynamic linking currently go through JS stubs. Convert these calls to Wasm indirect calls for the following reasons:
The Emscripten portion of the work is done in this PR emscripten-core/emscripten#10003