Reintroduce wasm-merge (WebAssembly#5709)
We used to have a wasm-merge tool but removed it for lack of use cases. Recently,
use cases have been showing up in the wasm GC space and elsewhere as people
combine more diverse toolchains. For example, a project might build some C++
code alongside some wasm GC code. Merging those wasm files together allows
optimizations such as inlining and better DCE, so it makes sense to have a
tool for merging.

Background:
* Removal: WebAssembly#1969
* Requests:
  * wasm-merge - why it has been deleted WebAssembly#2174
  * Compiling and linking wat files WebAssembly#2276
  * wasm-link? WebAssembly#2767

This PR is a complete rewrite of wasm-merge, not a restoration of the original
codebase. The original code was quite messy (my fault), and also, since then
we've added multi-memory and multi-table which makes things a lot simpler.

The linking semantics are as described in the "wasm-link" issue WebAssembly#2767: all we do
is merge normal wasm files together and connect imports to exports. That is, we
have a graph of modules and their names, and each import that names a module can
be resolved to that module. This is what a JS bundler does for JS. In other
words, we perform at compile time the same operations that JS code would perform
to glue wasm modules together at runtime. See the README update in this PR for a
concrete example.
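
As a rough illustration of that resolution step, here is a minimal sketch in Python. It is a toy model, not Binaryen's data structures: the dict layout, the `resolve_imports` helper, and the choice to leave an import unresolved when the named module lacks the export are all assumptions made for this example.

```python
# Toy model (an assumption for illustration, not Binaryen's IR): each module
# has "imports", a list of (module_name, base_name) pairs, and "exports",
# a set of exported names.
def resolve_imports(modules):
    """Split imports into those that point at an export of another merged
    module and those that must remain imports of the merged output."""
    resolved, unresolved = [], []
    for name, module in modules.items():
        for target, base in module["imports"]:
            provider = modules.get(target)
            if provider is not None and base in provider["exports"]:
                resolved.append((name, target, base))
            else:
                unresolved.append((name, target, base))
    return resolved, unresolved


# Hypothetical module names and contents.
modules = {
    "first": {"imports": [("second", "bar")], "exports": {"main"}},
    "second": {"imports": [("outside", "log")], "exports": {"bar"}},
}
resolved, unresolved = resolve_imports(modules)
print(resolved)    # [('first', 'second', 'bar')]
print(unresolved)  # [('second', 'outside', 'log')]
```

The import of `"outside"` stays unresolved because no merged module has that name, so it would still have to be provided from outside the merged module.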

There are no plans to do more than that simple bundling, so this should not
really overlap with wasm-ld's use cases.

This should be fairly fast as it works in linear time on the total input code. However,
it won't be as fast as wasm-ld, of course, as it does build Binaryen IR for each
module. An advantage to working on Binaryen IR is that we can easily do some
global DCE after merging, and further optimizations are possible later.
kripken authored and radekdoulik committed Jul 12, 2024
1 parent 5ab990d commit bd731ae
Showing 30 changed files with 2,171 additions and 11 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,11 @@ full changeset diff at the end of each section.
Current Trunk
-------------

- Add a `wasm-merge` tool. This is a full rewrite of the previous `wasm-merge`
tool that was removed from the tree in the past. The new version is much
simpler after recent improvements to multi-memory and multi-table. The
rewrite was motivated by new use cases for merging modules in the context of
WasmGC.
- Some C and JS API functions now refer to data and element segments by name
instead of index.
- The --nominal and --hybrid command line options and related API functions have
147 changes: 147 additions & 0 deletions README.md
@@ -223,6 +223,9 @@ This repository contains code that builds the following tools in `bin/`:
performs emscripten-specific passes over it.
* **wasm-ctor-eval**: A tool that can execute functions (or parts of functions)
at compile time.
* **wasm-merge**: Merges multiple wasm files into a single file, connecting
corresponding imports to exports as it does so. Like a bundler for JS, but
for wasm.
* **binaryen.js**: A standalone JavaScript library that exposes Binaryen methods for [creating and optimizing Wasm modules](https://github.com/WebAssembly/binaryen/blob/main/test/binaryen.js/hello-world.js). For builds, see [binaryen.js on npm](https://www.npmjs.com/package/binaryen) (or download it directly from [github](https://raw.githubusercontent.com/AssemblyScript/binaryen.js/master/index.js), [rawgit](https://cdn.rawgit.com/AssemblyScript/binaryen.js/master/index.js), or [unpkg](https://unpkg.com/binaryen@latest/index.js)). Minimal requirements: Node.js v15.8 or Chrome v75 or Firefox v78.

Usage instructions for each are below.
@@ -562,6 +565,150 @@ as mentioned earlier, but there is no limitation on what you can execute here.
Any export from the wasm can be executed, if its contents are suitable. For
example, in Emscripten `wasm-ctor-eval` is even run on `main()` when possible.

### wasm-merge

`wasm-merge` combines wasm files together. For example, imagine you have a
project that uses wasm files from multiple toolchains. Then it can be helpful to
merge them all into a single wasm file before shipping, since in a single wasm
file the calls between the modules become just normal calls inside a module,
which allows them to be inlined, dead code eliminated, and so forth, potentially
improving speed and size.

For example, imagine we have these two wasm files:

```wat
;; a.wasm
(module
  (import "second" "bar" (func $second.bar))
  (export "main" (func $func))
  (func $func
    (call $second.bar)
  )
)
```

```wat
;; b.wasm
(module
  (import "outside" "log" (func $log (param i32)))
  (export "bar" (func $func))
  (func $func
    (call $log
      (i32.const 42)
    )
  )
)
```

The filenames on your local drive are `a.wasm` and `b.wasm`, but for merging /
bundling purposes let's say that the first is known as `"first"` and the second
as `"second"`. That is, we want the first module's import of `"second.bar"` to
call the function `$func` in the second module. Here is a wasm-merge command for
that:

```
wasm-merge a.wasm first b.wasm second -o output.wasm
```

We give it the first wasm file, then its name, and then the second wasm file
and then its name. The merged output is this:

```wat
(module
  (import "second" "bar" (func $second.bar))
  (import "outside" "log" (func $log (param i32)))
  (export "main" (func $func))
  (export "bar" (func $func_2))
  (func $func
    (call $func_2)
  )
  (func $func_2
    (call $log
      (i32.const 42)
    )
  )
)
```

`wasm-merge` combined the two files into one, merging their functions, imports,
etc., all while fixing up name conflicts and connecting corresponding imports to
exports. In particular, note how `$func` calls `$func_2`, which is exactly what
we wanted: `$func_2` is the function from the second module (renamed to avoid a
name collision).
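
The renaming can be pictured with a small Python sketch. This is only an illustration: the helper names are hypothetical, and the suffix scheme (`_2`, `_3`, ...) is an assumption chosen to reproduce the `$func_2` seen in the output above, not a description of how Binaryen actually generates names.

```python
def unique_name(name, taken):
    """Return name if it is free, else the first name_N (N = 2, 3, ...)
    that is not already taken. The suffix scheme is an assumption."""
    if name not in taken:
        return name
    n = 2
    while f"{name}_{n}" in taken:
        n += 1
    return f"{name}_{n}"


def merge_function_names(first, second):
    """Append the second module's names to the first's, renaming on conflict.
    Returns the merged list and the renaming applied to the second module."""
    merged = list(first)
    taken = set(first)
    renames = {}
    for name in second:
        new_name = unique_name(name, taken)
        taken.add(new_name)
        merged.append(new_name)
        renames[name] = new_name
    return merged, renames


merged, renames = merge_function_names(["$second.bar", "$func"], ["$log", "$func"])
print(merged)   # ['$second.bar', '$func', '$log', '$func_2']
print(renames)  # {'$log': '$log', '$func': '$func_2'}
```

Any reference to a renamed item inside the second module would then need to be rewritten through `renames`, which is the kind of fix-up the paragraph above describes.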

Note that the wasm output in this example could benefit from additional
optimization. First, the call to `$func_2` can now be easily inlined, so we can
run `wasm-opt -O3` to do that for us. Also, we may not need all the imports and
exports, for which we can run
[wasm-metadce](https://github.com/WebAssembly/binaryen/wiki/Pruning-unneeded-code-in-wasm-files-with-wasm-metadce#example-pruning-exports).
A good workflow could be to run `wasm-merge`, then `wasm-metadce`, then finish
with `wasm-opt`.
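
A minimal sketch of scripting part of that workflow in Python follows. It only builds the `wasm-merge` and `wasm-opt` command lines from the example above. The `wasm-metadce` step is left out because it needs a description of which imports and exports to keep, which is not given here. The helper names and the `output.opt.wasm` filename are hypothetical, and depending on the inputs, additional feature flags may be needed.

```python
import subprocess


def build_pipeline(inputs, merged="output.wasm", optimized="output.opt.wasm"):
    """Build the command lines for merging and then optimizing.

    inputs is a list of (wasm_file, module_name) pairs, in merge order.
    """
    merge_cmd = ["wasm-merge"]
    for wasm_file, module_name in inputs:
        merge_cmd += [wasm_file, module_name]
    merge_cmd += ["-o", merged]
    opt_cmd = ["wasm-opt", merged, "-O3", "-o", optimized]
    return [merge_cmd, opt_cmd]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_pipeline([("a.wasm", "first"), ("b.wasm", "second")])
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_pipeline(commands)` would execute the steps, assuming the Binaryen tools are on `PATH` and the input files exist.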

`wasm-merge` is kind of like a bundler for wasm files, in the sense of a "JS
bundler" but for wasm. That is, with the wasm files above, imagine that we had
this JS code to instantiate and connect them at runtime:

```js
// Fetch and compile the first module.
var first = await (await fetch("a.wasm")).arrayBuffer();
first = new WebAssembly.Module(first);

// Fetch and compile the second module.
var second = await (await fetch("b.wasm")).arrayBuffer();
second = new WebAssembly.Module(second);

// Instantiate the second, with a JS import.
second = new WebAssembly.Instance(second, {
  outside: {
    log: (value) => {
      console.log('value:', value);
    }
  }
});

// Instantiate the first, importing from the second.
first = new WebAssembly.Instance(first, {
  second: second.exports
});

// Call the main function.
first.exports.main();
```

What `wasm-merge` does is basically what that JS does: it hooks up imports to
exports, resolving names using the module names you provided. That is, by
running `wasm-merge` we are moving the work of connecting the modules from
runtime to compile time. As a result, after running `wasm-merge` we need a lot
less JS to get the same result:

```js
// Fetch and compile the single module.
var merged = await (await fetch("output.wasm")).arrayBuffer();
merged = new WebAssembly.Module(merged);

// Instantiate it with a JS import.
merged = new WebAssembly.Instance(merged, {
  outside: {
    log: (value) => {
      console.log('value:', value);
    }
  }
});

// Call the main function.
merged.exports.main();
```

We still need to fetch and compile the merged wasm, and to provide it the JS
import, but the work to connect two wasm modules is not needed any more.

## Testing

```
69 changes: 65 additions & 4 deletions scripts/fuzz_opt.py
@@ -82,6 +82,11 @@ def random_size():
    return random.randint(INPUT_SIZE_MIN, 2 * INPUT_SIZE_MEAN - INPUT_SIZE_MIN)


def make_random_input(input_size, raw_input_data):
    with open(raw_input_data, 'wb') as f:
        f.write(bytes([random.randint(0, 255) for x in range(input_size)]))


def run(cmd, stderr=None, silent=False):
    if not silent:
        print(' '.join(cmd))
@@ -1284,6 +1289,62 @@ def handle(self, wasm):
        compare_between_vms(fix_output(wasm_exec), fix_output(evalled_wasm_exec), 'CtorEval')


# Tests wasm-merge
class Merge(TestCaseHandler):
    frequency = 0.15

    def handle(self, wasm):
        # generate a second wasm file to merge. note that we intentionally pick
        # a smaller size than the main wasm file, so that reduction is
        # effective (i.e., as we reduce the main wasm to small sizes, we also
        # end up with small secondary wasms)
        # TODO: add imports and exports that connect between the two
        wasm_size = os.stat(wasm).st_size
        second_size = min(wasm_size, random_size())
        second_input = abspath('second_input.dat')
        make_random_input(second_size, second_input)
        second_wasm = abspath('second.wasm')
        run([in_bin('wasm-opt'), second_input, '-ttf', '-o', second_wasm] + FUZZ_OPTS + FEATURE_OPTS)

        # sometimes also optimize the second module
        if random.random() < 0.5:
            opts = get_random_opts()
            run([in_bin('wasm-opt'), second_wasm, '-o', second_wasm, '-all'] + FEATURE_OPTS + opts)

        # merge the wasm files. note that we must pass -all, as even if the two
        # inputs are MVP, the output may have multiple tables and multiple
        # memories (and we must also do that in the commands later down).
        #
        # Use --skip-export-conflicts as we only look at the first module's
        # exports for now - we don't care about the second module's.
        # TODO: compare the second module's exports as well, but we'd need
        #       to handle renaming of conflicting exports.
        merged = abspath('merged.wasm')
        run([in_bin('wasm-merge'), wasm, 'first',
             abspath('second.wasm'), 'second', '-o', merged,
             '--skip-export-conflicts'] + FEATURE_OPTS + ['-all'])

        # sometimes also optimize the merged module
        if random.random() < 0.5:
            opts = get_random_opts()
            run([in_bin('wasm-opt'), merged, '-o', merged, '-all'] + FEATURE_OPTS + opts)

        # verify that merging in the second module did not alter the output.
        output = run_bynterp(wasm, ['--fuzz-exec-before', '-all'])
        output = fix_output(output)
        merged_output = run_bynterp(merged, ['--fuzz-exec-before', '-all'])
        merged_output = fix_output(merged_output)

        # a complication is that the second module's exports are appended, so we
        # have extra output. to handle that, just prune the tail, so that we
        # only compare the original exports from the first module.
        # TODO: compare the second module's exports to themselves as well, but
        #       they may have been renamed due to overlaps...
        merged_output = merged_output[:len(output)]

        compare_between_vms(output, merged_output, 'Merge')


# Check that the text format round-trips without error.
class RoundtripText(TestCaseHandler):
    frequency = 0.05
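
The tail-pruning comparison used in `Merge.handle` above can be shown in isolation. This is a minimal sketch, assuming the merged module's execution log begins with the first module's log. The helper name and the log strings are hypothetical and do not reproduce the real fuzz-exec output format.

```python
def first_module_output_preserved(original_output, merged_output):
    """Compare only the prefix of the merged output that corresponds to the
    first module's exports, ignoring anything appended after it."""
    return merged_output[:len(original_output)] == original_output


# Hypothetical execution logs, for illustration only.
original = "call main -> 1\n"
merged_ok = "call main -> 1\ncall second_export -> 7\n"
merged_bad = "call main -> 2\ncall second_export -> 7\n"

print(first_module_output_preserved(original, merged_ok))   # True
print(first_module_output_preserved(original, merged_bad))  # False
```

A merged log that is shorter than the original also fails the check, since the sliced prefix cannot equal the full original output.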
@@ -1306,6 +1367,7 @@ def handle(self, wasm):
    Asyncify(),
    TrapsNeverHappen(),
    CtorEval(),
    Merge(),
    # FIXME: Re-enable after https://github.com/WebAssembly/binaryen/issues/3989
    # RoundtripText()
]
@@ -1329,7 +1391,7 @@ def test_one(random_input, given_wasm):
     randomize_fuzz_settings()
     pick_initial_contents()

-    opts = randomize_opt_flags()
+    opts = get_random_opts()
     print('randomized opts:', '\n ' + '\n '.join(opts))
     print()

@@ -1503,7 +1565,7 @@ def write_commands(commands, filename):
 ("--type-merging",)}


-def randomize_opt_flags():
+def get_random_opts():
     flag_groups = []
     has_flatten = False

@@ -1643,8 +1705,7 @@ def randomize_opt_flags():
 'iters/sec, ', total_wasm_size / elapsed,
 'wasm_bytes/sec, ', ignored_vm_runs,
 'ignored\n')
-with open(raw_input_data, 'wb') as f:
-    f.write(bytes([random.randint(0, 255) for x in range(input_size)]))
+make_random_input(input_size, raw_input_data)
 assert os.path.getsize(raw_input_data) == input_size
 # remove the generated wasm file, so that we can tell if the fuzzer
 # fails to create one
2 changes: 1 addition & 1 deletion scripts/update_help_checks.py
@@ -27,7 +27,7 @@

 TOOLS = ['wasm-opt', 'wasm-as', 'wasm-dis', 'wasm2js', 'wasm-ctor-eval',
          'wasm-shell', 'wasm-reduce', 'wasm-metadce', 'wasm-split',
-         'wasm-fuzz-types', 'wasm-emscripten-finalize']
+         'wasm-fuzz-types', 'wasm-emscripten-finalize', 'wasm-merge']


def main():
48 changes: 42 additions & 6 deletions src/ir/module-utils.h
@@ -132,12 +132,9 @@ inline DataSegment* copyDataSegment(const DataSegment* segment, Module& out) {
   return out.addDataSegment(std::move(ret));
 }

-inline void copyModule(const Module& in, Module& out) {
-  // we use names throughout, not raw pointers, so simple copying is fine
-  // for everything *but* expressions
-  for (auto& curr : in.exports) {
-    out.addExport(new Export(*curr));
-  }
+// Copies named toplevel module items (things of kind ModuleItemKind). See
+// copyModule() for something that also copies exports, the start function, etc.
+inline void copyModuleItems(const Module& in, Module& out) {
   for (auto& curr : in.functions) {
     copyFunction(curr.get(), out);
   }
@@ -159,6 +156,15 @@ inline void copyModule(const Module& in, Module& out) {
   for (auto& curr : in.dataSegments) {
     copyDataSegment(curr.get(), out);
   }
+}
+
+inline void copyModule(const Module& in, Module& out) {
+  // we use names throughout, not raw pointers, so simple copying is fine
+  // for everything *but* expressions
+  for (auto& curr : in.exports) {
+    out.addExport(std::make_unique<Export>(*curr));
+  }
+  copyModuleItems(in, out);
   out.start = in.start;
   out.customSections = in.customSections;
   out.debugInfoFileNames = in.debugInfoFileNames;
@@ -354,6 +360,36 @@ template<typename T> inline void iterImports(Module& wasm, T visitor) {
   iterImportedTags(wasm, visitor);
 }

+// Iterates over all importable module items. The visitor provided should have
+// signature void(ExternalKind, Importable*).
+template<typename T> inline void iterImportable(Module& wasm, T visitor) {
+  for (auto& curr : wasm.functions) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Function, curr.get());
+    }
+  }
+  for (auto& curr : wasm.tables) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Table, curr.get());
+    }
+  }
+  for (auto& curr : wasm.memories) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Memory, curr.get());
+    }
+  }
+  for (auto& curr : wasm.globals) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Global, curr.get());
+    }
+  }
+  for (auto& curr : wasm.tags) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Tag, curr.get());
+    }
+  }
+}
+
 // Helper class for performing an operation on all the functions in the module,
 // in parallel, with an Info object for each one that can contain results of
 // some computation that the operation performs.
1 change: 1 addition & 0 deletions src/tools/CMakeLists.txt
@@ -17,6 +17,7 @@ binaryen_add_executable(wasm-ctor-eval wasm-ctor-eval.cpp)
if(NOT BUILD_EMSCRIPTEN_TOOLS_ONLY)
binaryen_add_executable(wasm-shell wasm-shell.cpp)
binaryen_add_executable(wasm-reduce wasm-reduce.cpp)
binaryen_add_executable(wasm-merge wasm-merge.cpp)
binaryen_add_executable(wasm-fuzz-types "${fuzzing_SOURCES};wasm-fuzz-types.cpp")
endif()
