Reintroduce wasm-merge #5709

kripken · 2023-05-09T16:56:03Z

We used to have a wasm-merge tool but removed it for lack of use cases. Recently,
use cases have been showing up in the wasm GC space and elsewhere, as people are
using more diverse toolchains together; for example, a project might build some
C++ code alongside some wasm GC code. Merging those wasm files together can enable
optimizations like inlining and better DCE, so it makes sense to have a tool for
merging.

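Background:

* Removal: #1969
* Requests:
  * wasm-merge - why it has been deleted #2174
  * Compiling and linking wat files #2276
  * wasm-link? #2767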

This PR is a complete rewrite of wasm-merge, not a restoration of the original
codebase. The original code was quite messy (my fault), and also, since then
we've added multi-memory and multi-table, which make things a lot simpler.

The linking semantics are as described in the "wasm-link" issue #2767: all we do
is merge normal wasm files together and connect imports and exports. That is, we
have a graph of modules and their names, and each import that refers to a module
name can be resolved to that module. Basically, this is what a JS bundler does for
JS; in other words, we do at compile time the same operations that JS code would
do at runtime to glue wasm modules together. See the README update in this PR for
a concrete example.
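To make the linking model concrete, here is a minimal C++ sketch, under simplifying assumptions, of matching each import against the exports of the other input modules by (module name, export name). The types `Import`, `ModuleInfo`, and `Resolution` and the function `resolveImports` are hypothetical and are not Binaryen's API; an actual merge also has to rewrite every use of a resolved import to point at the merged-in definition.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; these are not
// Binaryen's wasm::Module or import types.
struct Import {
  std::string module; // module name the import refers to
  std::string base;   // name of the item within that module
};

struct ModuleInfo {
  std::string name; // the name this module has in the module graph
  std::vector<Import> imports;
  std::vector<std::string> exports;
};

struct Resolution {
  std::vector<Import> resolved;   // connected to another input's export
  std::vector<Import> unresolved; // remain imports of the merged module
};

// Connects each import to a matching export of an input module, the way JS
// glue code would pass one instance's exports into another's import object.
Resolution resolveImports(const std::vector<ModuleInfo>& modules) {
  // Every (module name, export name) pair that some input provides.
  std::set<std::pair<std::string, std::string>> provided;
  for (const auto& mod : modules) {
    for (const auto& exportName : mod.exports) {
      provided.insert({mod.name, exportName});
    }
  }
  Resolution result;
  for (const auto& mod : modules) {
    for (const auto& imp : mod.imports) {
      if (provided.count({imp.module, imp.base})) {
        result.resolved.push_back(imp);
      } else {
        result.unresolved.push_back(imp);
      }
    }
  }
  return result;
}
```

Imports that find no provider (for example, host imports from a module such as `env`) simply stay imports of the merged output, which matches what would happen at runtime if JS supplied them.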

There are no plans to do more than that simple bundling, so this should not
really overlap with wasm-ld's use cases.

This should be fairly fast, as it works in linear time on the total input code.
However, it won't be as fast as wasm-ld, of course, as it does build Binaryen IR
for each module. An advantage of working on Binaryen IR is that we could optimize
right after merging, but this first version does not do any optimizations yet.

Perhaps this new tool should be given a new name, to differentiate it from the
old wasm-merge? Other options might be wasm-link or wasm-bundle.

kripken · 2023-05-10T23:02:49Z

For the CLI UI for naming modules, I propose that the optional flag --name=foo means the next given input file is assigned the name "foo". Otherwise, if that input file has a module name (in the text format or in the name section), then that name should be used. Otherwise, the base filename (without extension) of the input file should be used as the module name.

The one possibly odd part is the mixture of positional and non-positional arguments. Making one depend on the other is not something I'm familiar with in command-line tools. Or is there precedent?
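A minimal sketch of the proposed naming rule, assuming the arguments are already split into strings and that only `--name=` flags and input paths appear among them (other flags such as the output are ignored here). `NamedInput`, `parseArgs`, and `chooseModuleName` are hypothetical names, and reading a name out of the module itself is left to the caller, who passes it as `nameInModule`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types and names, for illustration only.
struct NamedInput {
  std::string path;
  std::optional<std::string> flagName; // set by a preceding --name=...
};

// Walks the arguments in order. A --name=foo flag is remembered and attached
// to the next input file; every other argument is treated as an input path.
std::vector<NamedInput> parseArgs(const std::vector<std::string>& args) {
  const std::string prefix = "--name=";
  std::vector<NamedInput> inputs;
  std::optional<std::string> pending;
  for (const auto& arg : args) {
    if (arg.compare(0, prefix.size(), prefix) == 0) {
      pending = arg.substr(prefix.size());
    } else {
      inputs.push_back({arg, pending});
      pending.reset(); // a --name applies to exactly one input
    }
  }
  return inputs;
}

// Precedence: the explicit flag, then a name stored in the module itself
// (text format or name section, read by the caller), then the file's base
// name without its extension.
std::string chooseModuleName(const NamedInput& input,
                             const std::optional<std::string>& nameInModule) {
  if (input.flagName) {
    return *input.flagName;
  }
  if (nameInModule) {
    return *nameInModule;
  }
  std::string base = input.path;
  auto slash = base.find_last_of("/\\");
  if (slash != std::string::npos) {
    base = base.substr(slash + 1); // drop directories
  }
  auto dot = base.find_last_of('.');
  if (dot != std::string::npos && dot != 0) {
    base = base.substr(0, dot); // drop the extension
  }
  return base;
}
```

With `--name=foo a.wasm dir/b.wasm`, the first input is named `foo`, and the second falls back to its in-module name if it has one, else to `b`.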

kripken · 2023-05-11T17:17:16Z

Another option for the command-line UI might be to forgo positional arguments entirely, as they are potentially confusing when mixed with flags. We could have this:

wasm-merge --input=a.wasm --name=foo --input=b.wasm --name=bar --output=merged.wasm

Basically we could have --input/-i for inputs like we have --output/-o for the output.
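A sketch of parsing this flag-only form, reading the example command as attaching each `--name=` to the `--input=` just before it. The `Input` and `Options` types and `parseFlagOnly` are hypothetical, and only the `=` spellings are handled, not `-i`/`-o`.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct Input {
  std::string path;
  std::optional<std::string> name;
};

struct Options {
  std::vector<Input> inputs;
  std::string output;
};

// Returns the text after `prefix` if `arg` starts with it.
static std::optional<std::string> flagValue(const std::string& arg,
                                            const std::string& prefix) {
  if (arg.compare(0, prefix.size(), prefix) != 0) {
    return std::nullopt;
  }
  return arg.substr(prefix.size());
}

// Every argument must be a flag; --name=... names the most recent --input=...
Options parseFlagOnly(const std::vector<std::string>& args) {
  Options opts;
  for (const auto& arg : args) {
    if (auto path = flagValue(arg, "--input=")) {
      opts.inputs.push_back({*path, std::nullopt});
    } else if (auto name = flagValue(arg, "--name=")) {
      if (opts.inputs.empty()) {
        throw std::runtime_error("--name= given before any --input=");
      }
      opts.inputs.back().name = *name;
    } else if (auto out = flagValue(arg, "--output=")) {
      opts.output = *out;
    } else {
      throw std::runtime_error("unexpected argument: " + arg);
    }
  }
  return opts;
}
```

Since every argument is a flag, there is no positional state to track beyond "the most recent input", and an input given without `--name=` just leaves `name` unset for a later fallback rule to fill in.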

tlively · 2023-05-11T17:30:10Z

One similar use of --flags to affect positional arguments is the --whole-archive and --no-whole-archive linker flags.

tlively · 2023-05-11T18:35:43Z

Code and tests LGTM. I believe the only remaining questions are 1) the CLI UI for naming modules, and 2) what options to provide and what to do by default for conflicting export names.

kripken · 2023-05-11T22:40:23Z

> One similar use of --flags to affect positional arguments is the --whole-archive and --no-whole-archive linker flags.

Very good point... I guess there is precedent for such stuff.

I don't have a strong feeling about what's best for the CLI UI. Perhaps we can bikeshed this at the Monday meeting if there's time.

For handling of export conflicts, do you agree we should have 3 modes like I was getting at before?

1. Rename to avoid conflicts. (Helps if the user knows about and expects conflicts, e.g. a main in each module.)
2. Error on conflicts. (Avoids surprises.)
3. Avoid conflicts by keeping the exports of exactly one module. (Best for the main module + side modules use case.)

If so, perhaps 2 should be the default, as the least surprising, as you've been suggesting?
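A minimal sketch of the three modes over plain lists of export names, one list per module in input order. Only `KeepOnlyFirstModuleExports` comes from the code under review; the other enumerator names, `mergeExports`, and the `_N` suffix used for renaming are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Ways to handle an export name provided by more than one module. Only
// KeepOnlyFirstModuleExports appears in the reviewed diff; the other two
// enumerator names are hypothetical.
enum ExportMode {
  RenameExportConflicts,      // 1: rename later duplicates
  ErrorOnExportConflicts,     // 2: report an error (proposed default)
  KeepOnlyFirstModuleExports, // 3: drop every later module's exports
};

// exportsPerModule holds each module's export names, in input order. Export
// names are unique within a valid module, so conflicts only arise between
// modules.
std::vector<std::string>
mergeExports(const std::vector<std::vector<std::string>>& exportsPerModule,
             ExportMode mode = ErrorOnExportConflicts) {
  std::vector<std::string> result;
  std::set<std::string> seen;
  for (std::size_t i = 0; i < exportsPerModule.size(); i++) {
    if (mode == KeepOnlyFirstModuleExports && i > 0) {
      break; // later modules contribute no exports at all
    }
    for (const auto& name : exportsPerModule[i]) {
      if (seen.insert(name).second) {
        result.push_back(name); // no conflict
        continue;
      }
      if (mode == ErrorOnExportConflicts) {
        throw std::runtime_error("conflicting export name: " + name);
      }
      // RenameExportConflicts: add the first free numeric suffix.
      std::string fresh;
      for (int n = 1;; n++) {
        fresh = name + "_" + std::to_string(n);
        if (!seen.count(fresh)) {
          break;
        }
      }
      seen.insert(fresh);
      result.push_back(fresh);
    }
  }
  return result;
}
```

With exports `{main, foo}` and `{main, bar}`, mode 2 throws, mode 1 yields `main, foo, main_1, bar`, and mode 3 yields only `main, foo`.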

tlively · 2023-05-12T13:39:21Z

> For handling of export conflicts, do you agree we should have 3 modes like I was getting at before?

Yes! Sounds good.

> If so perhaps 2 should be the default, as the least surprising, as you've been suggesting?

That sounds great to me.

kripken · 2023-05-12T19:54:30Z

Ok, 3 modes + new default are now implemented.

tlively · 2023-05-15T18:23:19Z

src/tools/wasm-merge.cpp

+  // this is useful when the first module is the main program and the others are
+  // libraries of code that it uses, but that do not have any exports intended
+  // to be used by anyone other than the main program.
+  KeepOnlyFirstModuleExports,


In general I think it would be more useful for this third mode to collect all the exports and use module order only to resolve conflicts. That would still expose any non-conflicting exports from later modules. The reason this is more useful is that exporting more things than the embedder expects does not cause linkage failures, so this would cover both the use cases covered by KeepOnlyFirstModuleExports and use cases that also need the other exports to be exposed.

Sorry I didn't catch this earlier when you were describing the 3 modes you had in mind!

kripken

Interesting. That would allow more things to work. But it would also add more effort for the user in the common case this is meant to address, a main module + side modules, since we never want the side modules' exports to remain. But maybe that's just more work we should expect the user to do, hmm... they would also likely want to prune some main module exports.

After more thought this does seem better. Changed to that.
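A minimal sketch of the revised third mode as described in the review: keep exports from every module, and when a name is already taken by an earlier module, skip the later export. `mergeExportsSkippingConflicts` is a hypothetical name, and it works on plain lists of export names, one per module in input order, rather than on real modules.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Keeps every export, resolving a name conflict in favor of the module that
// comes first in input order; later exports with that name are skipped.
std::vector<std::string> mergeExportsSkippingConflicts(
    const std::vector<std::vector<std::string>>& exportsPerModule) {
  std::vector<std::string> result;
  std::set<std::string> seen;
  for (const auto& moduleExports : exportsPerModule) {
    for (const auto& name : moduleExports) {
      // std::set::insert reports whether the name was not seen before.
      if (seen.insert(name).second) {
        result.push_back(name);
      }
    }
  }
  return result;
}
```

For example, exports `{main, foo}` and `{main, bar}` merge to `main, foo, bar`, with `main` coming from the first module. Compared with keeping only the first module's exports, `bar` stays visible, which is harmless for an embedder that does not look it up.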

tlively

LGTM 🎉 🎉 🎉

test/lit/merge/export_options.wat

Co-authored-by: Thomas Lively <tlively@google.com>

We used to have a wasm-merge tool but removed it for lack of use cases. Recently, use cases have been showing up in the wasm GC space and elsewhere, as people are using more diverse toolchains together; for example, a project might build some C++ code alongside some wasm GC code. Merging those wasm files together can enable optimizations like inlining and better DCE, so it makes sense to have a tool for merging.

Background:

* Removal: WebAssembly#1969
* Requests:
  * wasm-merge - why it has been deleted WebAssembly#2174
  * Compiling and linking wat files WebAssembly#2276
  * wasm-link? WebAssembly#2767

This PR is a complete rewrite of wasm-merge, not a restoration of the original codebase. The original code was quite messy (my fault), and also, since then we've added multi-memory and multi-table, which make things a lot simpler.

The linking semantics are as described in the "wasm-link" issue WebAssembly#2767: all we do is merge normal wasm files together and connect imports and exports. That is, we have a graph of modules and their names, and each import that refers to a module name can be resolved to that module. Basically, this is what a JS bundler does for JS; in other words, we do at compile time the same operations that JS code would do at runtime to glue wasm modules together. See the README update in this PR for a concrete example.

There are no plans to do more than that simple bundling, so this should not really overlap with wasm-ld's use cases.

This should be fairly fast, as it works in linear time on the total input code. However, it won't be as fast as wasm-ld, of course, as it does build Binaryen IR for each module. An advantage of working on Binaryen IR is that we can easily do some global DCE after merging, and further optimizations are possible later.

kripken added 30 commits April 25, 2023 16:36

* start.somewhere (82e796d)
* framework (6f20502)
* framework (2e891a3)
* builds (c7d397c)
* yolo (5957325)
* refactor (a444694)
* comment (f9a8209)
* one file builds (c2c0015)
* another (35b97b2)
* work (bb0ab00)
* work (4ba764b)
* mre (9a71da9)
* more (062c9d1)
* more (3d93ac9)
* more (579bda8)
* more (02d1c5a)
* more (13fbd79)
* more (6293f52)
* more (7bf2cd9)
* more (7250203)
* more (281b9f4)
* more (ec225f4)
* more (1117f19)
* more (ebfa4fe)
* more (98a42fb)
* more (901173a)
* more (d5c3fc3)
* more (a19961c)
* more (09f9b3e)
* more (797e395)

kripken added 2 commits May 10, 2023 15:58

* fix english (1e1e7bd)
* text fixes (4b4619f)

kripken added 2 commits May 11, 2023 10:13

* reorder to try to match CHECK lines (2c2b132)
* apply auto-updater output (fbc1419)

kripken added 8 commits May 12, 2023 11:16

* Merge remote-tracking branch 'origin/main' into wasm-merge-comeback (d1eb2be)
* work (169140d)
* forat (7d47af0)
* work (1a0661d)
* rename (15e7be7)
* fix (ce77013)
* sad (730310c)
* test (96c5d6a)

kripken added 2 commits May 12, 2023 13:13

* simplify fuzzer (296e5be)
* update help (c326d69)

tlively reviewed May 15, 2023


* Switch 3rd option to skip conflicts (acb55eb)

tlively approved these changes May 15, 2023


test/lit/merge/export_options.wat

kripken and others added 2 commits May 15, 2023 14:58

* fix.fuzzer (a539250)
* Update test/lit/merge/export_options.wat (fc06d9a)

  Co-authored-by: Thomas Lively <tlively@google.com>

kripken merged commit 972e659 into main May 16, 2023

kripken deleted the wasm-merge-comeback branch May 16, 2023 18:03

munjalpatel mentioned this pull request Jun 28, 2024: Re-write to use Wasmex extism/elixir-sdk#3 (Open)
