
Reintroduce wasm-merge #5709

Merged
merged 228 commits into from
May 16, 2023

kripken (Member) commented May 9, 2023

We used to have a wasm-merge tool but removed it for lack of use cases. Recently,
use cases have been showing up in the wasm GC space and elsewhere, as people are
using more diverse toolchains together; for example, a project might build some C++
code alongside some wasm GC code. Merging those wasm files together enables
optimizations such as inlining and better DCE, so it makes sense to have a
tool for merging.

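Background:
* Removal: #1969
* Requests:
  * wasm-merge - why it has been deleted #2174
  * Compiling and linking wat files #2276
  * wasm-link? #2767
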
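This PR is a complete rewrite of wasm-merge, not a restoration of the original
codebase. The original code was quite messy (my fault), and since then we've
added multi-memory and multi-table support, which makes things a lot simpler: each
input module's memories and tables can simply be kept as separate memories and
tables in the merged module.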
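To illustrate why multi-memory helps, here is a minimal sketch in terms of plain wasm memory indices. It is an illustration of the general idea only, not how Binaryen implements merging, and the function name `remapMemoryIndices` is hypothetical: each module's memories are appended to the merged module's list, so a module's memory index is just shifted by the number of memories contributed by earlier modules. Tables could be handled the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: given how many memories each input module
// defines, and which memory indices each module's code uses, compute the
// indices those uses refer to after all memories are appended, in module
// order, to the merged module's memory list.
std::vector<std::vector<unsigned>> remapMemoryIndices(
    const std::vector<unsigned>& memoryCounts,                   // per module
    const std::vector<std::vector<unsigned>>& memoryIndicesUsed) { // per module
  std::vector<std::vector<unsigned>> remapped;
  unsigned offset = 0; // memories contributed by earlier modules
  for (std::size_t m = 0; m < memoryCounts.size(); ++m) {
    std::vector<unsigned> out;
    for (unsigned idx : memoryIndicesUsed[m]) {
      assert(idx < memoryCounts[m]); // must be one of this module's memories
      out.push_back(idx + offset);
    }
    remapped.push_back(out);
    offset += memoryCounts[m];
  }
  return remapped;
}
```

For example, if module A has one memory and module B has two, B's memories 0 and 1 become memories 1 and 2 of the merged module, and no data has to be relocated within a shared memory.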

The linking semantics are as described in the "wasm-link" issue #2767: all we do
is merge normal wasm files together and connect imports and exports. That is, we
have a graph of modules and their names, and each import from a module name in
that graph can be resolved to that module's corresponding export. This is basically
what a JS bundler does for JS: we perform the same operations that JS code would
perform to glue wasm modules together at runtime, but at compile time. See the
README update in this PR for a concrete example.
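The resolution rule described above amounts to a lookup in the module graph. Below is a minimal sketch, assuming a toy `Module` struct (a name, a list of imports, and a map from export names to internal items) that is hypothetical and far simpler than Binaryen IR. An import whose module name matches a merged module that exports the requested name is connected to that export; anything else stays an import of the merged output. What a real tool does when the module exists but lacks the export (for example, reporting an error) is not shown; this sketch just leaves it as an import.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified module model for illustration only.
struct Import {
  std::string module; // import module name, e.g. "env"
  std::string base;   // import base name, e.g. "log"
};

struct Module {
  std::string name;                           // this module's name in the graph
  std::vector<Import> imports;
  std::map<std::string, std::string> exports; // export name -> internal item
};

struct Resolution {
  bool resolved;      // true if satisfied by another module being merged
  std::string target; // "module.item" if resolved, else the remaining import
};

// Connect an import to a matching export of a merged module, if there is one.
Resolution resolveImport(const std::vector<Module>& modules, const Import& imp) {
  for (const auto& m : modules) {
    if (m.name != imp.module) continue;
    auto it = m.exports.find(imp.base);
    if (it != m.exports.end()) return {true, m.name + "." + it->second};
  }
  // Not provided by any merged module: remains an import of the output.
  return {false, imp.module + "." + imp.base};
}
```

With modules `a` (importing `b.helper` and `env.log`) and `b` (exporting `helper`), the first import is connected to `b`'s internal function while `env.log` remains an import, just as a JS glue layer would wire them at instantiation time.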

There are no plans to do more than that simple bundling, so this should not
really overlap with wasm-ld's use cases.

This should be fairly fast, as it runs in time linear in the total size of the input
code. However, it won't be as fast as wasm-ld, of course, since it builds Binaryen IR
for each module. An advantage of working on Binaryen IR is that we could optimize
right after merging, but this first version does not do any optimizations yet.

Perhaps this new tool should be given a new name to differentiate it from the
old wasm-merge? Other options might be wasm-link or wasm-bundle.

kripken (Member Author) commented May 10, 2023

For the CLI UI for naming modules, I propose that the optional flag --name=foo means the next input file is assigned the name "foo". Otherwise, if that input file has a module name (in the text format or in the name section), that name is used. Otherwise, the base filename (without extension) of the input file is used as the module name.

The one possibly odd part is the mixture of positional and non-positional arguments. I'm not familiar with command-line tools where one affects the other. Is there precedent?
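The naming precedence proposed above can be sketched as follows. This is a hypothetical illustration, not wasm-merge's actual argument parser: `InputFile`, `chooseModuleName`, and `assignNames` are invented names, and the `embeddedNames` map stands in for reading a module name out of each file's text or name section. A `--name=` flag applies only to the next positional input file; otherwise the embedded name is used, and otherwise the base filename.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: one input file plus the module name found inside it, if any.
struct InputFile {
  std::string path;
  std::optional<std::string> moduleName; // from the text format or name section
};

// "lib/foo.wasm" -> "foo": strip the directory and the extension.
std::string baseName(const std::string& path) {
  auto slash = path.find_last_of("/\\");
  std::string file = slash == std::string::npos ? path : path.substr(slash + 1);
  auto dot = file.find_last_of('.');
  return dot == std::string::npos ? file : file.substr(0, dot);
}

// Precedence: explicit --name, then the module's own name, then the filename.
std::string chooseModuleName(const std::optional<std::string>& flagName,
                             const InputFile& input) {
  if (flagName) return *flagName;
  if (input.moduleName) return *input.moduleName;
  return baseName(input.path);
}

// Walk the arguments in order; a pending --name is consumed by the next file.
std::vector<std::string> assignNames(
    const std::vector<std::string>& args,
    const std::map<std::string, std::string>& embeddedNames) {
  const std::string prefix = "--name=";
  std::vector<std::string> names;
  std::optional<std::string> pending; // a --name waiting for its input file
  for (const auto& arg : args) {
    if (arg.rfind(prefix, 0) == 0) { // arg starts with "--name="
      pending = arg.substr(prefix.size());
      continue;
    }
    InputFile input{arg, std::nullopt};
    if (auto it = embeddedNames.find(arg); it != embeddedNames.end())
      input.moduleName = it->second;
    names.push_back(chooseModuleName(pending, input));
    pending.reset(); // the flag applies to one file only
  }
  return names;
}
```

For `--name=foo a.wasm b.wasm dir/c.wat`, where only `b.wasm` carries an embedded name `bee`, this yields the names `foo`, `bee`, and `c`.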

kripken (Member Author) commented May 11, 2023

Another option for the command-line UI might be to forgo positional arguments entirely, as they are potentially confusing when mixed with flags. We could have this:

wasm-merge --input=a.wasm --name=foo --input=b.wasm --name=bar --output=merged.wasm

Basically we could have --input/-i for inputs like we have --output/-o for the output.
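A sketch of how that flag-only form might be parsed is below. It assumes, based on the example command, that each `--name=` applies to the most recent `--input=`; the `MergeArgs` struct and `parseArgs` function are hypothetical, and the short forms `-i`/`-o` are omitted for brevity. Because there are no positional arguments, a stray bare filename is rejected, and an input without a `--name` keeps an empty name so that a fallback (such as the base filename) could be applied later.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed form of: --input=F [--name=N] ... --output=O
struct MergeArgs {
  std::vector<std::pair<std::string, std::string>> inputs; // (file, name)
  std::optional<std::string> output;
};

// Returns the text after `flag` if `arg` starts with it, e.g. "--input=".
std::optional<std::string> flagValue(const std::string& arg,
                                     const std::string& flag) {
  if (arg.rfind(flag, 0) == 0) return arg.substr(flag.size());
  return std::nullopt;
}

MergeArgs parseArgs(const std::vector<std::string>& args) {
  MergeArgs result;
  for (const auto& arg : args) {
    if (auto file = flagValue(arg, "--input=")) {
      result.inputs.push_back({*file, ""}); // name may be set by a later --name
    } else if (auto name = flagValue(arg, "--name=")) {
      // Assumption: --name names the most recent, not-yet-named --input.
      if (result.inputs.empty() || !result.inputs.back().second.empty())
        throw std::invalid_argument("--name must follow an unnamed --input");
      result.inputs.back().second = *name;
    } else if (auto out = flagValue(arg, "--output=")) {
      result.output = *out;
    } else {
      throw std::invalid_argument("unexpected argument: " + arg);
    }
  }
  return result;
}
```

The design trade-off is visible in the code: every argument is self-describing, at the cost of an ordering rule between `--input` and `--name` that the user still has to know.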

tlively (Member) commented May 11, 2023

One similar use of --flags to affect positional arguments is the --whole-archive and --no-whole-archive linker flags.

tlively (Member) commented May 11, 2023

Code and tests LGTM. I believe the only remaining questions are 1) the CLI UI for naming modules, and 2) what options to provide and what to do by default for conflicting export names.

kripken (Member Author) commented May 11, 2023

> One similar use of --flags to affect positional arguments is the --whole-archive and --no-whole-archive linker flags.

Very good point... I guess there is precedent for such stuff.

I don't really have a strong feeling of what's best for the CLI UI. Perhaps we can bikeshed this on the Monday meeting if there's time.

For handling of export conflicts, do you agree we should have 3 modes like I was getting at before?

  1. Rename to avoid conflicts. (Helps if the user knows and expects conflicts, e.g. a main in each module.)
  2. Error on conflicts. (Avoids surprises.)
  3. Avoid conflicts by keeping the exports of exactly one module. (Best for the main module + side modules use case.)

If so perhaps 2 should be the default, as the least surprising, as you've been suggesting?
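The three modes listed above can be sketched as a single function over export names. This is a hypothetical illustration: only `KeepOnlyFirstModuleExports` appears in the PR, while `ExportMode`, the other two enumerator names, `mergeExports`, and the `_1`, `_2`, ... renaming scheme are invented here. Each module is represented just by its list of export names, in module order.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names; only KeepOnlyFirstModuleExports comes from the PR.
enum class ExportMode {
  RenameExportConflicts,      // 1. rename later duplicates
  ErrorOnExportConflicts,     // 2. fail on any duplicate (proposed default)
  KeepOnlyFirstModuleExports, // 3. keep the first module's exports only
};

// Each inner vector holds one module's export names, in module order.
// Returns the export names of the merged module.
std::vector<std::string> mergeExports(
    const std::vector<std::vector<std::string>>& modules, ExportMode mode) {
  std::vector<std::string> result;
  std::set<std::string> used;
  for (std::size_t i = 0; i < modules.size(); ++i) {
    if (mode == ExportMode::KeepOnlyFirstModuleExports && i > 0) break;
    for (const auto& name : modules[i]) {
      std::string finalName = name;
      if (used.count(name)) {
        if (mode == ExportMode::ErrorOnExportConflicts)
          throw std::runtime_error("export name conflict: " + name);
        // Hypothetical renaming scheme: append _1, _2, ... until unique.
        for (int n = 1; used.count(finalName); ++n)
          finalName = name + "_" + std::to_string(n);
      }
      used.insert(finalName);
      result.push_back(finalName);
    }
  }
  return result;
}
```

With two modules that both export `main`, renaming produces `main` and `main_1`, the error mode throws, and the first-module mode drops the second module's exports entirely.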

tlively (Member) commented May 12, 2023

> For handling of export conflicts, do you agree we should have 3 modes like I was getting at before?

Yes! Sounds good.

> If so perhaps 2 should be the default, as the least surprising, as you've been suggesting?

That sounds great to me.

kripken (Member Author) commented May 12, 2023

Ok, 3 modes + new default are now implemented.

// this is useful when the first module is the main program and the others are
// libraries of code that it uses, but that do not have any exports intended
// to be used by anyone other than the main program.
KeepOnlyFirstModuleExports,
tlively (Member) commented:

In general I think it would be more useful for this third mode to collect all the exports and use module order only to resolve conflicts. That would still expose any non-conflicting exports from later modules. The reason this is more useful is that exporting more things than the embedder expects does not cause linkage failures, so this would cover both the use cases covered by KeepOnlyFirstModuleExports and use cases that need the other exports to be exposed as well.

tlively (Member) commented:

Sorry I didn't catch this earlier when you were describing the 3 modes you had in mind!

kripken (Member Author) replied:

Interesting. That would allow more things to work. But it would also add more effort for the user in the common case this is meant to address, which is a main module + side modules, since we never want the side modules' exports to remain. But maybe that's just more work we should expect the user to do, hmm... they also likely would want to prune some main module exports.

kripken (Member Author) replied:

After more thought this does seem better. Changed to that.
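The revised third mode from this thread (keep every export, and use module order only to resolve conflicts) can be sketched as below. The function name `mergeExportsFirstWins` is hypothetical, and as before each module is represented only by its export names, in module order: the first module to export a name keeps it, and a later export with the same name is dropped.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch of the revised mode: all exports are kept, except that
// a later module's export is skipped if an earlier module already used the name.
std::vector<std::string> mergeExportsFirstWins(
    const std::vector<std::vector<std::string>>& modules) {
  std::vector<std::string> result;
  std::set<std::string> seen;
  for (const auto& exports : modules)
    for (const auto& name : exports)
      if (seen.insert(name).second) // insert() returns false for a duplicate
        result.push_back(name);
  return result;
}
```

Compared with keeping only the first module's exports, a non-conflicting export such as a side module's `helper` survives here, which is harmless for the main module + side modules case because extra exports do not cause linkage failures.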

tlively (Member) left a comment:

LGTM 🎉 🎉 🎉

Review thread on test/lit/merge/export_options.wat (outdated, resolved)
kripken and others added 2 commits May 15, 2023 14:58
Co-authored-by: Thomas Lively <tlively@google.com>
kripken merged commit 972e659 into main on May 16, 2023
kripken deleted the wasm-merge-comeback branch on May 16, 2023 18:03
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
We used to have a wasm-merge tool but removed it for lack of use cases. Recently,
use cases have been showing up in the wasm GC space and elsewhere, as people are
using more diverse toolchains together, for example a project might build some C++
code alongside some wasm GC code. Merging those wasm files together can allow
for nice optimizations like inlining and better DCE etc., so it makes sense to have a
tool for merging.

Background:
* Removal: WebAssembly#1969
* Requests:
  * wasm-merge - why it has been deleted WebAssembly#2174
  * Compiling and linking wat files WebAssembly#2276
  * wasm-link? WebAssembly#2767

This PR is a complete rewrite of wasm-merge, not a restoration of the original
codebase. The original code was quite messy (my fault), and also, since then
we've added multi-memory and multi-table which makes things a lot simpler.

The linking semantics are as described in the "wasm-link" issue WebAssembly#2767: all we do
is merge normal wasm files together and connect imports and exports. That is, we
have a graph of modules and their names, and each import to a module name can
be resolved to that module. Basically, like a JS bundler would do for JS, or, in other
words, we do the same operations as JS code would do to glue wasm modules
together at runtime, but at compile time. See the README update in this PR for a
concrete example.

There are no plans to do more than that simple bundling, so this should not
really overlap with wasm-ld's use cases.

This should be fairly fast, as it runs in time linear in the total size of the input code. However,
it won't be as fast as wasm-ld, of course, since it builds Binaryen IR for each
module. An advantage of working on Binaryen IR is that we can easily do some
global DCE after merging, and further optimizations are possible later.
3 participants