
wasm-link? #2767

Closed
kripken opened this issue Apr 15, 2020 · 27 comments

@kripken
Member

kripken commented Apr 15, 2020

In the past we had wasm-merge and then removed it for lack of use cases and the belief that lld would generally be the linker for everything. However, the need for something like wasm-merge has come up since then more than once, and I think I have an idea for a new design and goal for it.

Goal

Do the same things conceptually that a wasm VM would do at the "link" stage, but as a toolchain tool at compile time. That is, instead of shipping multiple wasm files and linking them at runtime on the client, you can link them at compile time. Hence I suggest the name wasm-link.

edit: To clarify, what I mean by "what a wasm VM would do" is "what would happen if a wasm VM were given a set of modules and names, and instantiated the first, then instantiated the second with the first imported with its name, then instantiated the third with the first two imported, and so forth." That is, the most simple and naive linking possible of wasm modules, without any special things done in the middle by JS loader code or anything like that.

Concretely, that means:

  • This runs on final wasm modules, not on wasm object files.
  • Each input file also has the module name it is imported as in other modules in this link - same as the VM would have.
  • In general, all we do is concatenate, basically, keeping the modules' behavior unchanged while linking them into one big module. But we also do what the VM would do at wasm link time, like matching up imports in one module to exports in another:
```wat
(module ;; imported with module name "one"
  (export "foo" (func $internal_foo))
  (func $internal_foo ;; ... body elided ...
  )
)
```

+

```wat
(module ;; imported with some other module name
  (import "one" "foo" (func $foo))
  (func $call-to-foo
    (call $foo)
  )
)
```

== linked to =>

```wat
(module
  (func $internal_foo ;; ... body elided ...
  )
  (func $call-to-foo
    (call $internal_foo)
  )
)
```
  • In the future, interface types will give us a lot more to do here, like fusing lifting/lowering code.
  • Note that if more than one input module has a memory or a table, multi-memory/multi-table support is necessary, and we need to handle duplicate internal names, etc.
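The linking step described above can be sketched in Python under heavy simplifying assumptions. A module is reduced to three things: its functions (each listing what it calls), its function imports, and its function exports. Modules are processed in the order they would be instantiated, so an import can only resolve against an earlier module. Clashing internal names get a numeric suffix. As in the example above, the merged module's own exports are not modeled. The names `Module`, `fresh_name` and `link` are hypothetical, and this is not how Binaryen implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy model of a final wasm module: only functions, imports and exports."""
    # internal function name -> names of the functions (or imports) it calls
    funcs: dict[str, list[str]]
    # local import name -> (module name, field name)
    imports: dict[str, tuple[str, str]] = field(default_factory=dict)
    # export name -> internal name (a function or an import)
    exports: dict[str, str] = field(default_factory=dict)


def fresh_name(name: str, taken: set[str]) -> str:
    """Return `name`, or `name_1`, `name_2`, ... if it is already in use."""
    candidate, n = name, 1
    while candidate in taken:
        candidate = f"{name}_{n}"
        n += 1
    taken.add(candidate)
    return candidate


def link(inputs: list[tuple[str, Module]]) -> Module:
    """Merge modules in instantiation order, like a VM instantiating each one
    with all earlier modules available under their module names."""
    merged = Module(funcs={})
    taken: set[str] = set()
    exports_by_module: dict[str, dict[str, str]] = {}
    for mod_name, mod in inputs:
        rename: dict[str, str] = {}
        for local, (imp_mod, imp_field) in mod.imports.items():
            target = exports_by_module.get(imp_mod, {}).get(imp_field)
            if target is not None:
                # Resolved: refer directly to the exporting module's function.
                rename[local] = target
            else:
                # Unresolved (e.g. "env"): keep it as an import of the result.
                rename[local] = fresh_name(local, taken)
                merged.imports[rename[local]] = (imp_mod, imp_field)
        for name in mod.funcs:
            rename[name] = fresh_name(name, taken)
        for name, callees in mod.funcs.items():
            merged.funcs[rename[name]] = [rename[c] for c in callees]
        # Later modules may import this module's exports by its module name.
        exports_by_module[mod_name] = {
            exp: rename[internal] for exp, internal in mod.exports.items()
        }
    return merged


# The example from the issue: module "one" exports foo, module "two" calls it.
one = Module(funcs={"internal_foo": []}, exports={"foo": "internal_foo"})
two = Module(funcs={"call-to-foo": ["foo"]}, imports={"foo": ("one", "foo")})
print(link([("one", one), ("two", two)]).funcs)
# {'internal_foo': [], 'call-to-foo': ['internal_foo']}
```

Processing inputs strictly in order mirrors the "instantiate the first, then the second with the first imported" model. An import that names a later module, or a module not in the link, simply stays an import of the output.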

Use cases

  • The specific use case I have myself atm is I want to explore replacing JS glue code in Emscripten with wasm code using reftypes. For now I'm experimenting with handwritten wat files, and I need a way to link those in.
  • A JS bundler does similar things to JS, and I remember plans to do the same for wasm files there. I'm not sure if that's been done in any of them or not (lack of multiple memories likely prevents it so far). I think it would be good to have a simple standalone tool in binaryen for this, that could perhaps be used by bundlers if they want.
  • Linking code from different toolchains. Normally wasm object files and wasm-ld are what we want, but not all toolchains use wasm object files, like Go and AssemblyScript. And a single web page may contain code from multiple toolchains, so such bundling makes sense.
  • Optimizing fused lifting/lowering code from interface types. I imagine there will be cases where the fused code doesn't just get optimized away by design, and there is actual work to be done. By linking at compile time we can do more optimizations (more complex ones, ones slower to run, etc.) than the VM would do at runtime.
@tlively
Member

tlively commented Apr 15, 2020

SGTM. Agree this will be useful, especially as IT and multiple memories are implemented. Do we know how this might relate (or not) to @lukewagner's linking proposal?

> The specific use case I have myself atm is I want to explore replacing JS glue code in Emscripten with wasm code using reftypes. For now I'm experimenting with handwritten wat files, and I need a way to link those in.

This is somewhat separate, but have you considered using .s files written in the LLVM assembly format? It would be good to dogfood using that format if we could, because that's how we expect end users of LLVM-based toolchains to incorporate hand-written wasm at the moment. I would be happy to help get the instructions you need into the LLVM backend so they can be used in the .s format.

@kripken
Member Author

kripken commented Apr 15, 2020

@tlively Yeah, @dschuff mentioned .s files to me offline earlier - it's one of the options I'll look into. It's slightly less convenient atm since I am building a project normally to a final wasm, and then modifying the output wasm file and nothing else. But .s files may end up better later.

@sbc100
Member

sbc100 commented Apr 15, 2020

Would it be useful (as well) to be able to merge two wasm files into one without actually linking any of their imports or exports? (such a tool might more logically be called wasm-merge I guess :)

This would require import/export name rewriting to avoid collisions, but it would be a "no-link" mode that maintains the trust boundary between the two modules. A runtime linker could then decide whether it really wants to connect the two modules or keep them completely isolated.
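A minimal Python sketch of such a "no-link" merge follows, under simplifying assumptions. A module is modeled only by its import and export names. Collisions are avoided by prefixing every export with its input's module name, which is a hypothetical scheme chosen for illustration since the comment does not specify one. No import is connected to any export, so the runtime still decides what to wire together. `ModuleInterface` and `merge_no_link` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleInterface:
    """Toy model: only the import and export names of a final wasm module."""
    imports: list[tuple[str, str]] = field(default_factory=list)  # (module, field)
    exports: list[str] = field(default_factory=list)


def merge_no_link(inputs: dict[str, ModuleInterface]) -> ModuleInterface:
    """Put several modules into one without connecting any import to any export.

    Each export is renamed to "<input name>.<export name>" (a hypothetical
    scheme) so identical export names from different inputs cannot collide.
    Imports are copied unchanged; wiring them up is left to the runtime."""
    merged = ModuleInterface()
    for mod_name, mod in inputs.items():
        merged.exports.extend(f"{mod_name}.{name}" for name in mod.exports)
        merged.imports.extend(mod.imports)
    return merged


a = ModuleInterface(exports=["main"])
# b imports a's "main", but in no-link mode that import is left unresolved.
b = ModuleInterface(imports=[("a", "main")], exports=["main"])
merged = merge_no_link({"a": a, "b": b})
print(merged.exports)  # ['a.main', 'b.main']
print(merged.imports)  # [('a', 'main')]
```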

@kripken
Member Author

kripken commented Apr 15, 2020

@sbc100 Hmm, then maybe there could be two tools, wasm-merge which just "concatenates" but nothing more, and wasm-link which takes a single wasm module and does internal linking?

One issue with that is that I'm not sure it's enough for the second tool, wasm-link, to take a single module as its argument. For example, if we are asked to merge two modules with the same export name, what would wasm-merge do? However it disambiguates that, wasm-link would need to be aware of the original modules' names and so forth (which it would be if it did the concatenation itself). Likewise, if we merge modules with interface types, what would the output of wasm-merge look like for wasm-link to optimize?

So maybe it should just be a single tool, with an option of doing the internal linking or not.

@sbc100
Member

sbc100 commented Apr 15, 2020

Yes, I was thinking "internally-link" vs "no-linking" as two different modes of the same tool.

@kripken
Member Author

kripken commented Apr 15, 2020

How about wasm-link --no-internal for the mode without internal linking?

@tlively
Member

tlively commented Apr 15, 2020

I would prefer wasm-merge --link or wasm-link --no-link, which I think is more descriptive than --no-internal, but I don't feel strongly about it.

@sbc100
Member

sbc100 commented Apr 15, 2020

We could also have wasm-link and wasm-merge be the same binary like clang and clang++, but this discussion is really putting the shed before the bike.

@LouisStAmour

LouisStAmour commented Jun 15, 2020

Not sure if this is an active discussion, but I was hoping I'd be able to write a WASM Envoy module following https://github.com/proxy-wasm that would in turn be able to use (merge with?) wasm functions from https://www.openpolicyagent.org/docs/latest/wasm/ to avoid having to make extra network calls for each policy decision.

But I'm having trouble figuring out how :) The closest I can see might be SIDE_MODULE support if I want to try and keep dynamic linking of one WASM for another. But it's unclear which implementations support "side modules" and I should probably make the assumption that they're not supported by Envoy. If so, everything has to be merged into one module, and thus at build time of one wasm, I'd have to merge/wrap it in functionality to produce another wasm.

Both wasm files have different export function name prefixes, so from the outside this sounds like an easy thing to do, but ... it's unclear to me how often folks have done this or where to get started. It's a bit odd to me that WASM can support so many different languages as source targets, but seemingly not statically linking or calling wasm itself from such languages when producing wasm as an output. :)

@sbc100
Member

sbc100 commented Jun 15, 2020

Is there any reason not to just use the static linker (wasm-ld) in this case? The only limitation of the static linker is that the inputs have to be wasm object files. Is that not possible?

Can you not build your two programs as ar archives (.a files) or just object files (you can use wasm-ld --relocatable to combine many object files into one if that helps), and then use wasm-ld to link them together?

@LouisStAmour

I guess I’ll have to investigate... I’m relatively new to uses of wasm outside the browser. In the case of OPA, the source for the .wasm it produces appears to be Go, so it’s unclear how compatible the .o file from Go would be and how to produce a wasm-compatible object file. I suppose I’ll have to experiment...

@LouisStAmour

My use case will have to wait a bit to be practical it seems. After further investigation, https://github.com/envoyproxy/envoy-wasm is still under active development including a new ABI and remote deployment mechanisms with Istio also improving. I'll prototype without WASM on my end (using network calls instead) and revisit this later.

@pannous

pannous commented Nov 29, 2020

@sbc100 the static linker wasm-ld requires elaborate custom linking sections and did not produce the desired output for me.

While renumbering the functions, updating the call indices apparently can break:
"R_WASM_FUNCTION_INDEX_LEB relocations may fail to be processed, in which case linking fails."

Generalized linking is also part of the wasm roadmap

@sbc100
Member

sbc100 commented Nov 29, 2020

> @sbc100 the static linker wasm-ld requires elaborate custom linking sections and did not produce the desired output for me.

If there are bugs in wasm-ld then we want to know about them. So far wasi-sdk, emscripten and rust all use wasm-ld and there are no known major outstanding issues. Of course, they all use LLVM, so it is easy for them to produce the object file metadata. Perhaps you can describe what you are trying to do and why it's not working? For sure wasm-ld requires extra metadata, but that is because the wasm module system is not powerful enough to express everything that a static linker needs. For example, we want data segments to be statically relocatable, which requires relocation information. We also want lld to be fast, which means we don't want to pay the cost of disassembling all the instructions to find all the function call sites (another reason for requiring relocation information).
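To illustrate the point about call sites, here is a minimal Python sketch. It assumes the wasm32 object-file convention that each relocatable function index, which is the target of an R_WASM_FUNCTION_INDEX_LEB relocation, is stored as a 5-byte padded LEB128. Because the width is fixed, the linker can overwrite each index in place at the recorded offset, without disassembling the code or shifting any bytes. The `(offset, symbol)` pairs and the `final_index` mapping are simplified, hypothetical stand-ins for the real relocation section and symbol table. The function names are also hypothetical.

```python
def encode_padded_uleb128(value: int) -> bytes:
    """Encode a u32 as unsigned LEB128 padded to exactly 5 bytes."""
    if not 0 <= value < 1 << 32:
        raise ValueError("value does not fit in a u32")
    out = bytearray()
    for i in range(5):
        byte = value & 0x7F
        value >>= 7
        if i < 4:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def decode_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode an unsigned LEB128; return (value, offset just past it)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, offset


def apply_function_index_relocs(
    code: bytearray, relocs: list[tuple[int, str]], final_index: dict[str, int]
) -> None:
    """Patch each relocated call-site index in place.

    `relocs` holds (byte offset into `code`, symbol name) pairs, standing in
    for the object file's relocation entries; `final_index` maps each symbol
    to its function index in the linked output."""
    for offset, symbol in relocs:
        code[offset:offset + 5] = encode_padded_uleb128(final_index[symbol])


# `call 0` (opcode 0x10) with a padded index, followed by `end` (0x0b).
code = bytearray([0x10, 0x80, 0x80, 0x80, 0x80, 0x00, 0x0B])
apply_function_index_relocs(code, [(1, "foo")], {"foo": 3})
print(code.hex())               # 1083808080000b
print(decode_uleb128(code, 1))  # (3, 6)
```

Without the recorded offsets, a linker would have to decode every instruction in every function body just to find the `call` immediates. That decoding cost is exactly what the relocation information avoids.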

> While renumbering the functions, updating the call indices apparently can break:
> "R_WASM_FUNCTION_INDEX_LEB relocations may fail to be processed, in which case linking fails."

Are you running into this specific issue regarding weak symbols?

> Generalized linking is also part of the wasm roadmap

@pannous

pannous commented Nov 29, 2020

Never mind, we resolved it by removing parameters from the linker flags:

```sh
clang --target=wasm32 -nostdlib -Wl,--export-all,--relocatable,--no-entry,--shared -o lib.wasm lib.c
```

=>

```sh
clang --target=wasm32 -nostdlib -Wl,--relocatable -o lib.wasm lib.c
```

Trying to link a minimal example https://github.com/pannous/test-lld-wasm now works.

@abrown

abrown commented Aug 31, 2021

I had a need for the tool this issue would provide and was discussing it over in the AssemblyScript repository: AssemblyScript/assemblyscript#2045. I floated an idea for a static, naive linker (like wasm-link?) that would not require the relocation sections but @dcodeIO mentioned that adding Wasm object file support to Binaryen might be another option. Is there any strong preference here between:

  • building wasm-link (@kripken, any recent developments on that?)
  • giving Binaryen the ability to output the necessary Wasm object file sections for wasm-ld to just work

@srenatus

srenatus commented Sep 1, 2021

I think this might be relevant but wasn't mentioned before: https://github.com/bytecodealliance/witx-bindgen/tree/main/crates/wasmlink

@kripken
Member Author

kripken commented Sep 1, 2021

@abrown

I think building wasm-link would be pretty straightforward. I don't have an urgent need for this myself so I'm not planning to work on it soon, but I'd be very happy to review a PR for it!

Adding object file support would be significantly more work, as the relocations require IR changes. I actually wrote some notes on this a while back, and I'm not sure where I posted them, but attached is a PDF.
Binaryen Object File Support_.pdf

@srenatus Thanks for the link! Looks like that is focused on Module Linking and Interface Types, but perhaps it could be reused here - I'd expect we'd need to emit Module Linking logic for that linker to process, though, which might be more work than wasm-link itself. But it might be worth discussing with the devs there.

@dbanks12

dbanks12 commented Jan 4, 2023

Has there been any progress here? If I have one project/library and I want to compile it to later be ingested by a dependent project with wasm bindings (functions with "default" visibility) of both the library and dependent project exposed, what should my process be? Would I use wasm-ld as mentioned above?

Apologies if my terminology is poor, I am a wasm noob.

@sbc100
Member

sbc100 commented Jan 4, 2023

I think that sounds like a use case for normal emcc/wasm-ld style linking and use of wasm object files or libraries of wasm object files. Can your library be built as a library of object files?

@dbanks12

dbanks12 commented Jan 5, 2023

> I think that sounds like a use case for normal emcc/wasm-ld style linking and use of wasm object files or libraries of wasm object files. Can your library be built as a library of object files?

I am trying to build a library of object files, but struggling to do so. I am using wasi-sdk and cmake, and cannot figure out how to export wasm object files to be used by my dependent project.

Sorry, I am not sure where the right place is to have this conversation, but it might not be here. Please let me know if there is a better place for me to get help on this! Thanks.

@pannous

pannous commented Jan 5, 2023

Until such a tool resurfaces, you can try the steps here:

https://github.com/pannous/test-lld-wasm

If the object binaries contain a relocation section OR are internally relocatable (by having nop spacers around calls and loads), you can also try wasp main.wasm lib.wasm

@sbc100
Member

sbc100 commented Jan 5, 2023

> > I think that sounds like a use case for normal emcc/wasm-ld style linking and use of wasm object files or libraries of wasm object files. Can your library be built as a library of object files?
>
> I am trying to build a library of object files, but struggling to do so. I am using wasi-sdk and cmake, and cannot figure out how to export wasm object files to be used by my dependent project.
>
> Sorry, I am not sure where the right place is to have this conversation, but it might not be here. Please let me know if there is a better place for me to get help on this! Thanks.

Simply using cmake's normal static library construct should work fine. All static libraries are libraries of object files.

@tonibofarull

Compile the library with wasi-sdk as well!

Example CMakeLists.txt

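```cmake
cmake_minimum_required(VERSION 2.8.12)

project(lib)

# Use the wasi-sdk compilers so the library contains wasm object files
set(CMAKE_C_COMPILER    "/opt/wasi-sdk/bin/clang")
set(CMAKE_CXX_COMPILER  "/opt/wasi-sdk/bin/clang++")

# Builds a static library (liblib.a), i.e. an archive of object files
add_library(lib lib.c)
```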

And the following commands to combine the library with main.c:

```sh
/opt/wasi-sdk/bin/ranlib lib/build/liblib.a

/opt/wasi-sdk/bin/clang \
    -Wl,--allow-undefined \
    -o main.wasm main.c lib/build/liblib.a \
    -I./lib
```

Running ranlib is needed; otherwise, linking fails with:

```text
wasm-ld: error: lib/build/liblib.a: archive has no index; run ranlib to add one
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
```
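That error means the archive lacks a symbol index, which is the member that lets the linker find the object defining a symbol without scanning every member. As a rough diagnostic, here is a Python sketch that checks whether an archive's first member is such an index. It is an illustration, not part of any toolchain. It assumes the common ar layout: the `!<arch>\n` magic followed by 60-byte member headers. In GNU-style archives the index is named `/` or `/SYM64/`, and in BSD-style ones it is `__.SYMDEF` (possibly stored as a `#1/<len>` long name). The function name `has_symbol_index` is hypothetical.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # fixed-size ar member header


def has_symbol_index(archive: bytes) -> bool:
    """Return True if the first member of an ar archive is a symbol index."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    start = len(AR_MAGIC)
    header = archive[start:start + HEADER_SIZE]
    if len(header) < HEADER_SIZE:
        return False  # no members at all, hence no index
    if header[58:60] != b"`\n":
        raise ValueError("malformed member header")
    name = header[:16].decode("ascii").rstrip(" ")
    if name.startswith("#1/"):
        # BSD-style long name: the real name is stored right after the header.
        length = int(name[3:])
        body = start + HEADER_SIZE
        name = archive[body:body + length].rstrip(b"\0").decode("ascii")
    # GNU/SysV index: "/" (or "/SYM64/"); BSD index: "__.SYMDEF[ SORTED]".
    return name in ("/", "/SYM64/") or name.startswith("__.SYMDEF")
```

For example, `has_symbol_index(open("lib/build/liblib.a", "rb").read())` should return False before running ranlib and True afterwards, assuming the archive uses one of the layouts above.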

Let me know if you need sample files.

@sbc100
Member

sbc100 commented Jan 5, 2023

You also need to override the AR and RANLIB tools, not just C/CXX_COMPILER. wasi-sdk has a toolchain file that does this so you shouldn't need to: https://github.com/WebAssembly/wasi-sdk/blob/cee312d6d0561f302d79f432135bd2662d17862d/wasi-sdk.cmake#L17-L22

@tonibofarull

> You also need to override the AR and RANLIB tools, not just C/CXX_COMPILER. wasi-sdk has a toolchain file that does this so you shouldn't need to: https://github.com/WebAssembly/wasi-sdk/blob/cee312d6d0561f302d79f432135bd2662d17862d/wasi-sdk.cmake#L17-L22

Quite useful, thanks!

kripken added a commit that referenced this issue May 16, 2023
We used to have a wasm-merge tool but removed it for a lack of use cases. Recently
use cases have been showing up in the wasm GC space and elsewhere, as people are
using more diverse toolchains together, for example a project might build some C++
code alongside some wasm GC code. Merging those wasm files together can allow
for nice optimizations like inlining and better DCE etc., so it makes sense to have a
tool for merging.

Background:
* Removal: #1969
* Requests:
  * wasm-merge - why it has been deleted #2174
  * Compiling and linking wat files #2276
  * wasm-link? #2767

This PR is a complete rewrite of wasm-merge, not a restoration of the original
codebase. The original code was quite messy (my fault), and also, since then
we've added multi-memory and multi-table which makes things a lot simpler.

The linking semantics are as described in the "wasm-link" issue #2767: all we do
is merge normal wasm files together and connect imports and exports. That is, we
have a graph of modules and their names, and each import to a module name can
be resolved to that module. Basically, like a JS bundler would do for JS, or, in other
words, we do the same operations as JS code would do to glue wasm modules
together at runtime, but at compile time. See the README update in this PR for a
concrete example.

There are no plans to do more than that simple bundling, so this should not
really overlap with wasm-ld's use cases.

This should be fairly fast as it works in linear time on the total input code. However,
it won't be as fast as wasm-ld, of course, as it does build Binaryen IR for each
module. An advantage to working on Binaryen IR is that we can easily do some
global DCE after merging, and further optimizations are possible later.
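Here is one way to see why merging enables global DCE. Before merging, a module must keep every function it exports, because it cannot know which ones its importers actually use. After merging and internally linking, only the functions that remain exported from the merged module need to act as roots. A toy Python sketch of that reachability walk follows. It is not Binaryen's actual pass, `global_dce` is a hypothetical name, and it ignores tables, indirect calls, globals, memories and the start function.

```python
from collections import deque


def global_dce(funcs: dict[str, list[str]], roots: list[str]) -> dict[str, list[str]]:
    """Keep only the functions reachable from `roots` through direct calls.

    `funcs` maps each function to the functions it calls; names not in
    `funcs` (e.g. imports) are treated as external and simply not followed."""
    live: set[str] = set()
    worklist = deque(roots)
    while worklist:
        name = worklist.popleft()
        if name in live or name not in funcs:
            continue
        live.add(name)
        worklist.extend(funcs[name])
    return {name: callees for name, callees in funcs.items() if name in live}


# "helper" is not reachable from the merged module's only export, "main".
merged = {
    "main": ["internal_foo"],
    "internal_foo": [],
    "helper": ["internal_foo"],
}
print(global_dce(merged, roots=["main"]))
# {'main': ['internal_foo'], 'internal_foo': []}
```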
radekdoulik pushed a commit to dotnet/binaryen that referenced this issue Jul 12, 2024
@tlively
Member

tlively commented Jan 13, 2025

wasm-merge is back and is here to stay.

@tlively tlively closed this as completed Jan 13, 2025
9 participants