Support WebAssembly Reference Types #10491

Open
leroycep opened this issue Jan 2, 2022 · 11 comments
Labels
  • arch-wasm: 32-bit and 64-bit WebAssembly
  • enhancement: Solving this issue will likely involve adding new logic or components to the codebase.
  • frontend: Tokenization, parsing, AstGen, Sema, and Liveness.
  • use case: Describes a real use case that is difficult or impossible, but does not propose a solution.
Comments

@leroycep
Contributor

leroycep commented Jan 2, 2022

WebAssembly Reference Types are supported in most WebAssembly runtimes at the moment, and they make it easier to interoperate with the host runtime.

On the Discord: https://discord.com/channels/605571803288698900/

Stephen Solka#3548
Does Zig's wasm target support reference objects? https://github.com/WebAssembly/reference-types/blob/master/proposals/reference-types/Overview.md I tried to figure it out by searching the code base for code that declares these externref types. I found commit f56ae69, which upstreamed the external linker and lands in 0.9.0, but it's not clear whether this is exposed at the language level for people using Zig for wasm. I'm trying to figure out the "right way" to pass JS objects to Zig wasm.

Stephen Solka#3548
This is rust's bindgen ref types implementation https://rustwasm.github.io/wasm-bindgen/reference/reference-types.html

Later in the thread: https://discord.com/channels/605571803288698900/922695973623443466/927281011618873407

@Luukdegram
Hmm, I'm afraid there's no such thing yet. A lot of the stuff is currently in my head, as I have to implement it for the wasm backend anyway. As LLVM does support this, we could support this once the llvm backend of the selfhosted compiler is finished, which is targeted for 0.10.0. We will have to implement the wasm-specific address spaces though, so that will probably be after 0.10.0.

A roadmap in general is probably a good idea, but the selfhosted compiler is in such a high-speed development stage right now, I'd prefer to wait a bit until we have a more solid base. I'll add this point to my personal TODO 😉

As mentioned by Luuk, this feature will need address spaces to be implemented (see #653).

For now, references can be passed as integer handles that index into an array or a hash map kept on the host side.
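
The integer-handle workaround could look roughly like the following on the JavaScript side. This is only a sketch: `HandleTable`, its methods, and the `fakeGl` stand-in are hypothetical names invented for illustration, not part of Zig or any library. The wasm module would only ever see the small integers returned by `alloc`.

```javascript
// Host-side handle table: the wasm module only ever sees small integers.
// All names here (HandleTable, alloc, get, free) are hypothetical.
class HandleTable {
  constructor() {
    this.slots = [undefined]; // index 0 is reserved so that 0 can mean "no object"
    this.freeList = [];       // handles that were freed and can be reused
  }

  // Store a host object and return the integer handle that refers to it.
  alloc(value) {
    const handle = this.freeList.length > 0 ? this.freeList.pop() : this.slots.length;
    this.slots[handle] = value;
    return handle;
  }

  // Look up the host object for a handle; throws for unknown or freed handles.
  get(handle) {
    const value = this.slots[handle];
    if (value === undefined) throw new RangeError(`invalid handle ${handle}`);
    return value;
  }

  // Release a handle so the host object can be garbage collected.
  free(handle) {
    this.get(handle); // throws on an invalid handle
    this.slots[handle] = undefined;
    this.freeList.push(handle);
  }
}

const handles = new HandleTable();

// `fakeGl` stands in for a real WebGL context so the sketch runs outside a browser.
const fakeGl = { createShader: (type) => ({ kind: "shader", type }) };

// Import object: every host object crossing the boundary is swapped for a handle.
const imports = {
  gl: {
    createShader: (type) => handles.alloc(fakeGl.createShader(type)),
    deleteShader: (handle) => handles.free(handle),
  },
};
```

The cost of this approach is that the module must free handles explicitly, since the host cannot know when the wasm side has dropped its last copy of the integer.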

@andrewrk andrewrk added arch-wasm 32-bit and 64-bit WebAssembly enhancement Solving this issue will likely involve adding new logic or components to the codebase. frontend Tokenization, parsing, AstGen, Sema, and Liveness. labels Feb 2, 2022
@andrewrk andrewrk added this to the 0.11.0 milestone Feb 2, 2022
@munrocket

munrocket commented Feb 3, 2022

@leroycep @kubkon how do you expect this to look in Zig?

const Externref = *opaque{}; // like this?
const Externref = anytype; // or this?
// or another syntax?

I tried this and it doesn't work right now. Even though externref exists in wasm.zig, it doesn't exist in codegen.zig, air.zig, etc., so it's basically unused at the moment. However, it exists in LLVM, and adding reference types support looks pretty easy.

Here is a minimal example that should print a WebGLShader in the browser console (right now it prints 0 if you run npm run build && npm start).

//gl.zig

pub const Externref = *opaque{};

pub const GLenum = u32;
pub const WebGLShader = Externref;
pub const DOMString = [*]const u8;
pub const VERTEX_SHADER: GLenum = 0x8B31;
pub const FRAGMENT_SHADER: GLenum = 0x8B30; // referenced by main.zig

pub extern "gl" fn createShader(t: GLenum) WebGLShader;
//console.zig

const gl = @import("gl.zig");

pub extern "console" fn log(_: gl.WebGLShader) void;
pub extern "console" fn logF(_: f32) void;
pub extern "console" fn logI(_: c_int) void;
//main.zig

const console = @import("console.zig");
const gl = @import("gl.zig");

export fn main() i32 {

  console.logI(123);
  console.log(gl.createShader(gl.FRAGMENT_SHADER));

  return 0;
}
<!--index.html-->

<!DOCTYPE html>
<html>
<head>
  <link rel="icon" href="data:;base64,iVBORw0KGgo=">
  <title>Test</title>
</head>
<body>
  <canvas id="c"></canvas>
  <script type="module" src="main.js"></script>
</body>
</html>
//main.js

const canvas = document.getElementById('c');
const gl = canvas.getContext('webgl');

const imports = {
  console: {
    log(r) { return console.log(r) },
    logI(i) { return console.log(i) }
  },
  gl: { createShader(t) { return gl.createShader(t); }}
}

WebAssembly.instantiateStreaming(fetch('../main.wasm'), imports).then(obj => {
  const wasm = obj.instance.exports;
  wasm.main();
})
//package.json

{
  "scripts": {
    "build": "zig build-lib main.zig -target wasm32-freestanding -dynamic -OReleaseSmall",
    "start": "npx servez"
  },
  "devDependencies": {
    "servez": "^1.12.1"
  }
}
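
One likely explanation for the 0 (an assumption, not something confirmed in this thread): on wasm32 a `*opaque{}` return value is lowered to an i32, and the WebAssembly JavaScript API converts an import's return value to i32 with the ToInt32 operation. ToInt32 maps any plain object to 0. The `| 0` operator applies the same conversion in ordinary JavaScript, so the effect can be reproduced without wasm; `toI32` and `fakeShader` are names made up for this sketch.

```javascript
// `value | 0` applies ToInt32, the same conversion the WebAssembly JS API
// uses when an imported function is declared to return i32.
function toI32(value) {
  return value | 0;
}

// Stand-in for the object returned by gl.createShader in a browser.
const fakeShader = { kind: "WebGLShader" };

console.log(toI32(fakeShader)); // 0: the object identity is lost
console.log(toI32(123));        // 123: plain integers pass through unchanged
```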

@leroycep
Contributor Author

leroycep commented Feb 4, 2022

The plan is to use address spaces (issue #653) for WASM externrefs. That would look something like this:

// gl.zig

// `.webref` is just a random name I chose, not likely to be the actual thing
pub extern "gl" fn createShader(t: GLenum) *addrspace(.webref) WebGLShader;
const WebGLShader = opaque{
    pub extern "gl" fn shaderSource(this: *addrspace(.webref) WebGLShader, source: [*]const u8, sourceLen: usize) void;
};
// main.zig

const gl = @import("gl.zig");

const SHADER_SOURCE =
    \\ very clever fragment shader here
;

export fn main() i32 {
  const shader = gl.createShader(gl.FRAGMENT_SHADER);
  shader.shaderSource(SHADER_SOURCE.ptr, SHADER_SOURCE.len);

  return 0;
}

The address space proposal hasn't been finalized, as far as I can tell, so it will end up looking a bit different from this.

@munrocket

This comment was marked as outdated.

@Luukdegram
Member

@Luukdegram what do you think?

Think of what, exactly? I see a lot of noise here, but no concrete idea of how you want to solve this.
There's a lot to consider to fully support this use case:

  • Linking with C libraries - When building Zig code with Wasm as a target, you may want to link with existing C libraries. This means we must generate object files that support such use cases. Globals of type opaque{} (or anyopaque for that matter) will generate a symbol for the data section. However, externref symbols belong in the table section. Wasm symbols are typed, and the linker rejects symbols that resolve to incompatible types, so we cannot re-use the exact same syntax for both use cases.
  • To generate the correct object files, we need to tell LLVM how to emit those. Currently, we do not tell LLVM at all that we want an externref rather than a data symbol.
  • As mentioned above, we cannot re-use the same syntax, so a decision must be made on how we want to represent this use case using Zig's syntax. Some quick examples could be:
    • Using address spaces: extern var foo: *anyopaque (.externref);
    • Defining the library name as non-C, such as: extern "MyWasmEnvironment" var foo = opaque{};

For the LLVM backend, we can then emit whatever it wants and do our own thing in the wasm backend, as long as both generate the semantically correct behavior we want.

Don't get me wrong. I fully support this use case and would like to see this supported in Zig, but it's not as simple as you seem to portray. I don't think we should rush support for this and should carefully consider all options. Personally, it isn't high on my TODO list right now, as stage2 is far along and I'd like to support Wasm's MVP in the wasm backend before considering the additional proposals and features.

Also, note that I'm not part of the core team. While I can and will provide my input to the core team, I'm in no position to make a decision on this.
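
The data-section versus table-section distinction raised above can be seen from the host side with the standard WebAssembly JavaScript API. The sketch below assumes a runtime with reference types enabled; the variable names are illustrative only. A table holds opaque references, while linear memory holds nothing but bytes, which is why a data symbol and an externref symbol cannot be treated as the same kind of thing.

```javascript
// A table whose elements are externref values: it stores host references directly.
const table = new WebAssembly.Table({ element: "externref", initial: 2 });

const hostObject = { name: "canvas" };
table.set(0, hostObject);

// Reading the slot back yields the very same object, not a copy or an address.
const sameObject = table.get(0) === hostObject;
console.log(sameObject);

// Linear memory, by contrast, is just an ArrayBuffer of bytes; there is no
// way to place `hostObject` itself in it, only numbers.
const memory = new WebAssembly.Memory({ initial: 1 });
const bytes = new Uint8Array(memory.buffer);
bytes[0] = 42;
console.log(bytes[0]);
```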

@andrewrk andrewrk added the use case Describes a real use case that is difficult or impossible, but does not propose a solution. label Feb 4, 2022
@munrocket

munrocket commented Feb 6, 2022

Sorry, I just tried to fix it myself (I was a little naive here) and also attached an example that somebody can use as a reference test for an implementation.

Use case: I want to make a web engine like three.js, which is why I need a fully compatible WebAPI binding for audio, graphics (including new backends), and mouse events. I will do it with a code generator that can be reused later for other APIs in other Zig projects. Linking with C libraries is not my first priority, because right now the ecosystem and tooling are more important. For example, we also need to manually create glue for fetch/setTimeout/requestAnimationFrame/performance.now().

So the reasons why I am considering Reference Types in Zig:

  • With them, the JS glue becomes much smaller and the whole application faster, because each call becomes almost a native call into the browser.
  • Reference types are already supported by all browsers: https://webassembly.org/roadmap/

For those who implement the glue in the old style, it will be about 6x more work and will become legacy later.

I fully support this use case and would like to see this supported in Zig, but it's not as simple as you seem to portray.

@Luukdegram thank you for the detailed response. You are 100% right, I rushed here. But if someone creates an experimental version, even one with memory leaks, it will be helpful, because building the ecosystem is somewhat orthogonal work.

@Pyrolistical
Contributor

In the meantime, the workaround is to pass an insecure i32 "pointer" that is really a lookup key in JS land.

@codefromthecrypt
Contributor

👍 and while it isn't documented anywhere as a common practice (AFAICT), this is the way a lot of things do it, regardless of whether the host is JS or not. For example, say it is a "context" object: there would be a context ID as an i32, and the host makes sure this isn't actually mapped to memory but is instead a key into a lookup table. That way, if some code manipulates it unsafely, it fails to crash anything.

It is still insecure insofar as someone could guess another session's ID if they are in the same module instance. Then again, wasm modules are not safe for concurrent use, and removing the context (clearing the key and the memory) before adding one back to the pool can prevent leaks.

Take the above with a grain of salt, because I don't work in wasm security; these are just things I noticed in how things work outside JS.
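
The "clear the key and the memory before returning it to the pool" idea might be sketched as below. `ContextPool` and its methods are hypothetical names for illustration, not an API of any runtime, and the sketch makes no claim about real security properties. It only shows that a recycled ID starts from fresh state, so data from a previous session cannot leak through a reused ID.

```javascript
// Hypothetical context pool: the guest only ever holds integer context IDs.
class ContextPool {
  constructor() {
    this.live = new Map(); // id -> context state, for contexts currently in use
    this.idle = [];        // released ids that may be handed out again
    this.nextId = 1;
  }

  // Hand out an id, reusing a released one if available; state is always fresh.
  acquire() {
    const id = this.idle.length > 0 ? this.idle.pop() : this.nextId++;
    this.live.set(id, { data: new Map() });
    return id;
  }

  // Resolve an id to its state; an unknown or released id is an error, not a crash.
  lookup(id) {
    const ctx = this.live.get(id);
    if (ctx === undefined) throw new RangeError(`unknown context id ${id}`);
    return ctx;
  }

  // Clear the state and the key before the id becomes reusable.
  release(id) {
    const ctx = this.lookup(id);
    ctx.data.clear();
    this.live.delete(id);
    this.idle.push(id);
  }
}

const pool = new ContextPool();
const first = pool.acquire();
pool.lookup(first).data.set("secret", "session A");
pool.release(first);

const second = pool.acquire(); // reuses the same integer id
console.log(second === first, pool.lookup(second).data.has("secret"));
```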

@gcoakes
Contributor

gcoakes commented Oct 9, 2022

I've got a few questions about this ticket:

  1. Is address space support fully implemented? It's parsed and fed into LLVM as far as I can tell. Considering that #653 (more pointer metadata: address spaces) is still open, I'm uncertain whether it is complete.
  2. Is someone already working on it?
  3. Does my general battle plan seem correct?
    1. Rename std.wasm.Valtype to NumericType.
    2. Create std.wasm.ValueType as a tagged union of std.wasm.RefType and the above (leaving the possibility for VectorType in the future).
    3. Replace all uses of Valtype with the above union.
    4. Add all Reference Instructions [1] to src/arch/wasm/Mir.zig.
    5. Add .host to std.builtin.AddressSpace.
    6. In src/arch/wasm/CodeGen.zig, convert *addrspace(.host) anyopaque to ValueType{.RefType = .externref} anywhere it might be found (function params, instructions).
  4. Where should I be looking in the stage1 compiler in order to make these changes?

(Please don't take this as a commitment to actually implement it. This isn't my day job, and my attention span for hobby work tends to be short.)
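
To make the shape of steps 3.1 to 3.6 concrete, here is a rough model in JavaScript rather than Zig. It is a sketch under stated assumptions: `valueType`, `encode`, and `lowerZigType` are hypothetical helpers, the `.host` address space is only the name proposed in the plan above, and nothing here reflects the actual std.wasm code. The byte values are the value-type encodings from the WebAssembly binary format.

```javascript
// Byte encodings of value types in the WebAssembly binary format.
const NumericType = Object.freeze({ i32: 0x7f, i64: 0x7e, f32: 0x7d, f64: 0x7c });
const RefType = Object.freeze({ funcref: 0x70, externref: 0x6f });

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Tagged union in the spirit of the proposed std.wasm.ValueType:
// a value is either { tag: "numeric", name } or { tag: "ref", name }.
function valueType(tag, name) {
  const variants = { numeric: NumericType, ref: RefType };
  if (!has(variants, tag) || !has(variants[tag], name)) {
    throw new TypeError(`unknown value type ${tag}.${name}`);
  }
  return Object.freeze({ tag, name });
}

// Byte that would appear in a function signature in the binary format.
function encode(type) {
  return (type.tag === "numeric" ? NumericType : RefType)[type.name];
}

// Hypothetical lowering for step 6: the special pointer type becomes externref,
// numeric types map to themselves, anything else is rejected.
function lowerZigType(zigType) {
  if (zigType === "*addrspace(.host) anyopaque") return valueType("ref", "externref");
  if (has(NumericType, zigType)) return valueType("numeric", zigType);
  throw new TypeError(`no lowering for ${zigType}`);
}

console.log(encode(lowerZigType("*addrspace(.host) anyopaque")).toString(16)); // 6f
console.log(encode(lowerZigType("i32")).toString(16)); // 7f
```

Keeping numeric and reference types as separate variants leaves room for a vector variant later, as the plan notes.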

Footnotes

  1. https://webassembly.github.io/spec/core/syntax/instructions.html#reference-instructions

@Luukdegram
Member


Before answering your questions, I'd like to bring to your attention that no decision has been made yet with regard to the syntax, or whether it's even possible to integrate the external reference feature into Zig at all. Such a decision isn't straightforward, as there are many cases to consider before this can be accepted. For example, what should the behavior be when someone tries to @ptrCast such a type? Therefore, implementing this right now is not possible, which is also why the work hasn't been started yet.
However, I'll still happily answer your questions:

  1. Address spaces are not fully implemented yet. However, some work has been done to support certain use cases.
  2. No; see my remark above.
  3. Your plan seems to target the native WebAssembly backend. This backend will only be used for debug mode in the future. It's also incomplete right now, which means it isn't being used outside of implementing the backend itself. Instead, this should probably be implemented in the LLVM backend first. The battle plan does have the basics correct for the native WebAssembly backend, but it is still missing many edge cases, such as updating other instructions (is_null, for example) for the case where the type is an external reference.
  4. It is not worthwhile to implement this in the stage1 compiler. Stage2 is the new default, and stage1 will be removed in the future. Address spaces also aren't implemented at all within the stage1 compiler.

@gcoakes
Contributor

gcoakes commented Oct 10, 2022

Thank you for your response.

this should probably be implemented in the LLVM backend first

Took me an hour or so to realize this... I implemented a naive version of the native changes and was very confused until I noticed one little line where it switched to the LLVM backend.

I have some notes I've taken since my last comment that I don't want to go to waste. @Luukdegram, though I suspect I'm just telling you things you already know, I hope someone will find them useful:

  • externref support isn't even complete in LLVM. Some of it is merged (some kind of LLVM-specific assembly language for wasm), but most is wrapped up in https://reviews.llvm.org/D122215. That patch is mostly for clang support, but it also adds the wasm_externref address space.
  • externref doesn't strictly map to a pointer. If compiler support is to be added, it would likely need either a whole new primitive (clang appears to be doing this) or limitations on the instructions used with specific address spaces. It's not possible to compare externrefs or to store them in linear memory; they can only live on the stack, in locals, in globals, or in tables. For example, the following wat is invalid:
(module
	(func (export "eq_ref") (param externref externref) (result i32)
		local.get 0
		local.get 1
		eq
	)
)
  • Rust's wasm-bindgen tool has support, but it actually does this by post-processing the wasm file. Instead of the compiler natively understanding externrefs, it just uses them at the "fringes" of the module.
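
Both points about externref can be checked from JavaScript with two hand-assembled modules. This is a sketch assuming a runtime with reference types enabled; the byte arrays were written by hand for this example and are not compiler output. The first module passes an externref straight through, which is the "fringes" usage. The second tries to compare two externrefs using opcode 0xD3, which to my understanding is `ref.eq` from the GC proposal (worth checking against the spec); it should fail validation either because the opcode is unknown or because externref is not an allowed operand type.

```javascript
const header = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]; // "\0asm", version 1

// (func (export "id") (param externref) (result externref) local.get 0)
const identityModule = new Uint8Array([
  ...header,
  0x01, 0x06, 0x01, 0x60, 0x01, 0x6f, 0x01, 0x6f, // type: (externref) -> externref
  0x03, 0x02, 0x01, 0x00,                         // function 0 uses type 0
  0x07, 0x06, 0x01, 0x02, 0x69, 0x64, 0x00, 0x00, // export "id" = function 0
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x20, 0x00, 0x0b, // body: local.get 0; end
]);

// (func (param externref externref) (result i32) local.get 0 local.get 1 <0xD3>)
const compareModule = new Uint8Array([
  ...header,
  0x01, 0x07, 0x01, 0x60, 0x02, 0x6f, 0x6f, 0x01, 0x7f, // type: (externref, externref) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function 0 uses type 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0xd3, 0x0b, // body with the comparison
]);

// The host object crosses into wasm and back without being converted.
const instance = new WebAssembly.Instance(new WebAssembly.Module(identityModule));
const hostObject = { tag: "opaque to wasm" };
const roundTripped = instance.exports.id(hostObject);
console.log(roundTripped === hostObject);

// The comparison module is rejected by the validator.
const comparisonIsValid = WebAssembly.validate(compareModule);
console.log(comparisonIsValid);
```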

@jedisct1
Contributor

An RFC to support Reference Types in Clang was just published, as well as an implementation.
