Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we remove signature mangling from wasm2c output? #1858

Closed
sbc100 opened this issue Mar 9, 2022 · 8 comments · Fixed by #1896
Closed

Should we remove signature mangling from wasm2c output? #1858

sbc100 opened this issue Mar 9, 2022 · 8 comments · Fixed by #1896

Comments

@sbc100
Copy link
Member

sbc100 commented Mar 9, 2022

;tldr; I think we should.

It has been proposed in #1833 that we don't need to be mangling the names at all (except to turn them into valid C symbols).

In particular there is no need to append the mangled signature to the symbol names. e.g. foo_vi (for a void function foo that takes an i32).

For exports there should be no issue at all since exports already need to have unique names. However, wasm has historically allowed the same symbol to imported with various different signatures. In particular the test suite it self used to do this when it imported the same spectest.print symbol with various signatures. However, this was (conveniently) removed in WebAssembly/spec#652. In the design repo there was some discussion of removing duplicate imports: WebAssembly/design#1402. However that doesn't seem like its doing to happen. In fact the discussion seems to be moving towards giving the host more power to link module together based on both names and positions (and signatures) of the imports.

Based the above it seems like the naive linking scheme we currently have is already insufficient and not general enough to provide all the things that a wasm host might want to do.. so I propose that we remove name mangling and add a more powerful linking mechanism (that is not just based on the symbol resolution in the linker) if we even need one.

@sbc100
Copy link
Member Author

sbc100 commented Mar 9, 2022

@keithw @shravanrn @binji

@sbc100
Copy link
Member Author

sbc100 commented Mar 9, 2022

@kripken

@keithw
Copy link
Member

keithw commented Mar 9, 2022

For the Stanford team, in an instanced world, I think if it were just up to us, tentatively our own preferred design would be that:

  • we remove the WASM_RT_MODULE_PREFIX mechanism
  • we remove the mangling of function names according to the types of the parameters and return value
  • everything that forms part of the public interface of a module is prefixed with the "module name" (basically the header name minus the ".h"). This includes:
    • the typename of the module_instance structure (e.g. modname_module_instance_t)
    • the typename of a structure (e.g. modname_imports_t) that holds pointers to the instance-specific imports (memories, globals, tables), named according to their import names
    • the name of the init() function (e.g. modname_init), whose arguments are:
      • an instance of the modname_module_instance_t structure to be initialized
      • an instance of the modname_imports_t structure (which has to be filled out by the host before calling init())
    • the name of all the instance-specific exports (e.g. Z_modname_exportname), which are functions that take a modname_module_instance_t ptr as argument and retrieve the resulting export
    • the names of all the host-import functions that the module expects (these need to be named on a per-modname basis since they take a modname_module_instance_t as their first argument, and C doesn't have operator overloading like this)
    • the names of all the exported functions (these should also be named on a per-modname basis since they're module-specific exports)

This would describe a world where every module has one set of imported host functions (established at link time -- the host has to provide the correctly named functions, and each host function gets the instance structure passed in so it can access its exports), and a per-instance set of imported memories, globals, and tables (established at init() time -- the host has to provide a modname_imports_t structure with the imports filled in).

Not sure if that's going to work for all the other users, but we're certainly glad to have the design conversation and grateful to be included in it.

@sbc100
Copy link
Member Author

sbc100 commented Mar 9, 2022

For the Stanford team, in an instanced world, I think if it were just up to us, tentatively our own preferred design would be that:

  • we remove the WASM_RT_MODULE_PREFIX mechanism

Agreed. This should be fixed at wasm2c time. I have a PR to do this.

  • we remove the mangling of function names according to the types of the parameters and return value

Agreed, I have PR to this too.. and this is the substance of this issue.

.. snip ..

This would describe a world where every module has one set of imported host functions (established at link time -- the host has to provide the correctly named functions, and each host function gets the instance structure passed in so it can access its exports), and a per-instance set of imported memories, globals, and tables (established at init() time -- the host has to provide a modname_imports_t structure with the imports filled in).

This works except in the case where a module imports a function, say print twice, with two different signatures. Essentially print would be polymorphic host function that could be imported with any signature.. and imported multiple time. In theory the wasm spec (and the JS embedding allow this). If we rely on the C symbol names to perform module linking we cannot support this case. It could be that we would need a higher level linking method .. but I think we should delay building it until we have a use case that requires it.

@shravanrn
Copy link
Collaborator

shravanrn commented Mar 10, 2022

So I propose that we remove name mangling

Yup, simplifying the symbol lookup is something that helps significantly. I think the specific thing I want to avoid is a complicated name mangling mechanism. So any simple prefix like "Z_" or "w2c_" or "w2c_<module_name>" is fine.

Any scheme that passes the below example is fine

// This is C code I am compiling into a Wasm module

void foo (int a, int b, int c) {}

// This is C code that is my host application

// Feel free to modify the contents of the macro itself
#define w2c_invoke(inst, mod, func, ...) w2c_##mod##name(inst, VA_ARGS)

void main() {
  w2c_sandbox inst = ...
  w2c_invoke(inst, foo, 1, 2, 3);
}

Essentially any name mangling that can be put into a simple C macro (w2c_invoke above) would work well.

It could be that we would need a higher level linking method .. but I think we should delay building it until we have a use case that requires it.

Sounds good to me.

@binji
Copy link
Member

binji commented Mar 10, 2022

Agreed with everyone here! My main concern was making sure that the spec tests passed, so we should try to keep any future tests that exercise this complex linking in separate files that can be ignored for wasm2c (at least until it is properly supported).

@sbc100
Copy link
Member Author

sbc100 commented Mar 10, 2022

All the spec tests do pass, even with mangling removed.. no spec tests using import overloading since it was removed in WebAssembly/spec#652. Kind of sad maybe since it means that the spec tests no longer exercise that features of the spec. I wonder if its worth open a bug about that?

@sbc100
Copy link
Member Author

sbc100 commented Apr 13, 2022

I spoke to @sunfishcode (who removed the spectest's dependency on this feature) and it seems that its reasonable for implemenations (in particualar non-JS implementations) to not support import overloads.. since only the "system/embedder" can ever generate such imports, a wasm module's exports can be overloaded in this way.

sbc100 added a commit that referenced this issue Apr 13, 2022
This effectively means that we no longer support imports that are
overloaded by signature only, which is not something that we need to
support in order to support the core wasm spec.

This feature is available in the JS embedding but there is no good
reason (AFAICT) to support it in wasm2c, and this simplifies the
generated code.

Fixes #1858
sbc100 added a commit that referenced this issue Apr 13, 2022
This effectively means that we no longer support imports that are
overloaded by signature only, which is not something that we need to
support in order to support the core wasm spec.

This feature is available in the JS embedding but there is no good
reason (AFAICT) to support it in wasm2c, and this simplifies the
generated code.

Fixes #1858
sbc100 added a commit that referenced this issue Apr 13, 2022
This effectively means that we no longer support imports that are
overloaded by signature only, which is not something that we need to
support in order to support the core wasm spec.

This feature is available in the JS embedding but there is no good
reason (AFAICT) to support it in wasm2c, and this simplifies the
generated code.

Fixes #1858
sbc100 added a commit that referenced this issue Apr 13, 2022
This effectively means that we no longer support imports that are
overloaded by signature only, which is not something that we need to
support in order to support the core wasm spec.

This feature is available in the JS embedding but there is no good
reason (AFAICT) to support it in wasm2c, and this simplifies the
generated code.

Fixes #1858
sbc100 added a commit that referenced this issue Apr 14, 2022
This effectively means that we no longer support imports that are
overloaded by signature only.  This is not something that we need to
support in order to support the core wasm spec.

This feature is available in the JS embedding but there is no good
reason (AFAICT) to support it in wasm2c, and this simplifies the
generated code.

Fixes #1858
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants