Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow character substitution of wasm symbol names #13477

Merged
merged 5 commits into from
Mar 28, 2021

Conversation

juj
Copy link
Collaborator

@juj juj commented Feb 11, 2021

Allow character substitution of wasm symbol names to avoid working around complex string escaping issues in operating system shells and build systems.

Wasm toolchain compiled functions do not support name mangling. As result, the generated function names can contain difficult to deal with characters , & and , (and maybe others).

When attempting to pass a complex signature to a function, such as foo(char const*, int&) to ASYNCIFY_ONLY/ASYNCIFY_ADD/ASYNCIFY_REMOVE, one might as a first try pass

emcc -s ASYNCIFY_ONLY=[foo(char const*, int&)] ...

However that has the problem that it passes the cmdline args emcc, -s, ASYNCIFY_ONLY=[foo(char, const*,, int&)], ... from the terminal (as one would expect is sensible). Then, adding quotes, one gets

emcc -s "ASYNCIFY_ONLY=[foo(char const*, int&)]" ...

On Windows, the command line interpreter consumes the quotes " to denote a single cmdline argument element, Emscripten will not see them, but will receive emcc, -s, ASYNCIFY_ONLY=[foo(char const*, int&)], ....

However that has the issue with , character in the function name, since it is treated as JSON array element separator. Hence emcc suggests to quote the arguments. Hence, one gets

emcc -s "ASYNCIFY_ONLY=[\"foo(char const*, int&)\"]" ...

However that does not work either on Windows, because the & character works as a special delimiter character for issuing two commands from the command line. (echo a & echo b prints a<newline>b). On Windows & cannot be escaped with \&, nor with &&, but the escape symbol is... "&". Hence one finally gets

emcc -s "ASYNCIFY_ONLY=[\"foo(char const*, int"&")\"]" ...

which actually works, just barely, and I have no idea why or how. Maybe Windows command line interpreter first short-circuit scans for "&" -> & without paying attention to any reasonable left-to-right or right-to-left " for any quote matching rules?.

However, when one takes that syntax to any kind of build system (CMake, Ninja, MinGW Makefiles, Tundra, ...), things just spectacularly fall apart again. One needs to at least escape " with \", then double escape \ with \\, leading into an utterly unreadable, and even worse, unmaintainable mess. And on top of that, the "&" escaping no longer seems to work, but Windows sees it again as a two command delimiter.

Finally, Linux and macOS have different terminal quoting rules, which leads to needing to start all of this from scratch on those OSes to come up with quoting and escaping rules that pass the terminals there. This can be observed e.g. at https://github.com/gabrielcuvillier/d3wasm/blob/ece7b9869eee5f4927b07efe3cf4146d6a562205/neo/CMakeLists.txt#L240 where two custom quoting rules have been devised for Windows vs Linux. CC @gabrielcuvillier

To avoid all of these issues, this PR bluntly adds single character substitution rules to avoid needing to fight build system and OS shell expansion rules:

  • -> .
  • & -> #
  • , -> ?

This way one can call on command line

emcc -s ASYNCIFY_ONLY=[foo(char.const*?.int#)] ...

No quotes, no escapes, no bs, and guaranteed to work cross platforms. Very unreadable, but still more readable than a sea of \\\\""s, and safe and less brittle. Developing custom tooling in build systems for automated single character substitution and unsubstitution will be at least three orders of magnitudes easier than multicharacter backslash-double-quote escape-unescape matching quotes searching substitution.

Are there other -s settings that take in raw wasm function names without mangling?

@juj juj force-pushed the fix_lack_of_wasm_symbol_mangling branch from 3314918 to 3d9ecef Compare February 11, 2021 20:43
@kripken
Copy link
Member

kripken commented Feb 12, 2021

Ah, interesting idea... It is a nice approach I had not even considered.

The more general alternative, however, is to use response files, -s ASYNCIFY_ONLY=@file.txt and then no escaping is needed in the file. Is that not sufficient?

@curiousdannii
Copy link
Contributor

curiousdannii commented Feb 13, 2021

Getting CMake to notice changes to config files like that is a pain in my experience. I managed to do it before by copying and modifying em_add_tracked_link_flag, but it's not fun at all.

I like this PR, as sometimes you only have a small handful of functions. But would a PR to add something like em_add_tracked_link_flag for the ASYNCIFY options also be useful?

@juj
Copy link
Collaborator Author

juj commented Feb 13, 2021

The more general alternative, however, is to use response files, -s ASYNCIFY_ONLY=@file.txt and then no escaping is needed in the file. Is that not sufficient?

If we think that response files are the correct solution, then I think we should disallow passing symbols directly at all to the function, as we should not be offering broken functionality to users.

Though I think response files are an awkward way to solve the issue. If one has only a few functions that need ASYNCIFYing, they may not immediately realize why they would have to use response files, and then they won't do it, and will get cross-platform broken behavior.

But would a PR to add something like em_add_tracked_link_flag for the ASYNCIFY options be useful?

Maybe it would help (could a more general option work as well? Or I wonder if recent CMake might have that kind of feature out of the box. Haven't looked into it in a while), but certainly not a solution here, since this issue is not CMake related, but happens already on command line.

@juj juj force-pushed the fix_lack_of_wasm_symbol_mangling branch 2 times, most recently from b2ece16 to f1a28c3 Compare February 13, 2021 16:52
@kripken
Copy link
Member

kripken commented Feb 16, 2021

@curiousdannii good point about noticing changes in build systems, yeah...

Base automatically changed from master to main March 8, 2021 23:49
@juj juj force-pushed the fix_lack_of_wasm_symbol_mangling branch from f1a28c3 to c2a2369 Compare March 16, 2021 15:43
@sbc100
Copy link
Collaborator

sbc100 commented Mar 16, 2021

Long term I'd like to push folks towards using a file on disk here instead.. and supporting a simple "one symbol per line" file format.

This seems like a reasonable interim helping hand though.

@juj
Copy link
Collaborator Author

juj commented Mar 16, 2021

@kripken I wonder if the test I added is somehow wrong use of the ASYNCIFY feature? I am getting RuntimeError: unreachable executed in browser.test_asyncify_tricky_function_sig, but I don't quite understand why? :o

@kripken
Copy link
Member

kripken commented Mar 16, 2021

Yes, I think that's an asyncify assertion. When you use the "only" list then only those methods are instrumented. I think adding main,__original_main should fix it (note that as C function they do not need ()).

@juj juj force-pushed the fix_lack_of_wasm_symbol_mangling branch from c2a2369 to cc40bdb Compare March 28, 2021 11:44
@juj juj enabled auto-merge (squash) March 28, 2021 11:45
@juj juj merged commit bbce5d7 into emscripten-core:main Mar 28, 2021
@sbc100
Copy link
Collaborator

sbc100 commented May 19, 2021

I'd like to recommend the new "one-symbol-per-line" file format as an alternative that avoids the need to any kind of quoting: #14170.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants