Eager symbol tables #1537

Merged
merged 4 commits into from
Sep 16, 2020


wingo
Contributor

@wingo wingo commented Sep 9, 2020

Based on #1535.

Instead of adding entries to the symbol table as they are referenced, when writing relocatable binaries, we're going to make symbols for all functions. This allows exported functions and globals to be written with the proper exported / no_strip flags, so that resulting files will link with wasm-ld.

So far I've only updated the tests in the repo. However, the real difference can be seen if we make a simple wasm file from LLVM:

$ cat a.c
void b(void);
__attribute__((export_name("a"))) void a(void) { b(); }

$ clang -Oz --target=wasm32 -nostdlib -c -o a.o a.c
$ wasm2wat -o a.wat a.o

The resulting file looks like:

(module
  (type (;0;) (func))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
  (import "env" "b" (func (;0;) (type 0)))
  (func $a (type 0)
    call 0)
  (export "a" (func $a)))

If I then run wat2wasm --relocatable on that file and run wasm-objdump -dx on the result, I used to get:

...
Custom:
 - name: "linking"
  - symbol table [count=1]
   - 0: F <env.b> func=0 undefined binding=global vis=default
Custom:
 - name: "reloc.Code"
  - relocations for section: 4 (Code) [1]
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000004(file=0x000063) symbol=0 <env.b>

Code Disassembly:

000061 func[1] <a>:
 000062: 10 80 80 80 80 00          | call 0 <env.b>
 000068: 0b                         | end

Whereas now I get:

a.wat.o:	file format wasm 0x1

Section Details:

Type[1]:
 - type[0] () -> nil
Import[3]:
 - memory[0] pages: initial=0 <- env.__linear_memory
 - table[0] type=funcref initial=0 <- env.__indirect_function_table
 - func[0] sig=0 <env.b> <- env.b
Function[1]:
 - func[1] sig=0 <$a>
Export[1]:
 - func[1] <$a> -> "a"
Code[1]:
 - func[1] size=8 <$a>
Custom:
 - name: "linking"
  - symbol table [count=2]
   - 0: F <env.b> func=0 undefined binding=global vis=default
   - 1: F <$a> func=1 exported no_strip binding=global vis=hidden
Custom:
 - name: "reloc.Code"
  - relocations for section: 4 (Code) [1]
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000004(file=0x000063) symbol=0 <env.b>

Code Disassembly:

000061 func[1] <$a>:
 000062: 10 80 80 80 80 00          | call 0 <env.b>
 000068: 0b                         | end

I.e. there's now an entry for the exported $a definition. This allows wasm-ld to do the right thing.

Note that the name added to the symbol table is $a. This causes wabt to use that name when printing $a elsewhere in the disassembly; hence the diffs to existing tests (but only when printing relocatable binaries). Not sure what the desired UI is for that. I think it's just an internal name in the module; the external names are defined already in the imports/exports.
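To make the eager approach concrete, here is a minimal C++ sketch. It is not wabt's implementation: `Func`, `Symbol`, and `BuildEagerSymbolTable` are hypothetical names, and the module model is reduced to a flat list of functions. The flag values are the ones defined in the WebAssembly tool-conventions linking document, and the flag choice for an exported definition simply mirrors the `exported no_strip ... vis=hidden` line in the output above. Leaving the flags at zero for a defined, non-exported function is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Symbol flag bits from the tool-conventions linking document.
constexpr uint32_t kSymVisibilityHidden = 0x04;
constexpr uint32_t kSymUndefined = 0x10;
constexpr uint32_t kSymExported = 0x20;
constexpr uint32_t kSymNoStrip = 0x80;

// Hypothetical, simplified view of a function in a module.
struct Func {
  std::string name;
  bool imported;
  bool exported;
};

// Hypothetical symbol table entry for a function symbol.
struct Symbol {
  std::string name;
  uint32_t func_index;
  uint32_t flags;
};

// Eager strategy: emit one symbol per function up front, whether or not
// any relocation refers to it. Names are passed through unchanged.
std::vector<Symbol> BuildEagerSymbolTable(const std::vector<Func>& funcs) {
  std::vector<Symbol> symbols;
  for (uint32_t i = 0; i < funcs.size(); ++i) {
    uint32_t flags = 0;
    if (funcs[i].imported) {
      flags |= kSymUndefined;  // defined in some other object file
    } else if (funcs[i].exported) {
      // Mirrors the "exported no_strip ... vis=hidden" entry shown above.
      flags |= kSymExported | kSymNoStrip | kSymVisibilityHidden;
    }
    symbols.push_back({funcs[i].name, i, flags});
  }
  return symbols;
}
```

For a module with an imported function at index 0 and an exported definition at index 1, this produces two symbols with flags 0x10 and 0xA4, whereas a lazy table driven only by relocations would contain just the first.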

@wingo
Contributor Author

wingo commented Sep 9, 2020

My TODO here is to add more tests. I will probably add the a.o test above; if there's anything particular I should look at, LMK.

@sbc100
Member

sbc100 commented Sep 10, 2020

I don't think $a should appear in the symbol table but just a. Same goes for env.b actually. Ideally it would just be a and b as symbol names since wasm-ld uses a flat namespace.

@wingo
Contributor Author

wingo commented Sep 11, 2020

I don't think env.b is actually in the symbol table; nothing in wabt sets EXPLICIT_NAME for imports. env.b is just a display name that wabt uses internally.

Regarding $a vs a for naming symbols defined in the compilation unit, you are probably right regarding user expectations, but I wonder: does it matter to wasm-ld? It could only matter if the name were used by wabt for exports. I will have to try and see what happens.

@sbc100
Member

sbc100 commented Sep 11, 2020

> I don't think env.b is actually in the symbol table; nothing in wabt sets EXPLICIT_NAME for imports. env.b is just a display name that wabt uses internally.

Ah, you are correct. clang produced objects look the same.

> Regarding $a vs a for naming symbols defined in the compilation unit, you are probably right regarding user expectations, but I wonder: does it matter to wasm-ld? It could only matter if the name were used by wabt for exports. I will have to try and see what happens.

No, I don't think it matters from wasm-ld's perspective. Symbol names can be whatever you want them to be. But if you want to link with C/C++ you won't want wabt adding $ to the beginning of symbol names.

My understanding is that the $ is the escape char used in the text format for showing names. I would not expect the $ to propagate into the binary format in general.

Instead of adding entries to the symbol table as they are referenced,
when writing relocatable binaries, we're going to make symbols for all
functions.  This allows exported functions and globals to be written
with the proper exported / no_strip flags, so that resulting files will
link with wasm-ld.
@wingo wingo force-pushed the eager-symbol-tables branch from 602f105 to d7e70c3 Compare September 15, 2020 12:41
@wingo
Contributor Author

wingo commented Sep 15, 2020

> Regarding $a vs a for naming symbols defined in the compilation unit, you are probably right regarding user expectations, but I wonder: does it matter to wasm-ld? It could only matter if the name were used by wabt for exports. I will have to try and see what happens.

> No, I don't think it matters from wasm-ld's perspective. Symbol names can be whatever you want them to be. But if you want to link with C/C++ you won't want wabt adding $ to the beginning of symbol names.

Ah interesting, I didn't know that wasm-ld used the name from the symbol table; I thought it used the name from the exports. I guess I was just linking from wat to C and not the other way. Will fix!

In the previous commit, I wrongly assumed that the globally visible name
for linking was taken from the exports section, whereas actually it's
from the symbol table.  Therefore this patch changes to strip off the
dollar, and also to make all named bindings globally visible.  The
exported-to-the-host binding is mostly unrelated to the
visible-to-other-compilation-units binding.  Unnamed definitions aren't
added to the symbol table.
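
The naming rule in this commit message can be sketched as follows. This is an illustrative C++17 helper, not code from the patch: `SymbolNameFor` is a hypothetical name, and treating a bare `$` as unnamed is an assumption.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: maps a text-format identifier such as "$a" to the
// name written into the symbol table. Returns std::nullopt for unnamed
// definitions, which get no symbol table entry at all.
std::optional<std::string> SymbolNameFor(std::string_view wat_name) {
  if (wat_name.empty()) {
    return std::nullopt;  // unnamed definition: no symbol
  }
  if (wat_name.front() == '$') {
    wat_name.remove_prefix(1);  // "$" belongs to the text format only
  }
  if (wat_name.empty()) {
    return std::nullopt;  // assumption: a bare "$" counts as unnamed
  }
  return std::string(wat_name);
}
```

Under this rule `$a` becomes the symbol `a`, which is the name a C or C++ object file would use to refer to it when linked with wasm-ld.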
@wingo
Contributor Author

wingo commented Sep 15, 2020

I rebased then pushed a followup; see f118694 for the changes. Now the test file from the original PR message looks as it did before, plus an additional entry for the export as expected:

$ ~/src/wabt/out/gcc/Debug/wasm-objdump -dx /tmp/a.o

a.o:	file format wasm 0x1

Section Details:

Type[1]:
 - type[0] () -> nil
Import[3]:
 - memory[0] pages: initial=0 <- env.__linear_memory
 - table[0] type=funcref initial=0 <- env.__indirect_function_table
 - func[0] sig=0 <env.b> <- env.b
Function[1]:
 - func[1] sig=0 <a>
Export[1]:
 - func[1] <a> -> "a"
Code[1]:
 - func[1] size=8 <a>
Custom:
 - name: "linking"
  - symbol table [count=2]
   - 0: F <env.b> func=0 undefined binding=global vis=default
   - 1: F <a> func=1 exported no_strip binding=global vis=hidden
Custom:
 - name: "reloc.Code"
  - relocations for section: 4 (Code) [1]
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000004(file=0x000063) symbol=0 <env.b>

Code Disassembly:

000061 func[1] <a>:
 000062: 10 80 80 80 80 00          | call 0 <env.b>
 000068: 0b                         | end
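
A note on the `10 80 80 80 80 00` bytes in the disassembly: `10` is the call opcode, and the function index after it is written as a padded 5-byte LEB128 so that the linker can patch in the final index at the relocation offset without changing the size of the code. The C++ sketch below shows that fixed-width encoding; `EncodePaddedU32Leb128` is a hypothetical name, not a wabt function.

```cpp
#include <array>
#include <cstdint>

// Encodes a 32-bit value as a fixed-width, 5-byte unsigned LEB128.
// Every byte except the last carries the continuation bit, even when the
// remaining value is zero, so the encoding always occupies 5 bytes.
std::array<uint8_t, 5> EncodePaddedU32Leb128(uint32_t value) {
  std::array<uint8_t, 5> out{};
  for (int i = 0; i < 5; ++i) {
    uint8_t byte = value & 0x7f;  // low 7 bits of what is left
    value >>= 7;
    if (i < 4) {
      byte |= 0x80;  // continuation bit on all but the last byte
    }
    out[i] = byte;
  }
  return out;
}
```

Encoding index 0 this way gives `80 80 80 80 00`, matching the operand bytes of the call instruction shown above.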

Member

@sbc100 sbc100 left a comment

Looks pretty good to me. Have you tried actually linking the wabt output with C/C++ code using wasm-ld?


std::set<string_view> seen_names_;

Result Intern(const string_view& name) {
Member

Why not just call this Add? or AddName?

Contributor Author

Heh, apologies for the name ;) I come from the Lisp world, where this operation is called interning... But I can update it to something less jargon-ful.
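
For readers unfamiliar with the term, interning a name roughly means recording it in a set so that later occurrences can be recognized. The C++17 sketch below is only a guess at the shape of the snippet under review: `NameSet` is a hypothetical class, the `Result` enum is a stand-in for wabt's own type, and reporting a duplicate as an error is an assumption, since the diff excerpt does not show the function body.

```cpp
#include <set>
#include <string_view>

// Hypothetical stand-in for wabt's Result type.
enum class Result { Ok, Error };

// Hypothetical wrapper around the seen_names_ set from the snippet.
class NameSet {
 public:
  // Records `name`. Assumption: a name that was already recorded is
  // reported as an error rather than silently accepted.
  Result Intern(std::string_view name) {
    bool inserted = seen_names_.insert(name).second;
    return inserted ? Result::Ok : Result::Error;
  }

 private:
  // Stores views, not copies: callers must keep the underlying strings
  // alive for as long as the set is in use.
  std::set<std::string_view> seen_names_;
};
```

Whatever the method ends up being called, the set stores only views, so the strings it refers to have to outlive it.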

@wingo
Contributor Author

wingo commented Sep 15, 2020

> Looks pretty good to me. Have you tried actually linking the wabt output with C/C++ code using wasm-ld?

Yeah, you can see a little test here: https://github.com/Igalia/ref-cpp/blob/master/milestones/m2/Makefile. The .wat file gets linked to a little implementation of malloc.

@binji binji merged commit cd0b3db into WebAssembly:master Sep 16, 2020
wingo added a commit to wingo/wabt that referenced this pull request Sep 17, 2020
PRs WebAssembly#1527 and WebAssembly#1539 needed their test expectations updated after WebAssembly#1537
was merged; this patch does that.  It also renames a test for
consistency.
binji pushed a commit that referenced this pull request Sep 17, 2020
* Fix up reloc-related tests after #1537

PRs #1527 and #1539 needed their test expectations updated after #1537
was merged; this patch does that.  It also renames a test for
consistency.

* Adapt test to renaming