Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WebAssembly] Handle symbols in .init_array sections #119127

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

georgestagg
Copy link
Contributor

Follow on from #111008.

Consider the following C code, which supplies a function init() and an array written to the .init_array section so that the function runs before main().

void init() {}
typedef void (*fn)(void);
__attribute__((section(".init_array"))) fn p_init[1] = { &init };

int main() { return 0; }

Note that a label p_init is also provided for the array. I'm not sure if there exists a way to do this in C without additionally creating the p_init symbol.

In any case, compile this code to find:

$ EM_LLVM_ROOT=[...]/llvm-project/build/bin emcc bug.c -o bug.js
wasm-ld: error: [...]/emscripten_temp_fgsxu6w6/bug_0.o: invalid data segment index: 0

What is happening is the array to be written to .init_array is being handled specially by WasmObjectWriter.cpp:

// .init_array sections are handled specially elsewhere.
if (SectionName.starts_with(".init_array"))
continue;

This existing code skips writing out the array in a data segment. Later this is indeed dealt with by writing each function in the array into a Wasm custom linking section, in the InitFunctions subsection.

Unfortunately, the symbol p_init is also written to the Wasm custom linking section as a wasm::WASM_SYMBOL_TYPE_DATA, pointing to a data segment that now does not exist. As far as I can tell, we don't really indend to ever reference this symbol, the pattern is just used in C (and a similar construct can be used in Rust) to ensure that the .init_array array exists. So, while we could just throw an error here it's not ideal as it breaks this useful pattern for life-before-main.

In #111008 I suggested a change to simply not write the p_init symbol to the custom linking section of the Wasm object. But, after some prompting by @sbc100 it seems this is not a good idea, since it breaks code that does happen to reference the p_init symbol for whatever reason.

In this patch I instead remove the skip, so that data segments are also written for .init_array sections. This works around the problem above, since the p_init entry now references a data section that exists. I don't think the data written there is useful, but the change at least avoids the error message above. It also fixes related issues when you have multiple .init_array sections.

Some tests related to the change have been modified to reflect the additional emitted data, and a new test init-array-label.s has been added to test the specific problem above.

@llvmbot llvmbot added backend:WebAssembly mc Machine (object) code labels Dec 8, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 8, 2024

@llvm/pr-subscribers-backend-webassembly

@llvm/pr-subscribers-mc

Author: George Stagg (georgestagg)

Changes

Follow on from #111008.

Consider the following C code, which supplies a function init() and an array written to the .init_array section so that the function runs before main().

void init() {}
typedef void (*fn)(void);
__attribute__((section(".init_array"))) fn p_init[1] = { &init };

int main() { return 0; }

Note that a label p_init is also provided for the array. I'm not sure if there exists a way to do this in C without additionally creating the p_init symbol.

In any case, compile this code to find:

$ EM_LLVM_ROOT=[...]/llvm-project/build/bin emcc bug.c -o bug.js
wasm-ld: error: [...]/emscripten_temp_fgsxu6w6/bug_0.o: invalid data segment index: 0

What is happening is the array to be written to .init_array is being handled specially by WasmObjectWriter.cpp:

// .init_array sections are handled specially elsewhere.
if (SectionName.starts_with(".init_array"))
continue;

This existing code skips writing out the array in a data segment. Later this is indeed dealt with by writing each function in the array into a Wasm custom linking section, in the InitFunctions subsection.

Unfortunately, the symbol p_init is also written to the Wasm custom linking section as a wasm::WASM_SYMBOL_TYPE_DATA, pointing to a data segment that now does not exist. As far as I can tell, we don't really indend to ever reference this symbol, the pattern is just used in C (and a similar construct can be used in Rust) to ensure that the .init_array array exists. So, while we could just throw an error here it's not ideal as it breaks this useful pattern for life-before-main.

In #111008 I suggested a change to simply not write the p_init symbol to the custom linking section of the Wasm object. But, after some prompting by @sbc100 it seems this is not a good idea, since it breaks code that does happen to reference the p_init symbol for whatever reason.

In this patch I instead remove the skip, so that data segments are also written for .init_array sections. This works around the problem above, since the p_init entry now references a data section that exists. I don't think the data written there is useful, but the change at least avoids the error message above. It also fixes related issues when you have multiple .init_array sections.

Some tests related to the change have been modified to reflect the additional emitted data, and a new test init-array-label.s has been added to test the specific problem above.


Full diff: https://github.com/llvm/llvm-project/pull/119127.diff

4 Files Affected:

  • (modified) llvm/lib/MC/WasmObjectWriter.cpp (-4)
  • (modified) llvm/test/MC/WebAssembly/global-ctor-dtor.ll (+22-2)
  • (added) llvm/test/MC/WebAssembly/init-array-label.s (+86)
  • (modified) llvm/test/MC/WebAssembly/init-array.s (+15)
diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index a66c5713ff8a6e..04466ae0fddf13 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1482,10 +1482,6 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
     LLVM_DEBUG(dbgs() << "Processing Section " << SectionName << "  group "
                       << Section.getGroup() << "\n";);
 
-    // .init_array sections are handled specially elsewhere.
-    if (SectionName.starts_with(".init_array"))
-      continue;
-
     // Code is handled separately
     if (Section.isText())
       continue;
diff --git a/llvm/test/MC/WebAssembly/global-ctor-dtor.ll b/llvm/test/MC/WebAssembly/global-ctor-dtor.ll
index f1ec71da1ebb64..b1b39ce02cfe85 100644
--- a/llvm/test/MC/WebAssembly/global-ctor-dtor.ll
+++ b/llvm/test/MC/WebAssembly/global-ctor-dtor.ll
@@ -63,7 +63,7 @@ declare void @func3()
 ; CHECK-NEXT:           Value:           1
 ; CHECK-NEXT:         Functions:       [ 5, 7 ]
 ; CHECK-NEXT:   - Type:            DATACOUNT
-; CHECK-NEXT:     Count:           1
+; CHECK-NEXT:     Count:           3
 ; CHECK-NEXT:   - Type:            CODE
 ; CHECK-NEXT:     Relocations:
 ; CHECK-NEXT:       - Type:            R_WASM_FUNCTION_INDEX_LEB
@@ -111,6 +111,18 @@ declare void @func3()
 ; CHECK-NEXT:           Opcode:          I32_CONST
 ; CHECK-NEXT:           Value:           0
 ; CHECK-NEXT:         Content:         '01040000'
+; CHECK-NEXT:       - SectionOffset:   15
+; CHECK-NEXT:         InitFlags:       0
+; CHECK-NEXT:         Offset:
+; CHECK-NEXT:           Opcode:          I32_CONST
+; CHECK-NEXT:           Value:           4
+; CHECK-NEXT:         Content:         '0000000000000000'
+; CHECK-NEXT:       - SectionOffset:   28
+; CHECK-NEXT:         InitFlags:       0
+; CHECK-NEXT:         Offset:
+; CHECK-NEXT:           Opcode:          I32_CONST
+; CHECK-NEXT:           Value:           12
+; CHECK-NEXT:         Content:         '0000000000000000'
 ; CHECK-NEXT:   - Type:            CUSTOM
 ; CHECK-NEXT:     Name:            linking
 ; CHECK-NEXT:     Version:         2
@@ -174,7 +186,15 @@ declare void @func3()
 ; CHECK-NEXT:       - Index:           0
 ; CHECK-NEXT:         Name:            .data.global1
 ; CHECK-NEXT:         Alignment:       3
-; CHECK-NEXT:         Flags:           [ ]
+; CHECK-NEXT:         Flags:           [  ]
+; CHECK-NEXT:       - Index:           1
+; CHECK-NEXT:         Name:            .init_array.42
+; CHECK-NEXT:         Alignment:       2
+; CHECK-NEXT:         Flags:           [  ]
+; CHECK-NEXT:       - Index:           2
+; CHECK-NEXT:         Name:            .init_array
+; CHECK-NEXT:         Alignment:       2
+; CHECK-NEXT:         Flags:           [  ]
 ; CHECK-NEXT:     InitFunctions:
 ; CHECK-NEXT:       - Priority: 42
 ; CHECK-NEXT:         Symbol: 9
diff --git a/llvm/test/MC/WebAssembly/init-array-label.s b/llvm/test/MC/WebAssembly/init-array-label.s
new file mode 100644
index 00000000000000..4fe0b64b3bc512
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/init-array-label.s
@@ -0,0 +1,86 @@
+# RUN: llvm-mc -triple=wasm32-unknown-unknown -filetype=obj < %s | obj2yaml | FileCheck %s
+
+init1:
+	.functype	init1 () -> ()
+	end_function
+
+init2:
+	.functype	init2 () -> ()
+	end_function
+
+	.section	.init_array,"",@
+	.globl	p_init1
+	.p2align	2, 0x0
+p_init1:
+	.section	.init_array,"",@
+	.p2align	2, 0
+	.int32	init1
+	.size	p_init1, 4
+
+	.section	.init_array,"",@
+	.globl	p_init2
+	.p2align	2, 0x0
+p_init2:
+	.section	.init_array,"",@
+	.p2align	2
+	.int32	init2
+	.size	p_init2, 4
+
+# CHECK:        - Type:            FUNCTION
+# CHECK-NEXT:     FunctionTypes:   [ 0, 0 ]
+# CHECK-NEXT:   - Type:            DATACOUNT
+# CHECK-NEXT:     Count:           1
+# CHECK-NEXT:   - Type:            CODE
+# CHECK-NEXT:     Functions:
+# CHECK-NEXT:       - Index:           0
+# CHECK-NEXT:         Locals:          []
+# CHECK-NEXT:         Body:            0B
+# CHECK-NEXT:       - Index:           1
+# CHECK-NEXT:         Locals:          []
+# CHECK-NEXT:         Body:            0B
+# CHECK-NEXT:   - Type:            DATA
+# CHECK-NEXT:     Segments:
+# CHECK-NEXT:       - SectionOffset:   6
+# CHECK-NEXT:         InitFlags:       0
+# CHECK-NEXT:         Offset:
+# CHECK-NEXT:           Opcode:          I32_CONST
+# CHECK-NEXT:           Value:           0
+# CHECK-NEXT:         Content:         '0000000000000000'
+# CHECK-NEXT:   - Type:            CUSTOM
+# CHECK-NEXT:     Name:            linking
+# CHECK-NEXT:     Version:         2
+# CHECK-NEXT:     SymbolTable:
+# CHECK-NEXT:       - Index:           0
+# CHECK-NEXT:         Kind:            FUNCTION
+# CHECK-NEXT:         Name:            init1
+# CHECK-NEXT:         Flags:           [ BINDING_LOCAL ]
+# CHECK-NEXT:         Function:        0
+# CHECK-NEXT:       - Index:           1
+# CHECK-NEXT:         Kind:            FUNCTION
+# CHECK-NEXT:         Name:            init2
+# CHECK-NEXT:         Flags:           [ BINDING_LOCAL ]
+# CHECK-NEXT:         Function:        1
+# CHECK-NEXT:       - Index:           2
+# CHECK-NEXT:         Kind:            DATA
+# CHECK-NEXT:         Name:            p_init1
+# CHECK-NEXT:         Flags:           [  ]
+# CHECK-NEXT:         Segment:         0
+# CHECK-NEXT:         Size:            4
+# CHECK-NEXT:       - Index:           3
+# CHECK-NEXT:         Kind:            DATA
+# CHECK-NEXT:         Name:            p_init2
+# CHECK-NEXT:         Flags:           [  ]
+# CHECK-NEXT:         Segment:         0
+# CHECK-NEXT:         Offset:          4
+# CHECK-NEXT:         Size:            4
+# CHECK-NEXT:     SegmentInfo:
+# CHECK-NEXT:       - Index:           0
+# CHECK-NEXT:         Name:            .init_array
+# CHECK-NEXT:         Alignment:       2
+# CHECK-NEXT:         Flags:           [  ]
+# CHECK-NEXT:     InitFunctions:
+# CHECK-NEXT:       - Priority:        65535
+# CHECK-NEXT:         Symbol:          0
+# CHECK-NEXT:       - Priority:        65535
+# CHECK-NEXT:         Symbol:          1
+# CHECK-NEXT: ...
diff --git a/llvm/test/MC/WebAssembly/init-array.s b/llvm/test/MC/WebAssembly/init-array.s
index e79fb453ec12a3..ab8be11b0ff4a2 100644
--- a/llvm/test/MC/WebAssembly/init-array.s
+++ b/llvm/test/MC/WebAssembly/init-array.s
@@ -18,6 +18,8 @@ init2:
 
 # CHECK:        - Type:            FUNCTION
 # CHECK-NEXT:     FunctionTypes:   [ 0, 0 ]
+# CHECK-NEXT:   - Type:            DATACOUNT
+# CHECK-NEXT:     Count:           1
 # CHECK-NEXT:   - Type:            CODE
 # CHECK-NEXT:     Functions:
 # CHECK-NEXT:       - Index:           0
@@ -26,6 +28,14 @@ init2:
 # CHECK-NEXT:       - Index:           1
 # CHECK-NEXT:         Locals:          []
 # CHECK-NEXT:         Body:            0B
+# CHECK-NEXT:   - Type:            DATA
+# CHECK-NEXT:     Segments:
+# CHECK-NEXT:       - SectionOffset:   6
+# CHECK-NEXT:         InitFlags:       0
+# CHECK-NEXT:         Offset:
+# CHECK-NEXT:           Opcode:          I32_CONST
+# CHECK-NEXT:           Value:           0
+# CHECK-NEXT:         Content:         '0000000000000000'
 # CHECK-NEXT:   - Type:            CUSTOM
 # CHECK-NEXT:     Name:            linking
 # CHECK-NEXT:     Version:         2
@@ -40,6 +50,11 @@ init2:
 # CHECK-NEXT:         Name:            init2
 # CHECK-NEXT:         Flags:           [ BINDING_LOCAL ]
 # CHECK-NEXT:         Function:        1
+# CHECK-NEXT:     SegmentInfo:
+# CHECK-NEXT:       - Index:           0
+# CHECK-NEXT:         Name:            .init_array
+# CHECK-NEXT:         Alignment:       2
+# CHECK-NEXT:         Flags:           [  ]
 # CHECK-NEXT:     InitFunctions:
 # CHECK-NEXT:       - Priority:        65535
 # CHECK-NEXT:         Symbol:          0

@georgestagg georgestagg force-pushed the wasm-init-array-symbol branch from 0426b8d to f048af2 Compare December 9, 2024 14:31
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, having asked you to add this code, I am now wondering if it is needed after all.

Since the linker runs with --gc-sections enabled by default I think the presence of these unreferenced sections in the object file should be harmless. However keeping the object files free of unused sections also makes some sense. I'll let you decided if you want to keep this code.

lgtm either way.

break;
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this out into a helper function?

e.g:

if (!isSectionReferenced(SectionName))
  continue;

@georgestagg georgestagg force-pushed the wasm-init-array-symbol branch from f048af2 to 741f3bb Compare December 10, 2024 09:28
@georgestagg
Copy link
Contributor Author

georgestagg commented Dec 10, 2024

I'll let you decide if you want to keep this code.

Now that it's been written, let's keep it. Keeping the objects clean is better.

Can we split this out into a helper function?

Yes, but I've used a slightly different calling form since we want to check the name too.

    if (SectionName.starts_with(".init_array") &&
        !isSectionReferenced(Asm, Section))
      continue;

The latest version of the commit has the logic in a separate function. If all looks good, could you hit the merge button for me? I don't have write access to the repo.

Also, if you don't mind, would you help me set up the llvmbot so as to request a backport of both #111008 and this PR to the 19.x release branch, please?

Thanks again for taking the time to review and feedback.

@sbc100 sbc100 merged commit ed91843 into llvm:main Dec 10, 2024
8 checks passed
@sbc100
Copy link
Collaborator

sbc100 commented Dec 10, 2024

It looks like you can request backpatches by commenting on an issue or PR: https://llvm.org/docs/GitHub.html#backporting-fixes-to-the-release-branches

broxigarchen pushed a commit to broxigarchen/llvm-project that referenced this pull request Dec 10, 2024
tru pushed a commit to llvmbot/llvm-project that referenced this pull request Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:WebAssembly mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants