Strip code section from separate-dwarf file #17257

dschuff · 2022-06-17T20:27:44Z

Other custom sections need to stay in the file so that the DWARF data can be interpreted by tools such as llvm-dwarfdump.

Fixes #13084
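At the binary level, stripping the code section means dropping the section with id 10 and copying every other section byte for byte. That includes custom sections such as `name` and the `.debug_*` DWARF sections. The following is a minimal, self-contained Python sketch of that idea, assuming a WebAssembly version 1 binary. `iter_sections` and `strip_sections_by_id` are hypothetical helpers for illustration, not how emscripten's `strip()` is implemented.

```python
from typing import Iterable, Iterator, Tuple

WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number + binary format version 1
CODE_SECTION_ID = 10  # standard section id of the code section


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def iter_sections(wasm: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (section_id, raw_bytes), where raw_bytes includes the id and size prefix."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a WebAssembly version 1 binary")
    pos = 8
    while pos < len(wasm):
        start = pos
        size, payload_start = read_uleb128(wasm, pos + 1)
        pos = payload_start + size
        if pos > len(wasm):
            raise ValueError("section extends past end of file")
        yield wasm[start], wasm[start:pos]


def strip_sections_by_id(wasm: bytes, drop_ids: Iterable[int]) -> bytes:
    """Hypothetical helper: copy every section except those whose id is in drop_ids."""
    drop = set(drop_ids)
    kept = [raw for section_id, raw in iter_sections(wasm) if section_id not in drop]
    return WASM_HEADER + b"".join(kept)
```

The sketch deliberately keeps the FUNCTION section (id 3), so the function index space that the `name` section refers to is still declared. For any module that defines functions, the output is no longer a valid module, because the spec requires the FUNCTION and CODE sections to have matching entry counts. That is acceptable for a file used only as a DWARF container.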

dschuff · 2022-06-17T20:34:28Z

Depends on https://reviews.llvm.org/D128094

One thing we might also want to do is remove the data section. However we can't currently do that because LLVM's object file reader verifies that each name (e.g. a function or data segment name) refers to a valid/existing entity (e.g. a function or data segment). For functions this is fine, since functions are declared in the Function section, and we can leave that in while stripping the code section. For data, the datacount section does the same thing, but it's optional and not used except in binaries with passive segments. That means simply stripping out the data section invalidates data segment names.

So this PR does not strip the data section. We could:

1. Leave it as-is, as the data section is usually much smaller than the code section.
2. Drop the object file reader's name validation requirement (at least for data segments).
3. Convert the data section to a datacount section instead of stripping it completely.
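Option 3 is small in code terms. A DATA section's payload begins with a LEB128-encoded count of segments, and a DATACOUNT section (id 12) holds nothing but that count. Here is a minimal sketch, assuming the DATA payload has already been sliced out of the module. `data_to_datacount` is a hypothetical helper. Whether LLVM's reader then accepts the data segment names is taken from the reasoning in this thread, not verified.

```python
from typing import Tuple

DATACOUNT_SECTION_ID = 12  # standard section id of the datacount section


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def write_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def data_to_datacount(data_payload: bytes) -> bytes:
    """Hypothetical helper: build a complete DATACOUNT section from a DATA payload.

    Only the leading segment count is kept; all segment contents are dropped.
    """
    segment_count, _ = read_uleb128(data_payload, 0)
    payload = write_uleb128(segment_count)
    return bytes([DATACOUNT_SECTION_ID]) + write_uleb128(len(payload)) + payload
```

In the spec's section ordering DATACOUNT precedes CODE, so once CODE is stripped the new section could presumably take the place DATA occupied.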

sbc100

test?

dschuff · 2022-06-17T22:27:25Z

tests/test_other.py

    for sec in debug_wasm.sections():
-      # TODO: check for absence of code section (see
-      # https://github.com/emscripten-core/emscripten/issues/13084)
+      if sec.type == webassembly.SecType.CODE:


test_separate_dwarf now checks that the result doesn't have a code section.
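The test relies on emscripten's own `webassembly` helper module. For illustration only, a hypothetical standalone version of the same check might look like the sketch below. It also asserts that the DWARF custom sections survived, assuming they follow the usual `.debug_*` naming. `list_sections` and `check_separate_dwarf_file` are made-up names.

```python
from typing import List, Optional, Tuple

CUSTOM_SECTION_ID = 0
CODE_SECTION_ID = 10


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def list_sections(wasm: bytes) -> List[Tuple[int, Optional[str]]]:
    """Return (section_id, custom_section_name or None) for each section."""
    if wasm[:4] != b"\x00asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip magic number and version
    sections = []
    while pos < len(wasm):
        section_id = wasm[pos]
        size, payload_start = read_uleb128(wasm, pos + 1)
        name = None
        if section_id == CUSTOM_SECTION_ID:
            name_len, name_start = read_uleb128(wasm, payload_start)
            name = wasm[name_start:name_start + name_len].decode("utf-8")
        sections.append((section_id, name))
        pos = payload_start + size
    return sections


def check_separate_dwarf_file(wasm: bytes) -> None:
    """Raise AssertionError unless CODE is gone and .debug_* sections remain."""
    sections = list_sections(wasm)
    if any(section_id == CODE_SECTION_ID for section_id, _ in sections):
        raise AssertionError("CODE section should have been stripped")
    if not any(name and name.startswith(".debug_") for _, name in sections):
        raise AssertionError("expected .debug_* custom sections to be kept")
```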

kripken · 2022-06-17T23:53:48Z

tools/building.py

+  # TODO(dschuff): Also strip the DATA section? To make this work we'd need to
+  # either allow "invalid" data segment name entries, or maybe convert the DATA
+  # to a DATACOUNT section.
+  strip(wasm_file_with_dwarf, wasm_file_with_dwarf, sections=['CODE'])


Why is "producers" lowercase but "CODE" uppercase?

CODE is a known section, and LLVM tools display and match known sections by capitalized names, whereas all the recognized custom section names happen to be lowercase.
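To make the convention concrete, the table below lists the upper-case names assumed to be what LLVM tools such as `llvm-objdump -h` print for the standard section ids. Custom sections (id 0) are shown under their own names, which is why a removal list can mix `CODE` with `producers`. This is a sketch: `remove_section_args` is a hypothetical helper that assumes the names end up as `llvm-objcopy --remove-section=` arguments, which this thread does not confirm.

```python
from typing import Dict, List, Optional

# Display names for the standard (non-custom) section ids, assumed to match
# the upper-case names LLVM tools print for WebAssembly objects.
KNOWN_SECTION_NAMES: Dict[int, str] = {
    1: "TYPE", 2: "IMPORT", 3: "FUNCTION", 4: "TABLE", 5: "MEMORY",
    6: "GLOBAL", 7: "EXPORT", 8: "START", 9: "ELEM", 10: "CODE",
    11: "DATA", 12: "DATACOUNT", 13: "TAG",
}


def section_display_name(section_id: int, custom_name: Optional[str] = None) -> str:
    """Custom sections (id 0) go by their own name; known sections by a fixed one."""
    if section_id == 0:
        if custom_name is None:
            raise ValueError("custom sections need a name")
        return custom_name
    return KNOWN_SECTION_NAMES[section_id]


def remove_section_args(names: List[str]) -> List[str]:
    """Hypothetical: turn section names into llvm-objcopy --remove-section flags."""
    return ["--remove-section=" + name for name in names]
```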


dschuff · 2022-06-24T00:06:15Z

The dependencies are in, so I'm going to let this auto-merge. I'd still be curious if any of you have opinions about the top comment above.

aheejin · 2022-06-24T00:29:59Z

> One thing we might also want to do is remove the data section. However we can't currently do that because LLVM's object file reader verifies that each name (e.g. a function or data segment name) refers to a valid/existing entity (e.g. a function or data segment). For functions this is fine, since functions are declared in the Function section, and we can leave that in while stripping the code section. For data, the datacount section does the same thing, but it's optional and not used except in binaries with passive segments. That means simply stripping out the data section invalidates data segment names.
>
> So this PR does not strip the data section. We could:
>
> 1. Leave it as-is, as the data section is usually much smaller than the code section.
> 2. Drop the object file reader's name validation requirement (at least for data segments).
> 3. Convert the data section to a datacount section instead of stripping it completely.

What exactly is the name validation requirement? Does that require the name section, because otherwise we wouldn't even know what the names would be? And the datacount section only stores the number of data segments. Is that enough to verify the names, if present with the name section?

I think, if necessary, requiring the optional datacount section in debug info files generated by -gseparate-dwarf is fine because this is an invalid wasm file anyway and we only use wasm as a container.

How large is the data section usually compared to the debug info? Under 10%? If so it wouldn't really hurt to leave it.

I would prefer option 1 or 3, depending on how large the data section is.

dschuff · 2022-06-24T15:57:03Z

> What exactly is the name validation requirement?

The name section refers to functions and other entities by their index, and the object file reader will reject a binary with a name section that refers to a function or data segment that it hasn't seen a declaration for (e.g. if there is no function section or data section in the binary, then the index space will be empty). Or more precisely IIRC, if the reader sees a name with an index greater than or equal to the number of functions or data segments in the binary, then it knows the name can't be correct. But I don't think there's really any other consequence to having more names than functions/segments.
This is also why I think a datacount section would be sufficient to work around the problem, since all the object file reader really needs to know when parsing names is how many segments or functions there are.
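The check described above amounts to a bounds test against each index space. This Python sketch illustrates the described behavior rather than reproducing LLVM's reader. `invalid_name_indices` and `function_index_space_size` are hypothetical names.

```python
from typing import Dict, List


def invalid_name_indices(names: Dict[int, str], entity_count: int) -> List[int]:
    """Indices in a name map that fall outside an index space of entity_count entries."""
    return sorted(index for index in names if index >= entity_count)


def function_index_space_size(num_imported: int, num_defined: int) -> int:
    """Imported functions come first in the index space, then FUNCTION-section entries."""
    return num_imported + num_defined


segment_names = {0: ".rodata", 1: ".data"}
# DATA stripped and no DATACOUNT: the segment index space is empty.
print(invalid_name_indices(segment_names, 0))
# A DATACOUNT section declaring 2 segments makes both names valid again.
print(invalid_name_indices(segment_names, 2))
```

This is why a bare count is enough: the check only needs to know how large the index space is, not what the segments contain.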

> How large is the data section usually compared to the debug info? Under 10%? If so it wouldn't really hurt to leave it.

I don't have "real" numbers on this but my sense from the binaries I've seen is that it's pretty small (especially compared to debug info), probably less than 10% in most cases. So I agree that it isn't a big deal, and it's why I'm comfortable going ahead with landing this even though we aren't totally sure what to do about the data sections.
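Getting real numbers mostly comes down to summing section payload sizes in a `-gseparate-dwarf` output. Here is a rough Python sketch, assuming a WebAssembly version 1 binary. `section_sizes` and `data_to_debug_ratio` are hypothetical helpers, and counting every `.debug_*` custom section as debug info is an assumption.

```python
from typing import Dict, Tuple

CUSTOM_SECTION_ID = 0
DATA_SECTION_ID = 11


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def section_sizes(wasm: bytes) -> Dict[str, int]:
    """Map section label -> total payload bytes (custom sections keyed by name)."""
    sizes: Dict[str, int] = {}
    pos = 8  # skip magic number and version
    while pos < len(wasm):
        section_id = wasm[pos]
        size, payload_start = read_uleb128(wasm, pos + 1)
        if section_id == CUSTOM_SECTION_ID:
            name_len, name_start = read_uleb128(wasm, payload_start)
            label = wasm[name_start:name_start + name_len].decode("utf-8")
        elif section_id == DATA_SECTION_ID:
            label = "DATA"
        else:
            label = "<id %d>" % section_id
        sizes[label] = sizes.get(label, 0) + size
        pos = payload_start + size
    return sizes


def data_to_debug_ratio(wasm: bytes) -> float:
    """Size of the DATA payload relative to all .debug_* custom sections."""
    sizes = section_sizes(wasm)
    debug = sum(n for label, n in sizes.items() if label.startswith(".debug_"))
    if debug == 0:
        raise ValueError("no .debug_* sections found")
    return sizes.get("DATA", 0) / debug
```

For example, calling `data_to_debug_ratio(open(path, "rb").read())`, where `path` is a placeholder for a real debug file, would give the fraction asked about above.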

aheejin · 2022-06-27T00:54:49Z

Yeah, we certainly can start with this, and it wouldn't be a big deal if we leave it. I personally prefer the datacount section option most though, given that it's the smallest and simplest.


dschuff requested review from aheejin and sbc100 June 17, 2022 20:34

sbc100 approved these changes Jun 17, 2022


dschuff commented Jun 17, 2022


kripken approved these changes Jun 17, 2022


aheejin approved these changes Jun 18, 2022


dschuff added 3 commits June 23, 2022 16:57

Strip code and data sections from separate-dwarf file

0562364

Other custom sections need to stay in the file so that the DWARF data can be interpreted by tools such as llvm-dwarfdump. Fixes #13084

fix typos, remove print, don't strip DATA

8905a34

flake8

a9485c1

dschuff force-pushed the separate-dwarf-2 branch from 66c8141 to a9485c1 June 23, 2022 23:57

dschuff enabled auto-merge (squash) June 24, 2022 00:05

dschuff merged commit 1c11590 into main Jun 24, 2022

dschuff deleted the separate-dwarf-2 branch June 24, 2022 17:56

dschuff added a commit that referenced this pull request Jul 29, 2022

Don't strip the code section from the separate-dwarf file

d4c83ce

This effectively reverts #17257

dschuff mentioned this pull request Oct 31, 2024

-gseparate-dwarf contains copy of the original .wasm file #13084

Open
