Skip to content

Conversation

@dschuff
Copy link
Member

@dschuff dschuff commented Jun 17, 2022

Other custom sections need to stay in the file so that the DWARF data can be interpreted by tools such as llvm-dwarfdump.

Fixes #13084

@dschuff
Copy link
Member Author

dschuff commented Jun 17, 2022

Depends on https://reviews.llvm.org/D128094

One thing we might also want to do is remove the data section. However we can't currently do that because LLVM's object file reader verifies that each name (e.g. a function or data segment name) refers to a valid/existing entity (e.g. a function or data segment). For functions this is fine, since functions are declared in the Function section, and we can leave that in while stripping the code section. For data, the datacount section does the same thing, but it's optional and not used except in binaries with passive segments. That means just simply stripping out the data section invalidates data segment names.

So this PR does not strip the data section. We could:

  1. Leave it as-is, as the data section is usually much smaller than the code section.
  2. Drop the object file reader's name validation requirement (at least for data segments)
  3. Convert the data section to a datacount section instead of stripping it completely.

@dschuff dschuff requested review from aheejin and sbc100 June 17, 2022 20:34
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test?

for sec in debug_wasm.sections():
# TODO: check for absence of code section (see
# https://github.com/emscripten-core/emscripten/issues/13084)
if sec.type == webassembly.SecType.CODE:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_separate_dwarf now checks that the result doesn't have a code section.

# TODO(dschuff): Also strip the DATA section? To make this work we'd need to
# either allow "invalid" data segment name entries, or maybe convert the DATA
# to a DATACOUNT section.
strip(wasm_file_with_dwarf, wasm_file_with_dwarf, sections=['CODE'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is producers in lowercase but CODE in upper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CODE is a known section, and known sections are displayed and matched by LLVM tools with capitalized names (whereas all the recognized custom section names happen to be lowercase)

dschuff added 3 commits June 23, 2022 16:57
Other custom sections need to stay in the file so that the DWARF data can be
interpreted by tools such as llvm-dwarfdump.

Fixes #13084
@dschuff dschuff force-pushed the separate-dwarf-2 branch from 66c8141 to a9485c1 Compare June 23, 2022 23:57
@dschuff dschuff enabled auto-merge (squash) June 24, 2022 00:05
@dschuff
Copy link
Member Author

dschuff commented Jun 24, 2022

The dependencies are in, so I'm going to let this auto-merge. I'd still be curious if any of you have opinions about the top comment above.

@aheejin
Copy link
Member

aheejin commented Jun 24, 2022

One thing we might also want to do is remove the data section. However we can't currently do that because LLVM's object file reader verifies that each name (e.g. a function or data segment name) refers to a valid/existing entity (e.g. a function or data segment). For functions this is fine, since functions are declared in the Function section, and we can leave that in while stripping the code section. For data, the datacount section does the same thing, but it's optional and not used except in binaries with passive segments. That means just simply stripping out the data section invalidates data segment names.

So this PR does not strip the data section. We could:

  1. Leave it as-is, as the data section is usually much smaller than the code section.
  2. Drop the object file reader's name validation requirement (at least for data segments)
  3. Convert the data section to a datacount section instead of stripping it completely.

What exactly is the name validation requirement? Does that require the name section, because otherwise we wouldn't even know what the names would be? And the datacount section only stores the number of data segments. Is that enough to verify the names, if present with the name section?

I think, if necessary, requiring the optional datacount section in debug info files generated by -gseparate-dwarf is fine because this is an invalid wasm file anyway and we only use wasm as a container.

How large is the data section usually compared to the debug info? Under 10%? If so it wouldn't really hurt to leave it.

I would prefer the option 1 or 3 depending on how large the data section is.

@dschuff
Copy link
Member Author

dschuff commented Jun 24, 2022

What exactly is the name validation requirement?

The name section refers to functions and other entities by their index, and the object file reader will reject a binary with a name section that refers to a function or data segment that it hasn't seen a declaration for (e.g. if there is no function section or data section in the binary, then the index space will be empty). Or more precisely IIRC, if the reader sees a name with an index greater than the number of functions or data segments in the binary, then it knows the name can't be correct. But I don't think there's really any other consequence to having more names than functions/segments.
This also is why I think a datacount section would be sufficient to work around the problem, since all the object file really needs to know when parsing names is how many segments or functions there are.

How large is the data section usually compared to the debug info? Under 10%? If so it wouldn't really hurt to leave it.

I don't have "real" numbers on this but my sense from the binaries I've seen is that it's pretty small (especially compared to debug info), probably less than 10% in most cases. So I agree that it isn't a big deal, and it's why I'm comfortable going ahead with landing this even though we aren't totally sure what to do about the data sections.

@dschuff dschuff merged commit 1c11590 into main Jun 24, 2022
@dschuff dschuff deleted the separate-dwarf-2 branch June 24, 2022 17:56
@aheejin
Copy link
Member

aheejin commented Jun 27, 2022

Yeah, we certainly can start with this, and it wouldn't be a big deal if we leave it. I personally prefer the datacount section option most though, given that it's the smallest and simplest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

-gseparate-dwarf contains copy of the original .wasm file

5 participants