Strip code section from separate-dwarf file #17257
Depends on https://reviews.llvm.org/D128094

One thing we might also want to do is remove the data section. However, we can't currently do that because LLVM's object file reader verifies that each name (e.g. a function or data segment name) refers to a valid, existing entity (e.g. a function or data segment). For functions this is fine: functions are declared in the Function section, and we can leave that in while stripping the code section. For data, the datacount section serves the same purpose, but it is optional and only used in binaries with passive segments. That means simply stripping out the data section invalidates the data segment names, so this PR does not strip the data section. We could:
sbc100 left a comment
test?
```python
for sec in debug_wasm.sections():
  # TODO: check for absence of code section (see
  # https://github.com/emscripten-core/emscripten/issues/13084)
  if sec.type == webassembly.SecType.CODE:
```
test_separate_dwarf now checks that the result doesn't have a code section.
```python
# TODO(dschuff): Also strip the DATA section? To make this work we'd need to
# either allow "invalid" data segment name entries, or maybe convert the DATA
# to a DATACOUNT section.
strip(wasm_file_with_dwarf, wasm_file_with_dwarf, sections=['CODE'])
```
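The `strip()` call above is emscripten's own helper. To make the underlying operation concrete, here is a hypothetical standalone sketch (not emscripten's implementation): a wasm binary is an 8-byte header followed by sections, each encoded as a one-byte ID, an unsigned LEB128 payload size, and the payload, so dropping the Code section (ID 10) amounts to copying every other section through unchanged. The names `strip_sections` and `iter_sections` are illustrative only.

```python
# Hypothetical standalone sketch, not emscripten's strip() helper.

WASM_HEADER = b"\0asm\x01\0\0\0"  # magic number + binary format version 1
CODE = 10  # known section ID of the Code section


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def iter_sections(wasm):
    """Yield (section_id, payload) for each section of a wasm binary."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a version-1 wasm binary")
    pos = 8
    while pos < len(wasm):
        sec_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        yield sec_id, wasm[pos:pos + size]
        pos += size


def strip_sections(wasm, drop_ids):
    """Return a copy of `wasm` without the sections whose IDs are in drop_ids.

    Section sizes are re-encoded minimally, so kept sections can shrink by a
    few bytes if the input used padded LEB128 sizes.
    """
    out = bytearray(WASM_HEADER)
    for sec_id, payload in iter_sections(wasm):
        if sec_id not in drop_ids:
            out += bytes([sec_id]) + encode_uleb128(len(payload)) + payload
    return bytes(out)


if __name__ == "__main__":
    # Tiny hand-built module: Function section (3) declaring two functions,
    # Code section (10) with two empty bodies, custom ".debug_info" section (0).
    module = (WASM_HEADER
              + bytes([3, 3, 2, 0, 0])
              + bytes([10, 7, 2, 2, 0, 0x0B, 2, 0, 0x0B])
              + bytes([0, 12, 11]) + b".debug_info")
    stripped = strip_sections(module, {CODE})
    print([sec_id for sec_id, _ in iter_sections(stripped)])  # prints [3, 0]
```

The Function section survives, which is why function names stay valid after the Code section is removed.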
Why is `producers` lowercase but `CODE` uppercase?
`CODE` is a known section, and LLVM tools display and match known sections by capitalized names (whereas all the recognized custom section names happen to be lowercase).
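A minimal sketch of that naming convention, assuming a selector list like `['CODE', 'producers']`: known sections carry only a numeric ID and get an uppercase label, while a custom section (ID 0) begins with its own length-prefixed UTF-8 name, which is used as-is. The exact label strings in `KNOWN_SECTION_NAMES` and the helper names are assumptions for illustration; compare them with your LLVM tools' output before relying on them.

```python
# Sketch of the naming convention described above; label strings are assumed.

WASM_HEADER = b"\0asm\x01\0\0\0"
KNOWN_SECTION_NAMES = {
    1: "TYPE", 2: "IMPORT", 3: "FUNCTION", 4: "TABLE", 5: "MEMORY",
    6: "GLOBAL", 7: "EXPORT", 8: "START", 9: "ELEM", 10: "CODE",
    11: "DATA", 12: "DATACOUNT",
}


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def iter_sections(wasm):
    """Yield (section_id, payload) for each section of a wasm binary."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a version-1 wasm binary")
    pos = 8
    while pos < len(wasm):
        sec_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        yield sec_id, wasm[pos:pos + size]
        pos += size


def section_name(sec_id, payload):
    """Uppercase label for known sections; embedded name for custom sections."""
    if sec_id == 0:  # custom section: payload starts with a length-prefixed name
        name_len, pos = read_uleb128(payload, 0)
        return payload[pos:pos + name_len].decode("utf-8")
    return KNOWN_SECTION_NAMES.get(sec_id, f"UNKNOWN({sec_id})")


def strip_by_name(wasm, names):
    """Drop every section whose label is in `names` (matching is case-sensitive)."""
    out = bytearray(WASM_HEADER)
    for sec_id, payload in iter_sections(wasm):
        if section_name(sec_id, payload) not in names:
            out += bytes([sec_id]) + encode_uleb128(len(payload)) + payload
    return bytes(out)
```

With this scheme, `strip_by_name(module, {"CODE", "producers"})` removes the Code section and the `producers` custom section but leaves `.debug_*` custom sections in place, while `{"code"}` would match nothing.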
Other custom sections need to stay in the file so that the DWARF data can be interpreted by tools such as llvm-dwarfdump.
Fixes #13084
66c8141 to a9485c1
The dependencies are in, so I'm going to let this auto-merge. I'd still be curious whether any of you have opinions about the open question in the top comment above.
What exactly is the name validation requirement? Does it require the name section? Otherwise we wouldn't even know what the names are. And the datacount section only stores the number of data segments. Is that enough to verify the names, when present along with the name section?

I think, if necessary, requiring the optional datacount section in debug info files generated by

How large is the data section usually, compared to the debug info? Under 10%? If so, it wouldn't really hurt to leave it. I would prefer option 1 or 3, depending on how large the data section is.
The name section refers to functions and other entities by index, and the object file reader rejects a binary whose name section refers to a function or data segment it hasn't seen a declaration for (e.g. if there is no function section or data section in the binary, the corresponding index space is empty). More precisely, IIRC, if the reader sees a name whose index is at or beyond the number of functions or data segments in the binary (indices are 0-based), it knows the name can't be correct. But I don't think there's any other consequence to having more names than functions/segments.
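The check described above can be sketched as follows. This models the behaviour as explained in the thread; it is not LLVM's reader code. Simplifying assumptions: imported functions are ignored, and only the function-name subsection (ID 1) and data-segment-name subsection (ID 9, from the extended name section) of the `name` custom section are checked. `index_space_sizes` shows why stripping the data section empties the data-segment index space unless a datacount section supplies the count.

```python
# Model of the name-index check discussed above; not LLVM's WasmObjectFile code.
# Assumptions: imports are ignored; only name subsections 1 and 9 are checked.

WASM_HEADER = b"\0asm\x01\0\0\0"
FUNCTION, DATA, DATACOUNT = 3, 11, 12        # known section IDs
FUNCTION_NAMES, DATA_SEGMENT_NAMES = 1, 9    # name-section subsection IDs


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def iter_sections(wasm):
    """Yield (section_id, payload) for each section of a wasm binary."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a version-1 wasm binary")
    pos = 8
    while pos < len(wasm):
        sec_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        yield sec_id, wasm[pos:pos + size]
        pos += size


def index_space_sizes(wasm):
    """Return (num_functions, num_data_segments) declared by the binary."""
    counts = {}
    for sec_id, payload in iter_sections(wasm):
        if sec_id in (FUNCTION, DATA, DATACOUNT):
            counts[sec_id], _ = read_uleb128(payload, 0)  # leading vector count
    # DATACOUNT, when present, declares the segment count ahead of DATA.
    num_data = counts.get(DATACOUNT, counts.get(DATA, 0))
    return counts.get(FUNCTION, 0), num_data


def invalid_names(name_subsections, num_funcs, num_data):
    """Return (subsection_id, index) for every out-of-range name.

    name_subsections is the body of the "name" custom section after its own
    name string: a sequence of (id byte, LEB128 size, payload) subsections.
    """
    limits = {FUNCTION_NAMES: num_funcs, DATA_SEGMENT_NAMES: num_data}
    bad = []
    pos = 0
    while pos < len(name_subsections):
        sub_id = name_subsections[pos]
        size, pos = read_uleb128(name_subsections, pos + 1)
        end = pos + size
        if sub_id in limits:
            count, pos = read_uleb128(name_subsections, pos)
            for _ in range(count):  # name map: vector of (index, name)
                index, pos = read_uleb128(name_subsections, pos)
                name_len, pos = read_uleb128(name_subsections, pos)
                pos += name_len
                if index >= limits[sub_id]:  # 0-based: index == count is invalid
                    bad.append((sub_id, index))
        pos = end  # skip to the next subsection
    return bad
```

For example, a data segment named at index 0 is flagged once the data section is stripped with no datacount section, while function names remain valid as long as the function section is kept.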
I don't have "real" numbers on this, but my sense from the binaries I've seen is that it's pretty small (especially compared to the debug info), probably less than 10% in most cases. So I agree that it isn't a big deal, which is why I'm comfortable landing this even though we aren't yet sure what to do about the data section.
Yeah, we can certainly start with this, and it wouldn't be a big deal if we leave the data section in. I personally prefer the datacount section option most, though, since it's the smallest and simplest.
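The datacount option from the TODO could look roughly like this hypothetical sketch: replace the data section with a datacount section (ID 12) whose payload is just the segment count, so data-segment names keep a non-empty index space without carrying the segment bytes. Assumptions, stated plainly: the code section has already been stripped (a datacount section must precede the code section, so putting it where the data section was is only well-ordered without code), and whether a particular reader accepts a datacount section with no data section is not established in this thread. `data_to_datacount` is an illustrative name, not an emscripten or LLVM API.

```python
# Hypothetical sketch of "convert the DATA to a DATACOUNT section".
# Assumes CODE was already stripped; reader acceptance is not guaranteed.

WASM_HEADER = b"\0asm\x01\0\0\0"
CODE, DATA, DATACOUNT = 10, 11, 12


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def iter_sections(wasm):
    """Yield (section_id, payload) for each section of a wasm binary."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a version-1 wasm binary")
    pos = 8
    while pos < len(wasm):
        sec_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        yield sec_id, wasm[pos:pos + size]
        pos += size


def data_to_datacount(wasm):
    """Replace DATA with a DATACOUNT section holding only the segment count."""
    sections = list(iter_sections(wasm))
    ids = {sec_id for sec_id, _ in sections}
    if CODE in ids:
        raise ValueError("strip CODE first: DATACOUNT must precede CODE")
    out = bytearray(WASM_HEADER)
    for sec_id, payload in sections:
        if sec_id == DATA:
            if DATACOUNT in ids:
                continue  # count already recorded; just drop the segment bytes
            count, _ = read_uleb128(payload, 0)
            sec_id, payload = DATACOUNT, encode_uleb128(count)
        out += bytes([sec_id]) + encode_uleb128(len(payload)) + payload
    return bytes(out)
```

The resulting section is a few bytes regardless of how much data the module carries, which matches the "smallest and simplest" argument above.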
This effectively reverts #17257