Trivial dependencies on large crates pull in massive amounts of debuginfo #56068
It seems to me that this debuginfo should have been garbage collected at some point, by the linker if not before, but for some reason that isn't happening.
|
@tromey you might be interested in this. |
Also, |
Similar results with |
And similar results with It seems to me that the object files that don't have any code eventually included in the binary should be skipped by the linker, with their debuginfo not linked in at all. |
It appears that because Rust divides code up into codegen units (i.e. object files) somewhat arbitrarily, as soon as you pull one from a crate, you end up having to link them all as you transitively resolve undefined symbols. Then it's up to |
In that case, probably the best option here would be to teach the linker's |
This issue has been discussed before, e.g. http://sourceware-org.1504.n7.nabble.com/Re-Debuggin-info-for-unused-sections-td112685.html. I guess this isn't a high priority for C/C++ libraries since developers can divide them up manually into separate object files so that not all object files need to be linked into every application. So implementing the above optimization, though it needs to be done in the linker, would mainly be for the benefit of Rust. |
I implemented this in LLD. It works well for some of my binaries, and for the testcase here, but only shrank the overall size of my built binaries from 2.4G to 2.0G, so I'm not sure whether it's worth the effort to push upstream... |
Let's give it a go. https://reviews.llvm.org/D54747 |
I have the exact same issue, and it's really problematic because it results in up to 600MB of memory usage when generating a backtrace, for only 50MB of debug info, for a binary that is only 643KB in size when stripped! That's literally a 1000x increase in memory usage 😱 |
Patch by Robert O'Callahan. Rust projects tend to link in all object files from all dependent libraries and rely on --gc-sections to strip unused code and data. Unfortunately --gc-sections doesn't currently strip any debuginfo associated with GC'ed sections, so lld links in the full debuginfo from all dependencies even if almost all that code has been discarded. See rust-lang/rust#56068 for some details. Properly stripping debuginfo for discarded sections would be difficult, but a simple approach that helps significantly is to mark debuginfo sections as live only if their associated object file has at least one live code/data section. This patch does that. In a (contrived but not totally artificial) Rust testcase linked above, it reduces the final binary size from 46MB to 5.1MB. Differential Revision: https://reviews.llvm.org/D54747 git-svn-id: https://llvm.org/svn/llvm-project/lld/trunk@358069 91177308-0d34-0410-b5e6-96231b3b80d8
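The rule described in the commit message above can be illustrated with a small model. This is only a sketch under stated assumptions, not lld's actual code: the `Kind`, `Section`, and `ObjectFile` types, the section sizes, and the helper names are all hypothetical, and the `live` flag on code/data sections stands in for whatever the ordinary `--gc-sections` mark phase decided.

```rust
/// Section kind, reduced to the two cases the rule distinguishes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    CodeOrData,
    DebugInfo,
}

/// Hypothetical model of an input section.
#[derive(Debug, Clone)]
struct Section {
    name: &'static str,
    kind: Kind,
    size: u64,
    /// For code/data: the outcome of the normal --gc-sections mark phase.
    /// For debug sections: true by default (the pre-patch behaviour).
    live: bool,
}

/// Hypothetical model of an input object file (one codegen unit).
#[derive(Debug)]
struct ObjectFile {
    name: &'static str,
    sections: Vec<Section>,
}

/// Debug sections stay live only if the same object file has at least one
/// live code/data section.
fn mark_debug_sections(obj: &mut ObjectFile) {
    let any_live = obj
        .sections
        .iter()
        .any(|s| s.kind == Kind::CodeOrData && s.live);
    for s in obj.sections.iter_mut().filter(|s| s.kind == Kind::DebugInfo) {
        s.live = any_live;
    }
}

/// Total bytes of debug sections that would end up in the output.
fn linked_debug_bytes(objs: &[ObjectFile]) -> u64 {
    objs.iter()
        .flat_map(|o| o.sections.iter())
        .filter(|s| s.kind == Kind::DebugInfo && s.live)
        .map(|s| s.size)
        .sum()
}

fn sec(name: &'static str, kind: Kind, size: u64, live: bool) -> Section {
    Section { name, kind, size, live }
}

/// Made-up inputs: one object with a single live function, one fully dead.
fn sample_objects() -> Vec<ObjectFile> {
    vec![
        ObjectFile {
            name: "used.o",
            sections: vec![
                sec(".text.foo", Kind::CodeOrData, 100, true),
                sec(".text.bar", Kind::CodeOrData, 200, false),
                sec(".debug_info", Kind::DebugInfo, 4000, true),
            ],
        },
        ObjectFile {
            name: "unused.o",
            sections: vec![
                sec(".text.baz", Kind::CodeOrData, 300, false),
                sec(".debug_info", Kind::DebugInfo, 9000, true),
            ],
        },
    ]
}

fn main() {
    let mut objs = sample_objects();
    println!("debug bytes before: {}", linked_debug_bytes(&objs));
    for obj in objs.iter_mut() {
        mark_debug_sections(obj);
    }
    println!("debug bytes after:  {}", linked_debug_bytes(&objs));
    for obj in &objs {
        for s in &obj.sections {
            println!("{} {} live={}", obj.name, s.name, s.live);
        }
    }
}
```

Note that `used.o` keeps all of its debuginfo, including the part describing the discarded `.text.bar`. That matches the commit message's caveat that properly stripping debuginfo for individual discarded sections would be difficult; the rule only drops debuginfo for object files that contribute nothing at all.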
It seems the LLD patch is merged. How can I take advantage of this today? |
Build LLD master and use it. LLD is pretty easy to build. |
It was unfortunately reverted back in May (shortly after @rocallahan's last comment above...) |
I create patches for the Thunderbird mail client (originally from Mozilla) from time to time when I notice a bug here and there. I have had a nagging problem since early this year. On my local PC, the MOZ_OBJ directory, which is the top-most directory for storing binary object files, is 45.8 GB. I learned of this issue of large debug info from a discussion on a Mozilla mailing list. It would be super if we could reduce the size of debug info (.dwo) for Rust object files. Thank you for the great package otherwise. TIA |
Our Linux release binary was hilariously large, weighing in at nearly 800MB (!). Nearly all of the bloat was from DWARF debug info:

    $ bloaty materialized -n 10
         FILE SIZE        VM SIZE
     --------------  --------------
      24.5%   194Mi   0.0%       0    .debug_info
      24.1%   191Mi   0.0%       0    .debug_loc
      13.8%   109Mi   0.0%       0    .debug_pubtypes
      10.1%  79.9Mi   0.0%       0    .debug_pubnames
       8.8%  70.0Mi   0.0%       0    .debug_str
       8.3%  66.3Mi   0.0%       0    .debug_ranges
       4.4%  35.3Mi   0.0%       0    .debug_line
       3.1%  24.8Mi  66.3%  24.8Mi    .text
       1.8%  14.4Mi  25.1%  9.39Mi    [41 Others]
       0.6%  4.79Mi   0.0%       0    .strtab
       0.4%  3.22Mi   8.6%  3.22Mi    .eh_frame
     100.0%   793Mi 100.0%  37.4Mi    TOTAL

This patch gets a handle on this by attacking the problem from several angles:

1. We instruct the linker to compress debug info sections. Most of the debug info is redundant and compresses exceptionally well. Part of the reason we didn't notice the issue is because our Docker images and gzipped tarballs were relatively small (~150MB).
2. We strip out the unnecessary `.debug_pubnames` and `.debug_pubtypes` from the binary. This works around a known Rust bug (rust-lang/rust#46034).
3. We ask Rust to generate less debug info for release builds, limiting it to line info. This is enough information to symbolicate a backtrace, but not enough information to run an interactive debugger. This is usually the right tradeoff for a release build.

    $ bloaty materialized -n 10
         VM SIZE                          FILE SIZE
     --------------                    --------------
       0.0%       0  .debug_info        31.9Mi  33.8%
      70.5%  25.0Mi  .text              25.0Mi  26.5%
       0.0%       0  .debug_str         7.54Mi   8.0%
       0.0%       0  .debug_line        6.36Mi   6.7%
       9.4%  3.33Mi  [38 Others]        5.36Mi   5.7%
       0.0%       0  .strtab            4.71Mi   5.0%
       0.0%       0  .debug_ranges      3.55Mi   3.8%
       8.8%  3.11Mi  .eh_frame          3.11Mi   3.3%
       0.0%       0  .symtab            2.87Mi   3.0%
       6.0%  2.12Mi  .rodata            2.12Mi   2.2%
       5.4%  1.92Mi  .gcc_except_table  1.92Mi   2.0%
     100.0%  35.5Mi  TOTAL              94.4Mi 100.0%

One issue remains unsolved, which is that Rust/LLVM cannot currently garbage collect DWARF that refers to unused symbols/types. The actual symbols get cut from the binary, but their debug info remains. Follow rust-lang/rust#56068 and LLVM D74169 [0] if curious. I tested with the aforementioned lld patch (and none of the other changes) and it cut the binary down to 300MB. With the other changes, the savings are less substantial, but probably another 10MB to be had.

[0]: https://reviews.llvm.org/D74169
Our Linux release binary was hilariously large, weighing in at nearly 800MB (!). Nearly all of the bloat was from DWARF debug info:

    $ bloaty materialized -n 10
         FILE SIZE        VM SIZE
     --------------  --------------
      24.5%   194Mi   0.0%       0    .debug_info
      24.1%   191Mi   0.0%       0    .debug_loc
      13.8%   109Mi   0.0%       0    .debug_pubtypes
      10.1%  79.9Mi   0.0%       0    .debug_pubnames
       8.8%  70.0Mi   0.0%       0    .debug_str
       8.3%  66.3Mi   0.0%       0    .debug_ranges
       4.4%  35.3Mi   0.0%       0    .debug_line
       3.1%  24.8Mi  66.3%  24.8Mi    .text
       1.8%  14.4Mi  25.1%  9.39Mi    [41 Others]
       0.6%  4.79Mi   0.0%       0    .strtab
       0.4%  3.22Mi   8.6%  3.22Mi    .eh_frame
     100.0%   793Mi 100.0%  37.4Mi    TOTAL

This patch gets a handle on this by attacking the problem from several angles:

1. We instruct the linker to compress debug info sections. Most of the debug info is redundant and compresses exceptionally well. Part of the reason we didn't notice the issue is because our Docker images and gzipped tarballs were relatively small (~150MB).
2. We strip out the unnecessary `.debug_pubnames` and `.debug_pubtypes` sections from the binary. This works around a known Rust bug (rust-lang/rust#46034).
3. We ask Rust to generate less debug info for release builds, limiting it to line info. This is enough information to symbolicate a backtrace, but not enough information to run an interactive debugger. This is usually the right tradeoff for a release build.

    $ bloaty materialized -n 10
         FILE SIZE        VM SIZE
     --------------  --------------
      33.8%  31.9Mi   0.0%       0    .debug_info
      26.5%  25.0Mi  70.5%  25.0Mi    .text
       8.0%  7.54Mi   0.0%       0    .debug_str
       6.7%  6.36Mi   0.0%       0    .debug_line
       5.7%  5.36Mi   9.4%  3.33Mi    [38 Others]
       5.0%  4.71Mi   0.0%       0    .strtab
       3.8%  3.55Mi   0.0%       0    .debug_ranges
       3.3%  3.11Mi   8.8%  3.11Mi    .eh_frame
       3.0%  2.87Mi   0.0%       0    .symtab
       2.2%  2.12Mi   6.0%  2.12Mi    .rodata
       2.0%  1.92Mi   5.4%  1.92Mi    .gcc_except_table
     100.0%  94.4Mi 100.0%  35.5Mi    TOTAL

One issue remains unsolved, which is that Rust/LLVM cannot currently garbage collect DWARF that refers to unused symbols/types. The actual symbols get cut from the binary, but their debug info remains. Follow rust-lang/rust#56068 and LLVM D74169 [0] if curious. I tested with the aforementioned lld patch and the resulting binary is even smaller, at 71MB, so there's another 25MB of savings to be had there. (That patch on its own, without the other changes, cuts the ~800MB binary to a ~300MB binary, so it's an impressive piece of work. Unfortunately it also increases link time by 15-25x.)

[0]: https://reviews.llvm.org/D74169
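To make the "nearly all of the bloat" claim concrete, the arithmetic behind the bloaty listings can be checked with a short sketch. The row values are copied from the two FILE SIZE columns in the commit message; the constant and function names are hypothetical, and totals are recomputed from the rows, so they differ slightly from bloaty's own rounded TOTAL line.

```rust
/// (section name, file size in MiB) from the listing before the changes.
const BEFORE: &[(&str, f64)] = &[
    (".debug_info", 194.0),
    (".debug_loc", 191.0),
    (".debug_pubtypes", 109.0),
    (".debug_pubnames", 79.9),
    (".debug_str", 70.0),
    (".debug_ranges", 66.3),
    (".debug_line", 35.3),
    (".text", 24.8),
    ("[41 Others]", 14.4),
    (".strtab", 4.79),
    (".eh_frame", 3.22),
];

/// (section name, file size in MiB) from the listing after the changes.
const AFTER: &[(&str, f64)] = &[
    (".debug_info", 31.9),
    (".text", 25.0),
    (".debug_str", 7.54),
    (".debug_line", 6.36),
    ("[38 Others]", 5.36),
    (".strtab", 4.71),
    (".debug_ranges", 3.55),
    (".eh_frame", 3.11),
    (".symtab", 2.87),
    (".rodata", 2.12),
    (".gcc_except_table", 1.92),
];

/// Sum of all rows, in MiB.
fn total_mib(rows: &[(&str, f64)]) -> f64 {
    rows.iter().map(|(_, mib)| *mib).sum()
}

/// Sum of the rows whose name starts with `.debug_`, in MiB.
fn debug_mib(rows: &[(&str, f64)]) -> f64 {
    rows.iter()
        .filter(|(name, _)| name.starts_with(".debug_"))
        .map(|(_, mib)| *mib)
        .sum()
}

/// Fraction of the file taken up by `.debug_*` sections.
fn debug_share(rows: &[(&str, f64)]) -> f64 {
    debug_mib(rows) / total_mib(rows)
}

fn main() {
    for (label, rows) in [("before", BEFORE), ("after", AFTER)] {
        println!(
            "{label}: {:.1} MiB of {:.1} MiB is debug info ({:.0}%)",
            debug_mib(rows),
            total_mib(rows),
            100.0 * debug_share(rows)
        );
    }
}
```

By this count the `.debug_*` sections account for roughly 94% of the file before the changes and roughly 52% after, while `.text` stays at about 25 MiB in both listings.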
There's an LLVM patch out that adds a `--gc-debuginfo` flag to `lld` that seems to have exactly the desired effect: https://reviews.llvm.org/D74169
Looks to be a ways away from landing, but I tested it on a relatively large Rust project and it was able to reduce an 800MB binary to a 300MB binary (with full debug info). Details in a comment here: https://reviews.llvm.org/D74169#1990180 |
On 2020/04/19 3:23, Nikhil Benesch wrote:
There's an LLVM patch out that adds a `--gc-debuginfo` flag to `lld`
that seems to have exactly the desired effect:
https://reviews.llvm.org/D74169
Looks to be a ways away from landing, but I tested it on a relatively
large Rust project and it was able to reduce an 800MB binary to a
300MB binary (with full debug info). Details in a comment here:
https://reviews.llvm.org/D74169#1990180
Thank you for the update.
I will investigate the new option to lld and see if it helps me for my
local development of mozilla Thunderbird mailer.
It seems to me that many developers are now so used to fast SSD storage
that they are not really aware of binary bloat.
If they used old-fashioned hard disks, and slow ones at that, they would
notice the problem very quickly.
Thank you again.
Regards,
|
This continues to be an enormous problem. I have a binary that's 1% useful stuff and 99% debug info, almost all of which is for dead code. And that |
Agreed. The majority of DWARF debug records in even a hello world application are dead code with bogus PC ranges all starting at |
`rusoto_core::Region` is a very simple enum type. This doesn't run any interesting code. The resulting Linux debug-build binary is 41MB. If I replace `&REGION` with `0` the binary is 7.5MB. `readelf -a` shows that `.text` is 370,558 bytes. `.debug_info` is 10,541,677 bytes. The other debug sections account for most of the rest.

Inspecting the debuginfo shows DWARF compilation units for `rusoto_core` and lots of its dependencies that are entirely dead code. For example:

... followed by 85 more identical ranges. These all indicate empty code ranges; all code for this CU has been stripped by the linker. However, this CU still has a ton of debuginfo for types and for functions. E.g.:

All `DW_AT_low_pc`s of `DW_TAG_subprogram`, `DW_TAG_lexical_block` and `DW_TAG_inlined_subroutine` in this CU are zero. There are a lot of CUs like this.
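The dead-CU pattern described above (every range empty, every `DW_AT_low_pc` zero) can be expressed as a simple check. The sketch below is a heuristic under stated assumptions, not a DWARF parser: `Range` and `CompileUnit` are hypothetical stand-ins for data one would extract with a real DWARF reader, a range counts as empty when its two bounds are equal, the CU names and all numbers are made up, and a CU with no ranges and no PCs at all would also be classified as dead.

```rust
/// An address range as listed for a compilation unit.
#[derive(Debug, Clone, Copy)]
struct Range {
    low: u64,
    high: u64,
}

/// Hypothetical summary of one DWARF compilation unit.
#[derive(Debug)]
struct CompileUnit {
    name: &'static str,
    ranges: Vec<Range>,
    /// DW_AT_low_pc of each DW_TAG_subprogram, DW_TAG_lexical_block and
    /// DW_TAG_inlined_subroutine in the unit.
    low_pcs: Vec<u64>,
    /// Bytes this unit occupies in .debug_info.
    debug_info_bytes: u64,
}

/// Heuristic: the CU describes only stripped code if every range is empty
/// and every recorded low_pc is zero.
fn is_dead(cu: &CompileUnit) -> bool {
    cu.ranges.iter().all(|r| r.low == r.high) && cu.low_pcs.iter().all(|&pc| pc == 0)
}

/// Bytes of .debug_info attributable to dead CUs.
fn dead_debug_info_bytes(cus: &[CompileUnit]) -> u64 {
    cus.iter()
        .filter(|cu| is_dead(cu))
        .map(|cu| cu.debug_info_bytes)
        .sum()
}

/// Made-up example data: one unit with live code, one entirely stripped.
fn sample_units() -> Vec<CompileUnit> {
    vec![
        CompileUnit {
            name: "live_unit",
            ranges: vec![Range { low: 0x1000, high: 0x1040 }],
            low_pcs: vec![0x1000, 0x1010],
            debug_info_bytes: 2_000,
        },
        CompileUnit {
            name: "dead_unit",
            ranges: vec![Range { low: 0, high: 0 }; 3],
            low_pcs: vec![0, 0, 0, 0],
            debug_info_bytes: 50_000,
        },
    ]
}

fn main() {
    let cus = sample_units();
    for cu in &cus {
        println!("{}: dead = {}", cu.name, is_dead(cu));
    }
    println!("dead .debug_info bytes: {}", dead_debug_info_bytes(&cus));
}
```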