Optimizing Clang and LLD with the generated profile

BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: c285b7f5139d0870d5ccbdc3f73b254004211030
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: enabling relocation mode
BOLT-WARNING: Failed to analyze 1736 relocations
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: profile collection done on a binary already processed by BOLT
BOLT-WARNING: 1 collisions detected while hashing binary objects. Use -v=1 to see the list.
BOLT-INFO: 10324 out of 77038 functions in the binary (13.4%) have non-empty execution profile
BOLT-INFO: 459 functions with profile could not be optimized
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: the input contains 38728 (dynamic count : 6191278890) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INFO: validate-mem-refs updated 14 object references
BOLT-INFO: 2370817 instructions were shortened
BOLT-INFO: removed 907 empty blocks
BOLT-INFO: merged 1 duplicate CFG edge
BOLT-INFO: ICF folded 27 out of 77326 functions in 5 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 17.92 KB of code space. Folded functions were called 15787790 times based on profile.
BOLT-INFO: inlined 27852 memcpy() calls. The calls were executed 1667817760 times based on profile.
BOLT-INFO: ICP Total indirect calls = 7307869491, 825 callsites cover 99% of all indirect calls
BOLT-INFO: ICP total indirect callsites with profile = 1225
BOLT-INFO: ICP total jump table callsites = 176
BOLT-INFO: ICP total number of calls = 23001984622
BOLT-INFO: ICP percentage of calls that are indirect = 31.5%
BOLT-INFO: ICP percentage of indirect calls that can be optimized = 78.8%
BOLT-INFO: ICP percentage of indirect callsites that are optimized = 63.9%
BOLT-INFO: ICP number of method load elimination candidates = 0
BOLT-INFO: ICP percentage of method calls candidates that have loads eliminated = 0.0%
BOLT-INFO: ICP percentage of indirect branches that are optimized = 0.0%
BOLT-INFO: ICP percentage of jump table callsites that are optimized = 0.0%
BOLT-INFO: ICP number of jump table callsites that can use hot indices = 0
BOLT-INFO: ICP percentage of jump table callsites that use hot indices = 0.0%
BOLT-INFO: simplified 813 out of 23818 loads from a statically computed address.
BOLT-INFO: dynamic loads simplified: 56405724
BOLT-INFO: dynamic loads found: 2894875902
BOLT-INFO: 18382 PLT calls in the binary were optimized.
BOLT-INFO: basic block reordering modified layout of 7130 functions (69.06% of profiled, 9.22% of total)
BOLT-INFO: UCE removed 4 blocks and 159 bytes of code
BOLT-INFO: splitting separates 9447524 hot bytes from 19925424 cold bytes (32.16% of split functions is hot).
BOLT-INFO: 64 Functions were reordered by LoopInversionPass
BOLT-INFO: tail duplication modified 4041 (5.23%) functions; duplicated 5811 blocks (48519 bytes) responsible for 2850656988 dynamic executions (0.33% of all block executions)
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

        491676108618 : executed forward branches
        126335967921 : taken forward branches
        135427450253 : executed backward branches
         62677634517 : taken backward branches
         40949023306 : executed unconditional branches
         75907982182 : all function calls
         18289401186 : indirect calls
         10686381122 : PLT calls
       4453535275727 : executed instructions
       1097615294527 : executed load instructions
        613815326543 : executed store instructions
          8934866082 : taken jump table branches
                   0 : taken unknown indirect branches
        668052582177 : total branches
        229962625744 : taken branches
        438089956433 : non-taken conditional branches
        189013602438 : taken conditional branches
        627103558871 : all conditional branches

        536195204006 : executed forward branches (+9.1%)
         32934958338 : taken forward branches (-73.9%)
         97298924834 : executed backward branches (-28.2%)
         46764088135 : taken backward branches (-25.4%)
         17404861948 : executed unconditional branches (-57.5%)
         63553783295 : all function calls (-16.3%)
         10813816231 : indirect calls (-40.9%)
                   0 : PLT calls (-100.0%)
       4433479147884 : executed instructions (-0.5%)
       1096421001628 : executed load instructions (-0.1%)
        613815326534 : executed store instructions (+0.0%)
          8934866082 : taken jump table branches (=)
                   0 : taken unknown indirect branches (=)
        650898990788 : total branches (-2.6%)
         97103908421 : taken branches (-57.8%)
        553795082367 : non-taken conditional branches (+26.4%)
         79699046473 : taken conditional branches (-57.8%)
        633494128840 : all conditional branches (+1.0%)

BOLT-INFO: SCTC: patched 1050 tail calls (1083 forward) tail calls (0 backward) from a total of 1083 while removing 22 double jumps and removing 1072 basic blocks totalling 5315 bytes of code. CTCs total execution count is 61725169 and the number of times CTCs are taken is 47477565
BOLT-INFO: Peephole: 1585 double jumps patched.
BOLT-INFO: Peephole: 1619 tail call traps inserted.
BOLT-INFO: Peephole: 0 useless conditional branches removed.
BOLT-INFO: setting __hot_start to 0x9800000
BOLT-INFO: setting __hot_end to 0xa5835df

BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: c285b7f5139d0870d5ccbdc3f73b254004211030
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: enabling relocation mode
BOLT-WARNING: Failed to analyze 1978 relocations
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: profile collection done on a binary already processed by BOLT
BOLT-INFO: 5275 out of 60359 functions in the binary (8.7%) have non-empty execution profile
BOLT-INFO: 90 functions with profile could not be optimized
BOLT-WARNING: 152 (2.9% of all profiled) functions have invalid (possibly stale) profile. Use -report-stale to see the list.
BOLT-WARNING: 23256859658 out of 272817029905 samples in the binary (8.5%) belong to functions with invalid (possibly stale) profile.
BOLT-INFO: profile for 5422 objects was ignored
BOLT-INFO: the input contains 18059 (dynamic count : 2527703198) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INFO: validate-mem-refs updated 13 object references
BOLT-INFO: 1080378 instructions were shortened
BOLT-INFO: removed 566 empty blocks
BOLT-INFO: ICF folded 8 out of 60692 functions in 3 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 3.56 KB of code space. Folded functions were called 219892 times based on profile.
BOLT-INFO: inlined 16211 memcpy() calls. The calls were executed 509127947 times based on profile.
BOLT-INFO: ICP Total indirect calls = 1988474123, 258 callsites cover 99% of all indirect calls
BOLT-INFO: ICP total indirect callsites with profile = 433
BOLT-INFO: ICP total jump table callsites = 78
BOLT-INFO: ICP total number of calls = 4689470067
BOLT-INFO: ICP percentage of calls that are indirect = 42.1%
BOLT-INFO: ICP percentage of indirect calls that can be optimized = 71.4%
BOLT-INFO: ICP percentage of indirect callsites that are optimized = 53.1%
BOLT-INFO: ICP number of method load elimination candidates = 0
BOLT-INFO: ICP percentage of method calls candidates that have loads eliminated = 0.0%
BOLT-INFO: ICP percentage of indirect branches that are optimized = 0.0%
BOLT-INFO: ICP percentage of jump table callsites that are optimized = 0.0%
BOLT-INFO: ICP number of jump table callsites that can use hot indices = 0
BOLT-INFO: ICP percentage of jump table callsites that use hot indices = 0.0%
BOLT-INFO: simplified 837 out of 17664 loads from a statically computed address.
BOLT-INFO: dynamic loads simplified: 12148910
BOLT-INFO: dynamic loads found: 835001949
BOLT-INFO: 11807 PLT calls in the binary were optimized.
BOLT-INFO: basic block reordering modified layout of 3137 functions (59.47% of profiled, 5.17% of total)
BOLT-INFO: UCE removed 4 blocks and 159 bytes of code
BOLT-INFO: splitting separates 4630523 hot bytes from 4379948 cold bytes (51.39% of split functions is hot).
BOLT-INFO: 20 Functions were reordered by LoopInversionPass
BOLT-INFO: tail duplication modified 2589 (4.27%) functions; duplicated 3883 blocks (26204 bytes) responsible for 723550265 dynamic executions (0.28% of all block executions)
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

        132149629305 : executed forward branches
         30643936752 : taken forward branches
         51381999324 : executed backward branches
         24356415795 : taken backward branches
         10497973166 : executed unconditional branches
         15998435216 : all function calls
          5388698616 : indirect calls
          2550774728 : PLT calls
       1204297367501 : executed instructions
        269130648532 : executed load instructions
        145645188100 : executed store instructions
          1939645391 : taken jump table branches
                   0 : taken unknown indirect branches
        194029601795 : total branches
         65498325713 : taken branches
        128531276082 : non-taken conditional branches
         55000352547 : taken conditional branches
        183531628629 : all conditional branches

        141864018664 : executed forward branches (+7.4%)
         11147929816 : taken forward branches (-63.6%)
         43922056841 : executed backward branches (-14.5%)
         21671470162 : taken backward branches (-11.0%)
          5580076039 : executed unconditional branches (-46.8%)
         12938532539 : all function calls (-19.1%)
          2981856000 : indirect calls (-44.7%)
                   0 : PLT calls (-100.0%)
       1201085521516 : executed instructions (-0.3%)
        268931475366 : executed load instructions (-0.1%)
        145645188093 : executed store instructions (+0.0%)
          1939645391 : taken jump table branches (=)
                   0 : taken unknown indirect branches (=)
        191366151544 : total branches (-1.4%)
         38399476017 : taken branches (-41.4%)
        152966675527 : non-taken conditional branches (+19.0%)
         32819399978 : taken conditional branches (-40.3%)
        185786075505 : all conditional branches (+1.2%)

BOLT-INFO: SCTC: patched 903 tail calls (911 forward) tail calls (0 backward) from a total of 911 while removing 20 double jumps and removing 922 basic blocks totalling 4568 bytes of code. CTCs total execution count is 3878764 and the number of times CTCs are taken is 1119143
BOLT-INFO: Peephole: 857 double jumps patched.
BOLT-INFO: Peephole: 1424 tail call traps inserted.
BOLT-INFO: Peephole: 0 useless conditional branches removed.
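Each run prints two dynostats dumps with the same counters; the second dump annotates every counter with a relative change in parentheses, and the "splitting separates" lines report what share of split-function bytes stayed hot. The Python sketch below recomputes a few of those figures from the numbers printed above. It assumes the delta is (after - before) / before, shown with a sign and one decimal place, that "(=)" marks identical counts, and that the hot share is hot / (hot + cold). The helpers `dynostat_delta` and `hot_fraction` are hypothetical illustrations, not part of BOLT.

```python
def dynostat_delta(before: int, after: int) -> str:
    """Format the change between two dynostat counts, e.g. "(+9.1%)".

    Assumption: the delta is (after - before) / before, and "(=)" is
    printed when the two counts are identical.
    """
    if before == after:
        return "(=)"
    denom = max(before, 1)  # avoid dividing by zero when the baseline is 0
    pct = (after - before) / denom * 100.0
    # Round to one decimal, then add 0.0 so that a tiny negative change
    # (which rounds to -0.0) prints as "+0.0%", matching the log above.
    return f"({round(pct, 1) + 0.0:+.1f}%)"


def hot_fraction(hot_bytes: int, cold_bytes: int) -> str:
    """Share of split-function bytes kept in the hot part, e.g. "32.16%"."""
    return f"{hot_bytes / (hot_bytes + cold_bytes) * 100.0:.2f}%"


if __name__ == "__main__":
    # (counter, before, after) triples copied from the first run's dynostats.
    samples = [
        ("executed forward branches", 491676108618, 536195204006),
        ("taken forward branches", 126335967921, 32934958338),
        ("PLT calls", 10686381122, 0),
        ("executed store instructions", 613815326543, 613815326534),
        ("taken jump table branches", 8934866082, 8934866082),
    ]
    for name, before, after in samples:
        # 20-column right-aligned count, chosen to mirror the log layout.
        print(f"{after:>20} : {name} {dynostat_delta(before, after)}")

    # Hot/cold byte counts from the two "splitting separates" lines.
    print(hot_fraction(9447524, 19925424))  # first run
    print(hot_fraction(4630523, 4379948))   # second run
```

The `+ 0.0` step only matters for counters that barely move, such as executed store instructions, which differ by a handful of counts; without it Python would print `-0.0%` where the log shows `+0.0%`.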