Commit 088f328

Auto merge of #46382 - alexcrichton:thinlto-default, r=michaelwoerister
rustc: Prepare to enable ThinLTO by default

This commit *almost* enables ThinLTO and multiple codegen units in release mode by default but is blocked on #46346 now before pulling the trigger.
2 parents f9b0897 + 855f6d1 commit 088f328

15 files changed, +154 -126 lines changed

src/librustc/session/config.rs

+15 -3
@@ -383,8 +383,13 @@ top_level_options!(
         // try to not rely on this too much.
         actually_rustdoc: bool [TRACKED],

-        // Number of object files/codegen units to produce on the backend
+        // Specifications of codegen units / ThinLTO which are forced as a
+        // result of parsing command line options. These are not necessarily
+        // what rustc was invoked with, but massaged a bit to agree with
+        // commands like `--emit llvm-ir` which they're often incompatible with
+        // if we otherwise use the defaults of rustc.
         cli_forced_codegen_units: Option<usize> [UNTRACKED],
+        cli_forced_thinlto: Option<bool> [UNTRACKED],
     }
 );

@@ -566,6 +571,7 @@ pub fn basic_options() -> Options {
         debug_assertions: true,
         actually_rustdoc: false,
         cli_forced_codegen_units: None,
+        cli_forced_thinlto: None,
     }
 }

@@ -1163,7 +1169,7 @@ options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
         "run the non-lexical lifetimes MIR pass"),
     trans_time_graph: bool = (false, parse_bool, [UNTRACKED],
         "generate a graphical HTML report of time spent in trans and LLVM"),
-    thinlto: bool = (false, parse_bool, [TRACKED],
+    thinlto: Option<bool> = (None, parse_opt_bool, [TRACKED],
         "enable ThinLTO when possible"),
     inline_in_all_cgus: Option<bool> = (None, parse_opt_bool, [TRACKED],
         "control whether #[inline] functions are in all cgus"),
@@ -1599,6 +1605,7 @@ pub fn build_session_options_and_crate_config(matches: &getopts::Matches)

     let mut cg = build_codegen_options(matches, error_format);
     let mut codegen_units = cg.codegen_units;
+    let mut thinlto = None;

     // Issue #30063: if user requests llvm-related output to one
     // particular path, disable codegen-units.
@@ -1620,9 +1627,13 @@ pub fn build_session_options_and_crate_config(matches: &getopts::Matches)
                     }
                     early_warn(error_format, "resetting to default -C codegen-units=1");
                     codegen_units = Some(1);
+                    thinlto = Some(false);
                 }
             }
-            _ => codegen_units = Some(1),
+            _ => {
+                codegen_units = Some(1);
+                thinlto = Some(false);
+            }
         }
     }

@@ -1832,6 +1843,7 @@ pub fn build_session_options_and_crate_config(matches: &getopts::Matches)
         debug_assertions,
         actually_rustdoc: false,
         cli_forced_codegen_units: codegen_units,
+        cli_forced_thinlto: thinlto,
     },
     cfg)
 }
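
The option-parsing hunks above can be read as a small decision table: when an `--emit` kind that is incompatible with multiple codegen units is requested, both the codegen unit count and ThinLTO are forced. The following is a minimal sketch of that logic in isolation, not rustc's actual code. The `Forced` struct, the `force_settings` function, and its two boolean parameters are hypothetical names; in particular, the outer "incompatible emit kind requested" condition lies outside the hunks shown and is assumed here.

```rust
/// Hypothetical stand-in for the two `cli_forced_*` fields in the diff.
#[derive(Debug, PartialEq)]
pub struct Forced {
    pub codegen_units: Option<usize>,
    pub thinlto: Option<bool>,
}

/// Sketch of the forcing logic. `has_incompatible_emit` and
/// `has_output_path` are assumptions standing in for checks that the
/// diff performs on the parsed command line (the latter models `-o`).
pub fn force_settings(
    requested_cgus: Option<usize>,
    has_incompatible_emit: bool,
    has_output_path: bool,
) -> Forced {
    let mut codegen_units = requested_cgus;
    let mut thinlto = None;

    if has_incompatible_emit {
        match codegen_units {
            // The user asked for several units: only override them when
            // a single output path was requested as well.
            Some(n) if n > 1 => {
                if has_output_path {
                    codegen_units = Some(1);
                    thinlto = Some(false);
                }
            }
            // No explicit request (or exactly one unit): force one unit
            // and switch ThinLTO off.
            _ => {
                codegen_units = Some(1);
                thinlto = Some(false);
            }
        }
    }

    Forced { codegen_units, thinlto }
}
```

Leaving `thinlto` as `None` when nothing is forced is what lets the later `Session::thinlto()` query fall through to the `-Z thinlto` flag and then to the defaults.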

src/librustc/session/mod.rs

+82 -21
@@ -656,30 +656,91 @@ impl Session {
             return n as usize
         }

+        // Why is 16 codegen units the default all the time?
+        //
+        // The main reason for enabling multiple codegen units by default is to
+        // leverage the ability for the trans backend to do translation and
+        // codegen in parallel. This allows us, especially for large crates, to
+        // make good use of all available resources on the machine once we've
+        // hit that stage of compilation. Large crates especially then often
+        // take a long time in trans/codegen and this helps us amortize that
+        // cost.
+        //
+        // Note that a high number here doesn't mean that we'll be spawning a
+        // large number of threads in parallel. The backend of rustc contains
+        // global rate limiting through the `jobserver` crate so we'll never
+        // overload the system with too much work, but rather we'll only be
+        // optimizing when we're otherwise cooperating with other instances of
+        // rustc.
+        //
+        // Rather a high number here means that we should be able to keep a lot
+        // of idle cpus busy. By ensuring that no codegen unit takes *too* long
+        // to build we'll be guaranteed that all cpus will finish pretty closely
+        // to one another and we should make relatively optimal use of system
+        // resources
+        //
+        // Note that the main cost of codegen units is that it prevents LLVM
+        // from inlining across codegen units. Users in general don't have a lot
+        // of control over how codegen units are split up so it's our job in the
+        // compiler to ensure that undue performance isn't lost when using
+        // codegen units (aka we can't require everyone to slap `#[inline]` on
+        // everything).
+        //
+        // If we're compiling at `-O0` then the number doesn't really matter too
+        // much because performance doesn't matter and inlining is ok to lose.
+        // In debug mode we just want to try to guarantee that no cpu is stuck
+        // doing work that could otherwise be farmed to others.
+        //
+        // In release mode, however (O1 and above) performance does indeed
+        // matter! To recover the loss in performance due to inlining we'll be
+        // enabling ThinLTO by default (the function for which is just below).
+        // This will ensure that we recover any inlining wins we otherwise lost
+        // through codegen unit partitioning.
+        //
+        // ---
+        //
+        // Ok that's a lot of words but the basic tl;dr; is that we want a high
+        // number here -- but not too high. Additionally we're "safe" to have it
+        // always at the same number at all optimization levels.
+        //
+        // As a result 16 was chosen here! Mostly because it was a power of 2
+        // and most benchmarks agreed it was roughly a local optimum. Not very
+        // scientific.
         match self.opts.optimize {
-            // If we're compiling at `-O0` then default to 16 codegen units.
-            // The number here shouldn't matter too too much as debug mode
-            // builds don't rely on performance at all, meaning that lost
-            // opportunities for inlining through multiple codegen units is
-            // a non-issue.
-            //
-            // Note that the high number here doesn't mean that we'll be
-            // spawning a large number of threads in parallel. The backend
-            // of rustc contains global rate limiting through the
-            // `jobserver` crate so we'll never overload the system with too
-            // much work, but rather we'll only be optimizing when we're
-            // otherwise cooperating with other instances of rustc.
-            //
-            // Rather the high number here means that we should be able to
-            // keep a lot of idle cpus busy. By ensuring that no codegen
-            // unit takes *too* long to build we'll be guaranteed that all
-            // cpus will finish pretty closely to one another and we should
-            // make relatively optimal use of system resources
             config::OptLevel::No => 16,
+            _ => 1, // FIXME(#46346) this should be 16
+        }
+    }

-            // All other optimization levels default use one codegen unit,
-            // the historical default in Rust for a Long Time.
-            _ => 1,
+    /// Returns whether ThinLTO is enabled for this compilation
+    pub fn thinlto(&self) -> bool {
+        // If processing command line options determined that we're incompatible
+        // with ThinLTO (e.g. `-C lto --emit llvm-ir`) then return that option.
+        if let Some(enabled) = self.opts.cli_forced_thinlto {
+            return enabled
+        }
+
+        // If explicitly specified, use that with the next highest priority
+        if let Some(enabled) = self.opts.debugging_opts.thinlto {
+            return enabled
+        }
+
+        // If there's only one codegen unit and LTO isn't enabled then there's
+        // no need for ThinLTO so just return false.
+        if self.codegen_units() == 1 && !self.lto() {
+            return false
+        }
+
+        // Right now ThinLTO isn't compatible with incremental compilation.
+        if self.opts.incremental.is_some() {
+            return false
+        }
+
+        // Now we're in "defaults" territory. By default we enable ThinLTO for
+        // optimized compiles (anything greater than O0).
+        match self.opts.optimize {
+            config::OptLevel::No => false,
+            _ => true,
         }
     }
 }
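
The precedence order in the new `Session::thinlto()` can be exercised outside the compiler. The sketch below mirrors the five checks in the same order, assuming a flattened, hypothetical `Settings` struct in place of `Session`; the field names, the two-variant `OptLevel` enum, and `thinlto_enabled` are illustrative and are not rustc's types.

```rust
/// Hypothetical, simplified optimization level: `No` models `-O0`,
/// `Default` models any optimized build.
#[derive(Clone, Copy, PartialEq)]
pub enum OptLevel {
    No,
    Default,
}

/// Hypothetical flattened view of the session state the diff consults.
#[derive(Clone, Copy)]
pub struct Settings {
    pub cli_forced_thinlto: Option<bool>, // forced during option parsing
    pub z_thinlto: Option<bool>,          // explicit `-Z thinlto` value
    pub codegen_units: usize,
    pub lto: bool,
    pub incremental: bool,
    pub opt_level: OptLevel,
}

/// Mirrors the order of checks in the diff: forced value, explicit flag,
/// single-unit shortcut, incremental exclusion, then the default.
pub fn thinlto_enabled(s: &Settings) -> bool {
    if let Some(enabled) = s.cli_forced_thinlto {
        return enabled;
    }
    if let Some(enabled) = s.z_thinlto {
        return enabled;
    }
    if s.codegen_units == 1 && !s.lto {
        return false;
    }
    if s.incremental {
        return false;
    }
    s.opt_level != OptLevel::No
}
```

One consequence worth noting: because optimized builds still default to one codegen unit in this commit (the `FIXME(#46346)` line), the third check returns `false` for a plain release build, which is why the commit only "almost" enables ThinLTO by default.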

src/librustc_trans/back/write.rs

+3 -2
@@ -1402,8 +1402,9 @@ fn start_executing_work(tcx: TyCtxt,
         // for doesn't require full LTO. Some targets require one LLVM module
         // (they effectively don't have a linker) so it's up to us to use LTO to
         // link everything together.
-        thinlto: sess.opts.debugging_opts.thinlto &&
-            !sess.target.target.options.requires_lto,
+        thinlto: sess.thinlto() &&
+            !sess.target.target.options.requires_lto &&
+            unsafe { llvm::LLVMRustThinLTOAvailable() },

         no_landing_pads: sess.no_landing_pads(),
         save_temps: sess.opts.cg.save_temps,

src/librustc_trans/base.rs

+1 -1
@@ -704,7 +704,7 @@ pub fn trans_crate<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>,

     check_for_rustc_errors_attr(tcx);

-    if tcx.sess.opts.debugging_opts.thinlto {
+    if let Some(true) = tcx.sess.opts.debugging_opts.thinlto {
         if unsafe { !llvm::LLVMRustThinLTOAvailable() } {
             tcx.sess.fatal("this compiler's LLVM does not support ThinLTO");
         }

src/libstd/sys_common/backtrace.rs

+20 -2
@@ -252,8 +252,26 @@ fn output_fileline(w: &mut Write,
 // Note that this demangler isn't quite as fancy as it could be. We have lots
 // of other information in our symbols like hashes, version, type information,
 // etc. Additionally, this doesn't handle glue symbols at all.
-pub fn demangle(writer: &mut Write, s: &str, format: PrintFormat) -> io::Result<()> {
-    // First validate the symbol. If it doesn't look like anything we're
+pub fn demangle(writer: &mut Write, mut s: &str, format: PrintFormat) -> io::Result<()> {
+    // During ThinLTO LLVM may import and rename internal symbols, so strip out
+    // those endings first as they're one of the last manglings applied to
+    // symbol names.
+    let llvm = ".llvm.";
+    if let Some(i) = s.find(llvm) {
+        let candidate = &s[i + llvm.len()..];
+        let all_hex = candidate.chars().all(|c| {
+            match c {
+                'A' ... 'F' | '0' ... '9' => true,
+                _ => false,
+            }
+        });
+
+        if all_hex {
+            s = &s[..i];
+        }
+    }
+
+    // Validate the symbol. If it doesn't look like anything we're
     // expecting, we just print it literally. Note that we must handle non-rust
     // symbols because we could have any function in the backtrace.
     let mut valid = true;
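
The suffix-stripping step added to `demangle` can be tried on its own. The sketch below lifts it into a standalone helper; the name `strip_llvm_suffix` is hypothetical, and the inclusive range syntax `..=` replaces the `...` patterns used in the diff, which current Rust editions no longer accept. The set of accepted characters is unchanged: digits and uppercase `A` through `F` only.

```rust
/// Hypothetical standalone version of the stripping step in the diff.
/// Returns the symbol without a trailing `.llvm.<HEX>` suffix, or the
/// input unchanged when the text after `.llvm.` is not all hex digits.
pub fn strip_llvm_suffix(s: &str) -> &str {
    let marker = ".llvm.";
    if let Some(i) = s.find(marker) {
        let candidate = &s[i + marker.len()..];
        // Same character set as the diff: uppercase hex digits only.
        let all_hex = candidate
            .chars()
            .all(|c| matches!(c, 'A'..='F' | '0'..='9'));
        if all_hex {
            return &s[..i];
        }
    }
    s
}
```

With this helper, `strip_llvm_suffix("_ZN3foo3barE.llvm.A1B2C3")` yields `"_ZN3foo3barE"`, while a suffix containing lowercase hex digits such as `.llvm.a1b2c3` is left in place, since the check as written in the diff does not accept `a` through `f`.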

src/rustllvm/PassWrapper.cpp

+23 -87
@@ -11,6 +11,7 @@
 #include <stdio.h>

 #include <vector>
+#include <set>

 #include "rustllvm.h"

@@ -885,86 +886,6 @@ getFirstDefinitionForLinker(const GlobalValueSummaryList &GVSummaryList) {
   return FirstDefForLinker->get();
 }

-// This is a helper function we added that isn't present in LLVM's source.
-//
-// The way LTO works in Rust is that we typically have a number of symbols that
-// we know ahead of time need to be preserved. We want to ensure that ThinLTO
-// doesn't accidentally internalize any of these and otherwise is always
-// ready to keep them linking correctly.
-//
-// This function will recursively walk the `GUID` provided and all of its
-// references, as specified in the `Index`. In other words, we're taking a
-// `GUID` as input, adding it to `Preserved`, and then taking all `GUID`
-// items that the input references and recursing.
-static void
-addPreservedGUID(const ModuleSummaryIndex &Index,
-                 DenseSet<GlobalValue::GUID> &Preserved,
-                 GlobalValue::GUID GUID) {
-  if (Preserved.count(GUID))
-    return;
-  Preserved.insert(GUID);
-
-#if LLVM_VERSION_GE(5, 0)
-  auto Info = Index.getValueInfo(GUID);
-  if (!Info) {
-    return;
-  }
-  for (auto &Summary : Info.getSummaryList()) {
-    for (auto &Ref : Summary->refs()) {
-      addPreservedGUID(Index, Preserved, Ref.getGUID());
-    }
-
-    GlobalValueSummary *GVSummary = Summary.get();
-    if (isa<FunctionSummary>(GVSummary)) {
-      auto *FS = cast<FunctionSummary>(GVSummary);
-      for (auto &Call: FS->calls()) {
-        addPreservedGUID(Index, Preserved, Call.first.getGUID());
-      }
-      for (auto &GUID: FS->type_tests()) {
-        addPreservedGUID(Index, Preserved, GUID);
-      }
-    }
-    if (isa<AliasSummary>(GVSummary)) {
-      auto *AS = cast<AliasSummary>(GVSummary);
-      auto GUID = AS->getAliasee().getOriginalName();
-      addPreservedGUID(Index, Preserved, GUID);
-    }
-  }
-#else
-  auto SummaryList = Index.findGlobalValueSummaryList(GUID);
-  if (SummaryList == Index.end())
-    return;
-  for (auto &Summary : SummaryList->second) {
-    for (auto &Ref : Summary->refs()) {
-      if (Ref.isGUID()) {
-        addPreservedGUID(Index, Preserved, Ref.getGUID());
-      } else {
-        auto Value = Ref.getValue();
-        addPreservedGUID(Index, Preserved, Value->getGUID());
-      }
-    }
-
-    if (auto *FS = dyn_cast<FunctionSummary>(Summary.get())) {
-      for (auto &Call: FS->calls()) {
-        if (Call.first.isGUID()) {
-          addPreservedGUID(Index, Preserved, Call.first.getGUID());
-        } else {
-          auto Value = Call.first.getValue();
-          addPreservedGUID(Index, Preserved, Value->getGUID());
-        }
-      }
-      for (auto &GUID: FS->type_tests()) {
-        addPreservedGUID(Index, Preserved, GUID);
-      }
-    }
-    if (auto *AS = dyn_cast<AliasSummary>(Summary.get())) {
-      auto GUID = AS->getAliasee().getOriginalName();
-      addPreservedGUID(Index, Preserved, GUID);
-    }
-  }
-#endif
-}
-
 // The main entry point for creating the global ThinLTO analysis. The structure
 // here is basically the same as before threads are spawned in the `run`
 // function of `lib/LTO/ThinLTOCodeGenerator.cpp`.
@@ -1004,12 +925,10 @@ LLVMRustCreateThinLTOData(LLVMRustThinLTOModule *modules,
   Ret->Index.collectDefinedGVSummariesPerModule(Ret->ModuleToDefinedGVSummaries);

   // Convert the preserved symbols set from string to GUID, this is then needed
-  // for internalization. We use `addPreservedGUID` to include any transitively
-  // used symbol as well.
+  // for internalization.
   for (int i = 0; i < num_symbols; i++) {
-    addPreservedGUID(Ret->Index,
-                     Ret->GUIDPreservedSymbols,
-                     GlobalValue::getGUID(preserved_symbols[i]));
+    auto GUID = GlobalValue::getGUID(preserved_symbols[i]);
+    Ret->GUIDPreservedSymbols.insert(GUID);
   }

   // Collect the import/export lists for all modules from the call-graph in the
@@ -1038,7 +957,8 @@ LLVMRustCreateThinLTOData(LLVMRustThinLTOModule *modules,
   // Resolve LinkOnce/Weak symbols, this has to be computed early be cause it
   // impacts the caching.
   //
-  // This is copied from `lib/LTO/ThinLTOCodeGenerator.cpp`
+  // This is copied from `lib/LTO/ThinLTOCodeGenerator.cpp` with some of this
+  // being lifted from `lib/LTO/LTO.cpp` as well
   StringMap<std::map<GlobalValue::GUID, GlobalValue::LinkageTypes>> ResolvedODR;
   DenseMap<GlobalValue::GUID, const GlobalValueSummary *> PrevailingCopy;
   for (auto &I : Ret->Index) {
@@ -1062,11 +982,27 @@ LLVMRustCreateThinLTOData(LLVMRustThinLTOModule *modules,
     ResolvedODR[ModuleIdentifier][GUID] = NewLinkage;
   };
   thinLTOResolveWeakForLinkerInIndex(Ret->Index, isPrevailing, recordNewLinkage);
+
+  // Here we calculate an `ExportedGUIDs` set for use in the `isExported`
+  // callback below. This callback below will dictate the linkage for all
+  // summaries in the index, and we basically just only want to ensure that dead
+  // symbols are internalized. Otherwise everything that's already external
+  // linkage will stay as external, and internal will stay as internal.
+  std::set<GlobalValue::GUID> ExportedGUIDs;
+  for (auto &List : Ret->Index) {
+    for (auto &GVS: List.second) {
+      if (!GlobalValue::isExternalLinkage(GVS->linkage()))
+        continue;
+      auto GUID = GVS->getOriginalName();
+      if (!DeadSymbols.count(GUID))
+        ExportedGUIDs.insert(GUID);
+    }
+  }
   auto isExported = [&](StringRef ModuleIdentifier, GlobalValue::GUID GUID) {
     const auto &ExportList = Ret->ExportLists.find(ModuleIdentifier);
     return (ExportList != Ret->ExportLists.end() &&
       ExportList->second.count(GUID)) ||
-      Ret->GUIDPreservedSymbols.count(GUID);
+      ExportedGUIDs.count(GUID);
   };
   thinLTOInternalizeAndPromoteInIndex(Ret->Index, isExported);
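
The new `ExportedGUIDs` computation is a filter over the summary index: keep every symbol that has external linkage and is not dead. The sketch below models that filter and the revised `isExported` predicate in Rust rather than C++, purely as an illustration. `Summary`, `exported_guids`, and `is_exported` are hypothetical names, GUIDs are modeled as plain `u64` values, and the per-module export list lookup is reduced to an `Option`.

```rust
use std::collections::BTreeSet;

/// Hypothetical, minimal model of one summary in the index.
pub struct Summary {
    pub guid: u64,
    pub external: bool, // true when the symbol has external linkage
}

/// Collects the GUIDs that are externally linked and not dead,
/// mirroring the loop that fills `ExportedGUIDs` in the diff.
pub fn exported_guids(summaries: &[Summary], dead: &BTreeSet<u64>) -> BTreeSet<u64> {
    summaries
        .iter()
        .filter(|s| s.external && !dead.contains(&s.guid))
        .map(|s| s.guid)
        .collect()
}

/// Models the revised `isExported` callback: a symbol counts as exported
/// if it is in the module's export list (when the module has one) or in
/// the precomputed set.
pub fn is_exported(
    guid: u64,
    module_export_list: Option<&BTreeSet<u64>>,
    exported: &BTreeSet<u64>,
) -> bool {
    module_export_list.map_or(false, |list| list.contains(&guid)) || exported.contains(&guid)
}
```

Under this model a dead external symbol is absent from the set, so the predicate reports it as not exported unless a module export list names it, which matches the stated intent that only dead symbols get internalized.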

src/test/run-fail/mir_trans_no_landing_pads.rs

+1 -1
@@ -8,7 +8,7 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.

-// compile-flags: -Z no-landing-pads
+// compile-flags: -Z no-landing-pads -C codegen-units=1
 // error-pattern:converging_fn called
 use std::io::{self, Write};

src/test/run-fail/mir_trans_no_landing_pads_diverging.rs

+1 -1
@@ -8,7 +8,7 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.

-// compile-flags: -Z no-landing-pads
+// compile-flags: -Z no-landing-pads -C codegen-units=1
 // error-pattern:diverging_fn called
 use std::io::{self, Write};
