Add more optimizations to the release build profile. #11298
Comments
I suspect |
Compile times are a sensitive area and increasing them in any way is always a worry. I ran a test on my machine (M1 Max chip, 32 GB RAM) against a private project that has 395 dependencies (many are very heavy), and the jump in compile time is large for "fat" LTO. The results also appear to agree that … One thing to consider is if changing the …

Test Results

"fat", codegen-units = 1:

```toml
[profile.production]
inherits = "release"
lto = true
codegen-units = 1
opt-level = 3
```

"fat":

```toml
[profile.production]
inherits = "release"
lto = true
opt-level = 3
```

"thin", codegen-units = 1:

```toml
[profile.production]
inherits = "release"
lto = "thin"
codegen-units = 1
opt-level = 3
```

"thin":

```toml
[profile.production]
inherits = "release"
lto = "thin"
opt-level = 3
```

No LTO, codegen-units = 1:

```toml
[profile.production]
inherits = "release"
codegen-units = 1
opt-level = 3
```
|
I completely agree with you; LTO, whatever the value, is a great performance increase. |
That's what I thought at first: adding a "production" profile that would optimize the binary to the maximum and use the flags I mentioned in the issue. |
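A minimal sketch of such a profile, assuming the flags mentioned in the issue (fat LTO, a single codegen unit, opt-level 3); the name `production` is just an example:

```toml
[profile.production]
inherits = "release"
lto = true          # "fat" LTO across the whole dependency graph
codegen-units = 1   # one codegen unit per crate: slower to build, better optimization
opt-level = 3
```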
Just want to make it clear. Are we proposing a new built-in profile here? |
Currently ThinLTO creates issues with at least one common target while linking with |
It depends on what the majority of devs think. It could be a new built-in profile or it could simply be an enhancement to the current release build profile. |
This is what we'll use to generate the distributable artifacts. Some of these things don't exist on the default release profile because there's an aversion to making compilation times longer. This also moves `strip` to just the production profile and adds "thin" LTO to the release profile. It'll be nice to have a faster version of LTO for local release builds (and benchmarks), but it won't add too much time for compiling.

See:
- https://doc.rust-lang.org/cargo/reference/profiles.html
- https://nnethercote.github.io/perf-book/build-configuration.html
- rust-lang/cargo#11298
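A rough sketch of the split described in that commit message, assuming a custom `production` profile for distributable artifacts (names and values illustrative, not the actual diff):

```toml
# Local release builds and benchmarks: thin LTO keeps compile times reasonable.
[profile.release]
lto = "thin"

# Distributable artifacts: fat LTO plus stripped symbols.
[profile.production]
inherits = "release"
lto = true
strip = true
```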
I don't think the … It may be difficult to get everyone to agree on what is desired for some "dist"/"production" profile. For example, I care about small executable sizes more than backtraces, so my preferred dist profiles include … |
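For illustration only (the commenter's actual settings were not preserved in this thread), a size-focused dist profile might look something like this, assuming one is willing to give up backtraces:

```toml
[profile.dist]        # hypothetical name
inherits = "release"
opt-level = "z"       # optimize for size rather than speed
lto = true
strip = true          # drop symbols and debug info
panic = "abort"       # omit unwinding machinery
```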
Agreeing on a dist profile would be near impossible, yeah. I think that a better way would be to add some guidance/templates/profile wizard to cargo, which would help users build the desired profile interactively, with some predefined presets. |
Each option for a build configuration has pros and cons, depending on the needs of the developer. |
Created a Cargo subcommand for configuring Cargo projects (focused on the performance aspects of configuration); it might be useful to automate the creation of optimized Cargo profiles: https://github.com/Kobzol/cargo-wizard. |
I agree that making everyone agree on one new profile is impossible. I think it should then be considered to add multiple new build profiles, each fulfilling a goal, or at least to mention new ones in the docs, offering developers the possibility to modify their 'release' profile to maximize (or minimize) a certain aspect.
|
Hi! I've created #14738, which enables fat Link-Time Optimization (LTO) by default for new packages that are started via …

As I can see, the discussion here is mostly about already existing packages, which may have a huge number of dependencies, and people are right that using LTO with such packages may lead to trouble with compile time/memory consumption. However, a new project will not have those issues, since not much code has been written yet. And later, when the codebase of the new package grows, these optimizations can be disabled. So I do not see a big issue with suggesting maximal optimization to users from the beginning, with the possibility of disabling it later.

For me, enabling LTO looks like the next step up in optimization level, right after -O3, which is why I would prefer to have it enabled by default. |
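As a sketch of the idea (the exact contents of #14738 are not reproduced here), a newly generated package would start its life with something like this in its Cargo.toml, which the author can simply delete once compile times become a concern:

```toml
# Proposed default for new packages: maximal link-time optimization from the start.
[profile.release]
lto = true   # fat LTO
```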
Fat LTO can be an incredible compile-time hog, and in my experience it often brings little gain over thin LTO. I would also rather see a new built-in ultra-optimized profile than modifying release, tbh.

You said that LTO is not an issue for new projects, but the problem with that is that Rust compiles everything from scratch. So it's enough to add 2-3 lines of dependencies to your Cargo.toml and you are suddenly compiling (and running LTO over) a large amount of code. But maybe my intuition is wrong. In any case, I think that before making any kind of decision like this, we should get compilation benchmark results across the ecosystem. The Rust compiler benchmark suite could be used for this; I did something similar with it recently (I analyzed different things than the effect of LTO on compilation time, though).

This might also be interesting to you: https://github.com/zamazan4ik/lto-statistics |
I agree with @Kobzol, optimizations that heavily impact the compile time should be used on a new, dedicated, "heavy" profile. |
How often do Rust developers compile the release configuration in the modify-build-test cycle? It seems to me that this mostly happens in CI/CD rather than on developers' machines. So I do not see a problem here. And since it happens in CI, it will be relatively easy to collect statistics on compilation times during package development and decide whether to use a lighter LTO variant or drop it.
Although an idea about a new profile (
I'm looking at #14719 and I do not see a real problem here. As far as I can see, Cargo itself has about 350 packages and compiles in about 1 min in CI. With LTO enabled, compilation takes only 4 min in the worst case (I spend more time writing PR descriptions). And assuming that developers just call …
I think that to read these graphs properly, one needs to enable LTO by default for the whole ecosystem :)
Thank you! As I can see, currently … Also, sometimes developers create faster release builds, like here where thin LTO is used: … Thanks! Nice flags! I will add them to my branch :) |
As a maintainer of Cargo I do see problems. I sometimes build in release mode to test some subtle bugs, and I don't want to wait 4 minutes for each change I make. Also, the 4-minute build was on a machine with more than 100 cores (granted, some cases are under …).

I think one of the underlying issues is how to teach and discover optimization options. We have the awesome Rust Performance Book, though it is not official and not mentioned in The Cargo Book. The Cargo Book is like a reference and doesn't really provide a guide for optimization. |
|
LTO is applied at LINK time, not compile time. |
Even though it's not as important as debug/incr for modify-build-test, there are definitely use-cases where people use release for local rebuilds, be it e.g. for faster tests or in domains where it is in fact required (e.g. bevy and games). The performance of debug Rust programs is notoriously bad. Having statistics for this would of course be nicer though.
3 minutes might not seem like much, but we're talking about a 4x slowdown; that's massive. I would personally love to enable LTO by default, but there's a reason why it hasn't been done so far, and also why it isn't being done in similar toolchains, such as C and C++ compilers. In the current state of affairs, we IMO simply cannot make LTO the default, as people would eat us alive. That being said, this discussion is mostly based on vibes and feelings. If we had data from the ecosystem about how much LTO slows down compilation and how much it improves performance across the board, for various benchmarks and crates, then it would be easier to make decisions based on it. |
Sure, the compilation might even get a bit faster, since you're just generating bitcode, instead of code (depending on how the compilation pipeline is configured). It's true that if your program is literally a hello world, then LTO won't have much compilation effect even with many dependencies. But once you start using them, the costs will start to appear. |
It doesn't have to be, an entirely new build profile wouldn't impact existing projects and would give developers full control over whether they want to have better performance with worse compile times, or not. |
Sure, but that's a separate discussion that should be led in a separate issue. FWIW, "would give developers full control" already happens today; people can just create their own profile, and even use tools such as https://github.com/Kobzol/cargo-wizard to prepare it for them. There are, I think, two main problems with a new profile: the first is backwards compatibility, because a profile with the same name could already exist in an existing project (but I think that should be possible to override), and the second is... bikeshedding :) It's very difficult to say what the "best options" for runtime performance are. Should it have CGU=1? For some projects more CGUs result in better performance (optimizations are a heuristic, after all). Should it have debug = 0? Many projects set at least … |
A crater run perhaps? |
That's true and that's a very good point :)
What I meant by this is modifying the |
For such cases Cargo might have special
as it was suggested by
Wow! Nice book! Probably the Cargo Book needs to split profile flags into logical sections, like debug info + symbols / optimization / compilation-process flags.
For just simply running
and enables LTO for release builds. So, enabling LTO by default in new projects just simplifies the life of Rust developers.
The 4x is only for the build step. Testing takes some time too. So the current 14 min (like this one for Linux) turns into 17 min, which is only a 1.2x slowdown. For macOS, it would be only a 1.1x slowdown. So it is a trade of a 1.1x-1.2x slowdown for a 1.05x speedup. Sounds cool! I will measure our massive codebase (1.5M LoC) with and without LTO tonight; it has funny results.
Anyway,
Yep, there are a lot of reasons why C/C++ applications prefer not to be compiled with LTO; sometimes even -O3 breaks the code. For example, today I fixed a bug that happens only with recent GCC versions, here. Fortunately, Rust does not have such a large amount of undefined behaviour, which allows Rust users to get a higher level of optimization.
According to this issue in CMake, MSVC uses LTO for Release starting from Visual Studio 2008. You may find more info with Google.
At the start, each Rust package is a hello world, so I still do not see a problem with enabling LTO by default for new projects via extra lines in Cargo.toml. When the project's status changes, users might start to think about disabling LTO, and they will see the effect of enabling/disabling it.
How? Only new packages would be affected. Changing Cargo's default behaviour for new projects, as proposed in #14738, does not affect existing ones. I suppose we started to misunderstand each other at some point. My suggestion is not to enable LTO by default for … |
Ah, you just reminded me that we disable LTO for rustc on Windows, because it was producing miscompilations. Also, LTO + PGO has been broken (rust-lang/rust#115344) for several years. So sadly, yes, we do in fact have various LTO bugs. I'm not personally comfortable with enabling it by default across the board at the moment.
Oh, that's definitely not what I understood from your earlier messages :) In that case, please create a new issue, that's something different than what this issue is about. |
16-core machines build our app (a mixed app with 1.5M LoC of heavily mathematical Fortran code plus a small part in C++) in different setups with GCC 14.1 (the behaviour is the same at least for GCC 10-14, but I did not test that this time).
Without LTO, binary size is 422 Mb; with LTO, binary size is 341 Mb. So, LTO may not only increase compilation time ;-)
Does this miscompilation happen in Rust source code or somewhere in LLVM?
That is sad.
The increasing popularity of LTO may attract more attention to this optimization in the LLVM community, so there will be fewer bugs in this technology.
Done! See #14741 |
No idea, it was discovered at the beginning of 2023, since then we don't use LTO for the compiler on Windows. |
Yes I think so :)
I'm pretty sure in most cases it increases compilation time, plus I don't think pushing an optimization which still faces many bugs into a current build profile would make people happy. @Kobzol Do you have any concerns about adding a new ultra-optimized build profile? |
I personally don't have concerns about that per se (I'm also not on the Cargo team, though, just a random onlooker). I'm not sure it's needed, though: would it really help discoverability if there was a built-in profile with a special name (that we probably couldn't ever change due to backwards compatibility)? People would still need to figure out that something like that exists. And as already mentioned, it would have trade-offs; there is no single profile that is best for all use-cases. In that case it seems to me that it might be better to just make people more aware of the Performance Book or https://github.com/Kobzol/cargo-wizard, e.g. by linking to them somewhere in the Cargo docs, because there they can find the various trade-offs explained. |
Hoho! I suppose we can enable LTO and these optimizations for all packages that use the Rust 2024 edition, since it is not stable yet! Easy solution! :) Since it is a breaking change, we can do it :) |
I disagree. As I said before, yes, no single profile is best for all use-cases, but there are some optimization flags that are almost guaranteed to improve runtime performance for most projects.
For "new" people, I think they would figure out it exists the same way they would figure out a "release" or a "bench" profile exists (if shown in the same way, of course). Plus, a new built-in profile which guarantees better runtime performance for most projects would be easier for developers than having to install an extra subcommand, although I do agree that developers need to be made aware of the performance trade-offs/Performance Book. |
Well, we are already frequently encountering the issue that people are not aware even of the release profile, it has become a running joke :) So even that is not very discoverable on its own. But yes, I suppose that being a builtin profile would be slightly more discoverable than being a third-party solution. |
I'm curious what the cargo team thinks about adding a new built-in profile. |
Speaking only for myself, I'm not a fan
|
As I mentioned in #14741, I tend to find it best to have Issues focus on the underlying need, rather than being overly fixated on one specific way of solving the problem. This makes it easier to fully weigh out the solutions rather than splitting the conversation up between several issues, making it harder to track, weigh against each other, and coordinate across interested parties. If we come to the point that we have settled on a solution but it isn't sufficient for some reason, we can then split the issues up then. |
I believe we are past the deadline for locking in everything that will be in the 2024 edition. The RFC cut off for it was a while ago. |
Totally agreed with these arguments against new profiles.
Ok. Let's continue discussion here.
That is a pity. But we are still able to start generating a bit more optimized profiles at … |
I understand. If it's a definitive no, how about modifying release to further optimize it without significantly increasing compilation time, as I initially proposed in this issue? |
Perhaps an equivalent of this:

```toml
[profile.release.package."*"]
opt-level = 3
```

could be set by the packages themselves? There are certain crates, like compressors and encoders, that know they're slow and need to be optimized to be usable, so they could have something in their manifest to tell Cargo to optimize them harder. This would be even more useful in debug builds. |
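For comparison, this is roughly what a consuming project writes today to opt a single slow dependency into heavier optimization even in debug builds, using `flate2` as an example of a compression crate:

```toml
# Consumer-side override; the proposal above would let the crate itself
# declare this in its own manifest instead.
[profile.dev.package.flate2]
opt-level = 3
```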
Not seeing it in the meeting notes to remember the context but I think the idea of a package providing default profile settings came up in yesterday's Cargo team meeting. It also came up previously when discussing mitigations for the downsides of mir-only rlibs. |
Problem
The release profile is used when you want to build an optimized binary. Thus, I think it would be logical to optimize it to the fullest, sometimes at the cost of a longer build time, but I think that is a good compromise for many people, including me.
Proposed Solution
More optimization flags should be added to the release build profile, notably lto=true and codegen-units=1, as well as optimizing all packages with opt-level=3.
This greatly enhances runtime performance, although yes, at the cost of a bigger binary size and longer build time.
Notes
I have tested what I am proposing, with the following added to my Cargo.toml:
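The snippet itself isn't reproduced above; based on the flags listed under Proposed Solution, it would be along these lines:

```toml
[profile.release]
lto = true
codegen-units = 1
opt-level = 3

# Optimize all dependency packages at the highest level as well.
[profile.release.package."*"]
opt-level = 3
```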
This improved the output binary's runtime performance by orders of magnitude, hence this issue.