Tracking issue for MIR-only RLIBs #38913

Closed
michaelwoerister opened this issue Jan 8, 2017 · 36 comments
Labels
A-codegen Area: Code generation A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Member

michaelwoerister commented Jan 8, 2017

There's been some talk about switching RLIBs to "MIR-only", that is, making RLIBs contain only the MIR representation of a program and not the LLVM IR and machine code as they do now. This issue will try to collect some advantages, disadvantages, and other concerns such an approach would entail:

Advantages

  • Less code duplication, which has four benefits:
    • RLIBs would be smaller because they would not contain LLVM IR and machine code anymore.
    • RLIBs and leaf crates would be smaller because, at the moment, instantiations of generic functions show up multiple times in the object code and LLVM IR.
    • RLIBs and leaf crates would be smaller because the compiler would be able to instantiate monomorphic functions strictly on demand, as @japaric points out.
    • Possibly faster whole-project compiles, since generic instances are never compiled multiple times (although see "Disadvantages")
  • RLIBs would compile faster because the trans and LLVM passes would always be skipped (much like when compiling with --emit=metadata).
  • At the moment libstd is compiled with -Cdebuginfo=1, which is good in general but as a side-effect increases the size of Rust binaries, even if they are built without debuginfo (because the debuginfo from libstd gets statically linked into the binaries). This problem would not exist with MIR-only rlibs.
  • In the past we've had problems with WeakODR linkage and COMDAT sections on MinGW. WeakODR linkage is one way to deal with duplicate generic instances and avoiding those would also remove any reason to use WeakODR.
  • We would always get LTO-grade compiler optimizations since all code is available at codegen time.
  • Some targets, like NVPTX, don't seem to support regular linking (see NVPTX: non-inlined functions can't be used cross crate #38787). Only generating object code in leaf crates would solve this problem.
  • There seems to be some indication that MIR-only RLIBs would help with making the Rust compiler more backend agnostic (see WASM-related issue Migrate wasm target to LLVM wasm backend #38804).
  • Generating LLVM IR only in leaf crates would make it easier to add comprehensive LLVM-based instrumentation like LeakSanitizer without recompiling libstd (see LeakSanitizer, ThreadSanitizer, AddressSanitizer and MemorySanitizer support #38699), as @japaric points out.
  • All Rust code (even that from libstd) can be compiled with -C target-cpu=native, potentially resulting in better code, as @japaric points out.
  • The build process of multi-crate projects would gain more parallelism, since downstream crates would no longer need to wait for an upstream crate's codegen: they can already compile up to the linking phase as soon as the upstream MIR is available, as @est31 points out.
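
To make the duplicate-instantiation point concrete, here is a minimal single-file sketch. The modules `upstream`, `crate_a`, and `crate_b` are hypothetical stand-ins for separate crates, and `largest` is an invented example function; none of these names come from the issue.

```rust
// Hypothetical stand-ins for three separate crates, written as modules so the
// sketch fits in one file.
mod upstream {
    /// A generic function: no machine code exists for it until a caller
    /// picks a concrete `T`.
    pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
        let mut iter = items.iter().copied();
        let first = iter.next()?;
        Some(iter.fold(first, |best, x| if x > best { x } else { best }))
    }
}

mod crate_a {
    // Uses `largest::<u32>`. If this were a real crate compiled to object
    // code on its own, it would carry its own copy of that instance.
    pub fn max_id(ids: &[u32]) -> Option<u32> {
        super::upstream::largest(ids)
    }
}

mod crate_b {
    // Uses the very same instance, `largest::<u32>`. With per-crate codegen
    // this is a second copy; with codegen deferred to the leaf crate the
    // instance would only need to be generated once.
    pub fn max_port(ports: &[u32]) -> Option<u32> {
        super::upstream::largest(ports)
    }
}
```

Calling `crate_a::max_id(&[3, 9, 4])` and `crate_b::max_port(&[80, 443])` both go through the same `u32` instance, which is the situation the "less code duplication" bullets describe.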

Disadvantages

  • The leaf crates (executables, staticlibs, dylibs, cdylibs) would take more time to compile because
    1. the machine code of monomorphic functions from upstream crates would not be "cached" anymore, and
    2. since LLVM sees more code at once, some super-linear optimizations would take disproportionately more time (like when one compiles with LTO now)
  • People might rely on pub #[no_mangle] items being exported from RLIBs and link against them directly. This would not be possible anymore, as @nagisa points out.

Non-Advantages

  • MIR-only libs would not be platform independent. One could think that that should be the case but because of cfg switches, MIR is not platform independent either.
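
A small sketch of why `cfg` switches make MIR platform dependent. The function names `path_separator` and `join` are made up for illustration; the point is only that `cfg` is resolved before MIR is built, so the MIR of a crate contains exactly one of the two bodies.

```rust
/// Only one of the two `path_separator` definitions survives `cfg`
/// expansion, so the MIR stored for this crate already encodes the target.
#[cfg(target_os = "windows")]
pub fn path_separator() -> char {
    '\\'
}

#[cfg(not(target_os = "windows"))]
pub fn path_separator() -> char {
    '/'
}

/// Callers see a single function; which body they get was decided at
/// compile time, long before code generation.
pub fn join(dir: &str, file: &str) -> String {
    format!("{}{}{}", dir, path_separator(), file)
}
```

An rlib built from this source for Linux could not be reused for Windows even if it contained only MIR, because the Windows body was never lowered to MIR in the first place.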

Mitigation strategies for disadvantages:

  1. The problem of caching machine code would be solved in a generalized form by incremental compilation. One has to keep in mind though that incremental compilation will produce less performant code because it prevents many opportunities for inlining.
  2. We could provide an additional, more coarse-grained codegen unit partitioning scheme for incremental compilation (e.g. one CGU per crate) for better runtime performance at the cost of longer compile times.
  3. The amount of code LLVM sees at once can easily be controlled via -C codegen-units already, which provides a means of reducing super-linear optimizations.
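
The third mitigation can be illustrated with a toy cost model. This is only a sketch under a stated assumption: optimization cost for a module is taken to grow as `functions^exponent`, which is an invented simplification and not a measurement of LLVM. The names `module_cost` and `split_cost` are hypothetical.

```rust
/// Toy cost of optimizing `functions` functions in one LLVM module, assuming
/// (purely for illustration) that cost grows as `functions^exponent`.
pub fn module_cost(functions: u64, exponent: f64) -> f64 {
    (functions as f64).powf(exponent)
}

/// Total cost when the same functions are split as evenly as possible into
/// `units` codegen units, each optimized independently.
pub fn split_cost(functions: u64, units: u64, exponent: f64) -> f64 {
    assert!(units > 0, "need at least one codegen unit");
    let per_unit = functions / units;
    let remainder = functions % units;
    // `remainder` units receive one extra function each.
    let larger = module_cost(per_unit + 1, exponent) * remainder as f64;
    let smaller = module_cost(per_unit, exponent) * (units - remainder) as f64;
    larger + smaller
}
```

Under this model, with a quadratic exponent, splitting 1000 functions into 10 units reduces the modelled cost by a factor of 10, while with a linear exponent splitting changes nothing. The model deliberately ignores the runtime cost of lost inlining opportunities that mitigation 1 warns about.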

Open Questions

  • I think we support "bundling" native libraries into RLIBs. We might still need to keep supporting this, even if we don't store machine code originating from Rust?

Please help collect more data on the viability of MIR-only RLIBs.

cc @rust-lang/core @rust-lang/compiler @rust-lang/tools @rkruppe

@michaelwoerister michaelwoerister added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-tools labels Jan 8, 2017
@nagisa
Member

nagisa commented Jan 8, 2017

This is also potentially breaking people who are linking to rlibs expecting them to at least expose the extern #[no_mangle] functions like it does currently.

I did that at least once before, though the application where I did it was already very hacky for other reasons and I do not think the project is around anymore.

@japaric
Member

japaric commented Jan 8, 2017

Another advantage I see is that pure MIR RLIBs effectively let us "recompile"
std with different codegen options without needing std-aware Cargo or Xargo.
This is assuming that the std component that rustup installs will also be
pure MIR.

Basically, cargo rustc --release -- -C target-cpu=native would optimize std
for the host CPU "on the fly". Today, this requires using Xargo to recompile
std (i.e. RUSTFLAGS="-C target-cpu=native" xargo build).

Another case where one uses Xargo to recompile std is producing an
executable that aborts on panic!s without the overhead of landing pads (the
std component that rustup installs contains landing pads because it's
compiled with -C panic=unwind). With pure MIR RLIBs, after you set panic = "abort" in your Cargo.toml, cargo build will give you an executable that
doesn't contain landing pads (everything would get compiled with -C panic=abort).
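
For readers unfamiliar with landing pads, the following sketch shows what they buy under -C panic=unwind: destructors run while a panic unwinds the stack. The names `Guard`, `may_panic`, and `unwinding_runs_destructors` are invented for this example.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Records whether the destructor below ran.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

fn may_panic(fail: bool) {
    // With panic=unwind the compiler emits a landing pad here so that
    // `_guard` is dropped even if the panic below fires.
    let _guard = Guard;
    if fail {
        panic!("boom");
    }
}

/// Returns true when the panic was caught and the destructor ran.
/// Under panic=abort this function never returns: the process aborts at
/// the `panic!`, and no landing pad or cleanup code is needed.
pub fn unwinding_runs_destructors() -> bool {
    CLEANED_UP.store(false, Ordering::SeqCst);
    let result = panic::catch_unwind(|| may_panic(true));
    result.is_err() && CLEANED_UP.load(Ordering::SeqCst)
}
```

With the default unwinding strategy, `unwinding_runs_destructors()` reports that cleanup happened; the cleanup paths that make this possible are the overhead the comment above wants to avoid in abort-only builds.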

cc @brson ^ pure MIR RLIBs would eliminate the need for std-aware Cargo and
Xargo for some scenarios.

I expect the above will also make using sanitizers (cf #38699)
straightforward. Using a sanitizer requires (re)compiling everything with an
extra LLVM pass and linking to the sanitizer runtime, which is written in C/C++.
With pure MIR RLIBs, using a sanitizer would become as simple as cargo rustc -- -Z sanitizer=address; that would compile everything, including std, with the
extra LLVM pass and also link the runtime, which would be provided as, e.g.,
librustc_asan.rlib in the std component.

cc @alexcrichton ^ relevant to sanitizer support

I think we support "bundling" native libraries into RLIBs. We might still need to keep supporting this, even if we don't store machine code originating from Rust?

This would be required for the "easy sanitizers" scenario I'm describing above.
Or we could ship the sanitizers as "static libraries", i.e. librustc_asan.a.

Also, note that today one can build statically linked Rust programs using the
MUSL targets without needing to have MUSL installed because libc.a is
embedded inside the std rlib (libstd.rlib) that ships with the std
component.

@nagisa
Member

nagisa commented Jan 8, 2017

@japaric that’s a tricky advantage, as it prevents us from adding any MIR optimisations that depend on the codegen options set :) We already have one which acts upon -Zno-landing-pads (ergo -C panic=abort)

@est31
Member

est31 commented Jan 8, 2017

@nagisa @japaric isn't platform independence listed in the issue description as a non-advantage?

MIR-only libs would not be platform independent. One could think that that should be the case but because of cfg switches, MIR is not platform independent either.

I'd add to the advantages that it would add more parallelism: running the passes up to MIR takes less time than running everything up to codegen, and in combination with codegen-units you can compile more of the code in parallel than before. E.g. right now when I bootstrap the compiler, the "whole world" waits for the rustc crate to compile in a single thread. With the change, we wait less, as only its MIR has to be available before we can continue. Afterwards, when doing the codegen for the binary, we can simply use codegen-units to get the maximum amount of parallelism the hardware gives us.
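
The parallelism argument can be sketched with a deliberately simple model. Everything here is an assumption made for illustration: the crates form a strict dependency chain, codegen in the leaf scales perfectly across threads, and no duplicate instances are saved. `CrateCost`, `wall_clock_regular`, and `wall_clock_mir_only` are hypothetical names.

```rust
/// Per-crate cost split into the front end (everything up to MIR) and
/// code generation, in arbitrary time units.
#[derive(Clone, Copy)]
pub struct CrateCost {
    pub frontend: f64,
    pub codegen: f64,
}

/// Regular rlibs: each crate in the chain waits for the previous crate's
/// front end *and* codegen to finish.
pub fn wall_clock_regular(chain: &[CrateCost]) -> f64 {
    chain.iter().map(|c| c.frontend + c.codegen).sum()
}

/// MIR-only rlibs: downstream crates only wait for upstream MIR; all
/// codegen happens in the leaf, spread over `codegen_threads` threads.
pub fn wall_clock_mir_only(chain: &[CrateCost], codegen_threads: u32) -> f64 {
    assert!(codegen_threads > 0, "need at least one codegen thread");
    let frontend: f64 = chain.iter().map(|c| c.frontend).sum();
    let codegen: f64 = chain.iter().map(|c| c.codegen).sum();
    frontend + codegen / f64::from(codegen_threads)
}
```

In this model the MIR-only build only wins if the leaf crate's codegen actually runs in parallel; with a single codegen thread the two schemes take the same time.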

@japaric
Member

japaric commented Jan 8, 2017

Could you elaborate on how it would prevent you from adding such optimizations? The way I see it is that the std component will probably continue to be compiled with -C panic=unwind so if you then compile your app with -C panic=abort then LLVM won't be able to optimize as well (or as fast) as if you had recompiled std with -C panic=abort because of the MIR optimizations you mention. However, we would still be better off than today where the std component is shipped filled with landing pads. Or does LLVM always emit landing pads everywhere if the MIR "optimization" you mention is not present? (In that case, it no longer sounds like an optimization but more like a requirement)

If you want the most optimized code possible then, yeah, you would have to use Xargo or std-aware Cargo to opt into MIR optimizations that depend on codegen options. While you are at it you can also throw in -Z mir-opt-level=3, etc.

@est31
Member

est31 commented Jan 8, 2017

I agree that MIR optimisations don't really prevent you from having platform agnostic MIR. As both their input and output is MIR, those optimisations could be run in the leaf crates, once the target and other info is known.

However, if earlier stages in the compiler depend on the target, which is the case with cfg, one would either have to refactor the entire compiler to understand cfg's in all later stages, or simulate compilation with all possible combinations of cfg's enabled/disabled (in the end cfg is an on/off question). The first approach will probably hugely bloat code complexity of the compiler, the second approach would bloat runtime complexity exponentially by the number of kinds of used cfg's.

So MIR will probably stay platform dependent for some time.

@japaric
Member

japaric commented Jan 8, 2017

@est31

isn't platform independence listed in the issue description as non advantage?

I'm not sure what you are trying to get at? The -C target-cpu=native optimizations I'm referring to are about LLVM having access to the IR of all functions so it can apply autovectorization, CPU scheduling optimizations, etc. Whereas today -C target-cpu=native is not as good because libstd.rlib already contains machine code that was optimized with -C target-cpu=generic. All these optimizations are "within an architecture", e.g. x86, and after e.g. cfg(target_arch) has taken effect, so I'm not sure how "platform agnostic MIR" is related.

@nagisa
Member

nagisa commented Jan 8, 2017

@japaric Removing landing pads from MIR is already somewhat of a problem since you cannot add them back after the fact, so you already lose some of the so-called advantage by being unable to reverse that. Later on we might want to add something more invasive. For a completely hypothetical example consider something resembling autovectorisation which, again, is not exactly reversible and thus -C target-feature=-stuff would become a no-op as well. -C debuginfo=2? Stripped to keep binaries smaller because of -C debuginfo=0 before. -C debug-assertions? A no-op even without MIR optimisations, as debug assertions are essentially a #[cfg].

So, what I’m trying to say is that specifying codegen options on leaf crates only would still not be equivalent (and diverge more over time with extra hypothetical MIR opts) to specifying the codegen option(s) for every crate.

You could (as @est31 did just now) argue for storing unoptimised MIR instead, but that, in addition to increasing the size of intermediate rlibs, serializes MIR opts.

isn't platform independence listed in the issue description as non advantage

Codegen options aren’t exactly related to platform independence in this context.

@japaric
Member

japaric commented Jan 8, 2017

@michaelwoerister

I'm not sure if this can be listed as an advantage but pure MIR RLIBs would have prevented #38824. The TL;DR is that LLVM raises assertions when you try to lower functions that take/return i128 values to PTX code / MSP430 instructions because of bugs in LLVM. With pure MIR RLIBs I expect that if the leaf crate doesn't make use of i128 at all then those functions that use i128 would never be fed into LLVM, thus the LLVM assertions wouldn't have been triggered. I suppose that would be some sort of "dead code elimination" pass at the MIR level. So, basically less IR could be fed into LLVM with the right analysis.
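
The "dead code elimination at the MIR level" idea amounts to a reachability walk from the leaf crate's entry points. The sketch below models that with plain strings; it is not how rustc's collector is implemented, and `collect_reachable` plus the function names in the usage note are invented for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collects every function reachable from `roots` in `call_graph`, where the
/// graph maps a function name to the names of the functions it calls.
/// Anything not returned would never be handed to the backend.
pub fn collect_reachable<'a>(
    call_graph: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> BTreeSet<&'a str> {
    let mut reachable = BTreeSet::new();
    let mut worklist: Vec<&'a str> = roots.to_vec();
    while let Some(item) = worklist.pop() {
        // `insert` returns false if the item was already visited.
        if reachable.insert(item) {
            if let Some(callees) = call_graph.get(item) {
                worklist.extend(callees.iter().copied());
            }
        }
    }
    reachable
}
```

If the graph contains an upstream function such as a hypothetical `mul_i128` that nothing reachable from `main` calls, it is simply absent from the result, so the backend never sees it.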

@est31
Member

est31 commented Jan 8, 2017

@japaric

I'm not sure what you are trying to get at?

Ah, sorry, I've misread, you only talked about codegen options.

@eddyb
Member

eddyb commented Jan 8, 2017

cc @solson @oli-obk (miri developers)

@michaelwoerister
Member Author

Great points @nagisa, @japaric, and @est31! I've added all of them to the list.

@solson
Member

solson commented Jan 8, 2017

I don't think it will affect const evaluation one way or another, but it would help us test Miri outside rustc to be able to easily build dependencies as MIR-only rlibs (with MIR for all items, not just generic/inline/constant ones like in the existing metadata).

Largely, Miri is just like another backend in this context, so it is an instance of this previously mentioned advantage:

There seems to be some indication that MIR-only RLIBs would help with making the Rust compiler more backend agnostic (see WASM-related issue #38804).

@retep998
Member

retep998 commented Jan 9, 2017

The problem of caching machine code would be solved in a generalized form by incremental compilation. One has to keep in mind though that incremental compilation will produce less performant code because it prevents many opportunities for inlining.

MSVC using /LTCG:INCREMENTAL is able to achieve LTO with incremental compilation with very fine granularity without sacrificing inlining. According to a blog post, the runtime performance cost of their incremental LTCG vs standard LTCG is less than half a percent, while providing massive gains in link time. So doing something equivalent in Rust is definitely a practical possibility, although it would require a significant amount of support from LLVM. Hopefully ThinLTO will be the magic bullet that provides the necessary support.

@hanna-kruppe
Contributor

hanna-kruppe commented Jan 10, 2017

Regarding #[no_mangle] pub items from rlibs: While it's unfortunate to break anyone's use case, I think this is only a minor disadvantage. It is not documented that rlibs are ordinary archives with some special contents, in fact this is an implementation detail. In addition, we've had breaking changes in compiler output (e.g., #29520, and at this very second #38876) for lesser reasons.

(I would have more sympathy if someone could give a good reason for using rlibs as archives that isn't already covered by staticlib, cdylib, and other existing tools.)

@brson
Contributor

brson commented Jan 10, 2017

I'm very enthusiastic about this. I think separating the type checking and code generation into two phases is smart no matter what the exact strategy is for when the MIR finally gets translated. It gives us a lot of flexibility for coordinating the build. For example, we don't have to delay code generation until the final crate. Cargo itself could spawn parallel processes to do code generation for already-typechecked crates, while their downstreams continue type checking.

By collapsing duplicate monomorphizations, I'm hopeful that this will lead to significant improvements to the major disadvantages of monomorphization, the bloat and the compile time. We could end up in a position where we can say, "the generics model is like C++, but more efficient". That could be a major advantage.

One significant disadvantage with this model is link-time scalability. This will put massive memory pressure on the leaf crate builds, and that could bite us in the future as bigger projects are written in Rust.

@brson
Contributor

brson commented Jan 10, 2017

LTO is a downside too because of compile time. I'd expect we'd need a range of strategies for the actual codegen, to accomplish different goals in -O0 vs -O3.

@SimonSapin
Contributor

SimonSapin commented Jan 24, 2017

The leaf crates (executables, staticlibs, dylibs, cdylibs) would take more time to compile

I’m very worried about this for Servo. There are currently 319 crates in the dependency graph, but after an initial build only a few of them are recompiled in the typical edit-build-test cycle. Even so, compile times are already pretty bad.

Do MIR-only rlibs mean doing code generation for the entire dependency graph every time? This sounds like an unacceptable explosion of compile times.

@eddyb
Member

eddyb commented Jan 24, 2017

@SimonSapin I see no point in experimenting with this on Servo's scale without enabling incremental recompilation (with ThinLTO in the future, too).

Btw I hear @rkruppe is making good progress towards such a compilation mode.

@michaelwoerister
Member Author

We discussed this in the last @rust-lang/tools meeting and the consensus was that this looks like a good idea in many ways but we will not pursue it as long as it would mean a significant compile-time regression.

@retep998
Member

So then we'll be pursuing this as soon as Rust is able to fully take advantage of incremental compilation using ThinLTO?

@DemiMarie
Contributor

Given the recent work to make the compiler incremental, I wonder if it will be possible to perform incremental builds at the level of individual functions, caching anything that hasn't changed. That could allow amazing feats, such as executables that are incrementally updated as the user compiles their source code.

@hanna-kruppe
Contributor

Just jotting this down before I forget about it again: Currently, statics are always translated locally, never cross-crate. If we stick to this, rlibs would still generate some object files that contain only statics, no code. However, that invites a bunch of headaches. For example, if a static references a function (e.g. an interrupt vector table storing function pointers), we'd need to translate those too — or remember them somewhere and use them as roots for trans item collection in downstream crates.

So it would be cleaner to also delay translation of statics to the final binary/staticlib/cdylib. This requires non-trivial refactoring though, as a lot of the current code is written under the assumption that all statics to translate come from the current crate (e.g., TransItem::Static stores a NodeId, not a DefId).

It also means metadata needs a way to enumerate all the statics and other collector roots (monomorphic functions, and some more things in "eager" mode) from other crates. The information is all there, but there's no efficient/easy way to enumerate them.
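
A minimal sketch of the situation described above: a static whose initializer stores function pointers, in the spirit of an interrupt vector table. The names `on_reset`, `on_timer`, `VECTOR_TABLE`, and `dispatch` are invented for this example.

```rust
pub fn on_reset() -> u32 {
    0
}

pub fn on_timer() -> u32 {
    1
}

/// The static's value contains the addresses of `on_reset` and `on_timer`.
/// Whoever translates this static must also make sure both functions get
/// translated, even if no Rust code ever calls them by name.
pub static VECTOR_TABLE: [fn() -> u32; 2] = [on_reset, on_timer];

/// Looks up a handler by index and runs it, if the index is in range.
pub fn dispatch(irq: usize) -> Option<u32> {
    VECTOR_TABLE.get(irq).map(|handler| handler())
}
```

If `VECTOR_TABLE` were emitted into the rlib's object file while function bodies were deferred to the leaf crate, the two handlers would have to be recorded as collector roots for downstream crates, which is the headache the comment describes.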

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jul 26, 2017
@aep

aep commented Sep 22, 2017

uuh i'm scratching my head over what i'm missing here: how is this going to work with -C linker= ? We're relying on the fact that the hash of the input file to the linker is the same on every invocation. If the objects get translated to native code before being passed to the linker, is the translation stable? Or will the target's system linker actually only ever see a single already relocated and re-ordered object file?

@hanna-kruppe
Contributor

This issue only affects which Rust code (or monomorphization of generic Rust code) gets translated into which LLVM compilation unit. It doesn't affect what happens afterwards with these LLVM modules, the resulting object files, etc. — and while it's plausible that MIR-only rlibs would enable more innovation in the later stages of the backend, nothing along those lines has been proposed or even discussed as far as I remember.

@hanna-kruppe
Contributor

Another (marginal) benefit, assuming #[inline] stops copying function bodies into multiple codegen units as discussed in the context of #44941: #[inline] becomes less necessary (only adds inlinehint instead of enabling inlining at all in certain cases) and less complicated (easier to explain, easier to tell if it's useful).

@michaelwoerister
Member Author

I've put together a proof-of-concept implementation of this in #48373. Although the implementation crashes for many crates, I was able to collect timings for a number of projects. The tables show the aggregate time spent for various tasks while compiling the whole crate graph. In many cases we do less work overall but due to worse parallelization, wall-clock time increases. I.e. everything seems to be bottlenecked on the MIR-to-LLVM translation in the leaf crates. To me this suggests that MIR-only RLIBs are blocked on the compiler internals being parallelized.

ripgrep - cargo build

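|                      | regular | MIR-only | %       |
|----------------------|---------|----------|---------|
| LLVM codegen passes  | 33.90   | 32.52    | 95.9 %  |
| LLVM function passes | 1.39    | 1.35     | 97.5 %  |
| LLVM module passes   | 2.18    | 1.95     | 89.8 %  |
| MonoItem collection  | 2.80    | 2.09     | 74.4 %  |
| translation          | 23.73   | 19.97    | 84.1 %  |
| LLVM total           | 37.46   | 35.83    | 95.6 %  |
| BUILD total          | 20.92   | 26.14    | 125.0 % |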

encoding-rs - cargo test --no-run

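|                      | regular | MIR-only | %      |
|----------------------|---------|----------|--------|
| LLVM codegen passes  | 13.11   | 7.28     | 55.6 % |
| LLVM function passes | 0.57    | 0.33     | 58.1 % |
| LLVM module passes   | 0.90    | 0.44     | 48.7 % |
| MonoItem collection  | 1.19    | 0.69     | 58.1 % |
| translation          | 8.68    | 6.08     | 70.1 % |
| LLVM total           | 14.59   | 8.06     | 55.2 % |
| BUILD total          | 15.73   | 14.37    | 91.4 % |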

webrender - cargo build

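|                      | regular | MIR-only | %      |
|----------------------|---------|----------|--------|
| LLVM codegen passes  | 109.42  | 69.17    | 63.2 % |
| LLVM function passes | 4.55    | 3.10     | 68.2 % |
| LLVM module passes   | 1.63    | 1.06     | 64.7 % |
| MonoItem collection  | 10.70   | 5.64     | 52.7 % |
| translation          | 102.95  | 58.70    | 57.0 % |
| LLVM total           | 115.60  | 73.33    | 63.4 % |
| BUILD total          | 72.30   | 68.64    | 94.9 % |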

futures-rs - cargo test --no-run

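|                      | regular | MIR-only | %       |
|----------------------|---------|----------|---------|
| LLVM codegen passes  | 41.19   | 48.67    | 118.1 % |
| LLVM function passes | 1.68    | 1.93     | 115.0 % |
| LLVM module passes   | 0.21    | 0.22     | 107.3 % |
| MonoItem collection  | 5.86    | 6.90     | 117.8 % |
| translation          | 55.48   | 69.84    | 125.9 % |
| LLVM total           | 43.08   | 50.82    | 118.0 % |
| BUILD total          | 17.28   | 19.18    | 111.0 % |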

tokio-webpush-simple - cargo build

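|                      | regular | MIR-only | %      |
|----------------------|---------|----------|--------|
| LLVM codegen passes  | 33.98   | 22.55    | 66.4 % |
| LLVM function passes | 1.53    | 0.95     | 62.0 % |
| LLVM module passes   | 0.30    | 0.20     | 66.8 % |
| MonoItem collection  | 3.80    | 2.11     | 55.5 % |
| translation          | 39.09   | 22.99    | 58.8 % |
| LLVM total           | 35.81   | 23.70    | 66.2 % |
| BUILD total          | 22.28   | 21.54    | 96.7 % |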

Number of LLVM function definitions generated for whole crate graph

|                      | MIR-only | regular |
|----------------------|----------|---------|
| ripgrep              | 22683    | 34239   |
| encoding-rs test     | 8393     | 15116   |
| webrender            | 72238    | 114239  |
| futures-rs test      | 57565    | 46935   |
| tokio-webpush-simple | 27346    | 44961   |
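
The function-count table has no percentage column, unlike the timing tables. The sketch below computes the MIR-only/regular ratio from the figures as posted; `FUNCTION_COUNTS`, `ratio_percent`, and `report` are names made up for this example.

```rust
/// (project, MIR-only, regular) rows copied from the table above.
pub const FUNCTION_COUNTS: [(&str, u32, u32); 5] = [
    ("ripgrep", 22683, 34239),
    ("encoding-rs test", 8393, 15116),
    ("webrender", 72238, 114239),
    ("futures-rs test", 57565, 46935),
    ("tokio-webpush-simple", 27346, 44961),
];

/// MIR-only count as a percentage of the regular count.
pub fn ratio_percent(mir_only: u32, regular: u32) -> f64 {
    100.0 * f64::from(mir_only) / f64::from(regular)
}

/// One formatted line per project, in the same style as the `%` column of
/// the timing tables.
pub fn report() -> Vec<String> {
    FUNCTION_COUNTS
        .iter()
        .map(|&(name, mir_only, regular)| {
            format!("{}: {:.1} %", name, ratio_percent(mir_only, regular))
        })
        .collect()
}
```

Four of the five projects come out well below 100 %, while futures-rs test comes out above it, which matches the one project whose LLVM totals also got worse in the MIR-only build.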

@eddyb
Member

eddyb commented Sep 7, 2018

What if we did this, but for libcore...libstd, at stage1? It might be worth it, despite the huge number of tests, and should be a huge improvement when running just a few tests.

(prompted by @dwijnand's comments on Discord about their workflow of changing librustc and re-checking only one test - with incremental, most of the time is spent building libcore...libstd)

EDIT: here's some data, since I wanted to replicate what @dwijnand was seeing:

  • no incremental, libstd:
    • stage0 check: 29s (./x.py check src/libstd)
    • stage0 build: 36s
    • stage1 build: 58s (maybe because of LLVM/debug-assertions?)
  • incremental at stage0, after doing touch src/librustc/lib.rs:
Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 1m 01s
Building stage0 codegen artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu, llvm)
    Finished release [optimized] target(s) in 48.75s
Building stage1 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 7m 39s
Building stage1 test artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 30.12s

Most of the time is spent building libstd, which should be improved once #53673 ends up in beta (perhaps at the cost of the rustc build time?), so the performance impact of using MIR-only rlibs might become less significant - we'll have to wait and see, I suppose.

@jyn514
Member

jyn514 commented Nov 10, 2020

Now that cargo passes -C embed-bitcode=no, is there anything left to do for this?

@jyn514
Member

jyn514 commented Nov 10, 2020

It turns out I was confused - this issue is about never going through LLVM at all, while -C embed-bitcode=no instead embeds the object code generated by LLVM.

https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/How.20to.20learn.20more.20about.20crate.20metadata.3F/near/216169138

@pnkfelix
Member

Visiting during T-compiler backlog bonanza meeting. We decided that this is at best S-tracking-unimplemented, but it also just represents an experimental idea that was being floated around, not something that anyone committed to.

Thus, it doesn't deserve a tracking issue: It deserves a separate MCP at this point.

Closing.
