Tracking issue for MIR-only RLIBs

There's been some talk about switching RLIBs to "MIR-only", that is, making RLIBs contain only the MIR representation of a program instead of the LLVM IR and machine code they contain now. This issue tries to collect the advantages, disadvantages, and other concerns such an approach would entail:

## Advantages
- Less code duplication, which has four benefits:
  - RLIBs would be smaller because they would not contain LLVM IR and machine code anymore.
  - RLIBs and leaf crates would be smaller because, at the moment, instantiations of generic functions show up multiple times in the object code and LLVM IR.
  - RLIBs and leaf crates would be smaller because the compiler would be able to instantiate monomorphic functions strictly on demand, as @japaric points out.
  - Possibly faster whole-project compiles, since generic instances are never compiled multiple times (although see "Disadvantages")
- RLIBs would compile faster because the trans and LLVM passes would always be skipped (much like when compiling with `--emit=metadata`).
- At the moment, `libstd` is compiled with `-Cdebuginfo=1`, which is good in general but, as a side effect, increases the size of Rust binaries even when they are built without debuginfo (because the debuginfo from `libstd` gets statically linked into them). This problem would not exist with MIR-only RLIBs.
- In the past, we've had problems with WeakODR linkage and COMDAT sections on MinGW. WeakODR linkage is one way to deal with duplicate generic instances; avoiding those duplicates altogether would also remove any reason to use WeakODR.
- We would always get LTO-grade compiler optimizations, since all code is available at codegen time.
- Some targets, like NVPTX, don't seem to support regular linking (see #38787). Only generating object code in leaf crates would solve this problem.
- There seems to be some indication that MIR-only RLIBs would help make the Rust compiler more backend-agnostic (see WASM-related issue #38804).
- Generating LLVM IR only in leaf crates would make it easier to add comprehensive LLVM-based instrumentation like LeakSanitizer without recompiling `libstd` (see #38699), as @japaric points out.
- All Rust code (even that from `libstd`) could be compiled with `-C target-cpu=native`, potentially resulting in better code, as @japaric points out.
- The build process of multi-crate projects would gain more parallelism, since downstream crates would not need to wait for upstream crates' codegen (although, in principle, they could already proceed up to the linking phase without it), as @est31 points out.
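
To make the duplication and on-demand points concrete, here is a toy model in Rust. It is a hypothetical sketch, not how rustc's monomorphization collector actually works: the crate names, the instance strings, the call graph, and the helpers `codegen_per_crate` and `codegen_on_demand` are all made up for illustration. It compares how many function bodies get compiled if every crate codegens what it uses versus if only the leaf crate codegens what is reachable from `main`.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

// Hypothetical toy model, not rustc internals: an "instance" is just a string
// naming a function with concrete type arguments, e.g. "Vec::<u8>::push".
type Instance = &'static str;

/// Roughly today's scheme: every crate codegens the instances it uses or
/// defines, so an instance shared by several crates is compiled repeatedly.
fn codegen_per_crate(crate_uses: &[(&str, Vec<Instance>)]) -> usize {
    crate_uses.iter().map(|(_, uses)| uses.len()).sum()
}

/// MIR-only scheme: only the leaf crate codegens. Starting at its entry point
/// and following the call graph, each reachable instance is compiled exactly
/// once and unreachable ones are never compiled.
fn codegen_on_demand(calls: &HashMap<Instance, Vec<Instance>>, root: Instance) -> BTreeSet<Instance> {
    let mut compiled = BTreeSet::new();
    let mut queue = VecDeque::from([root]);
    while let Some(f) = queue.pop_front() {
        // `insert` returns false if `f` was already compiled.
        if compiled.insert(f) {
            if let Some(callees) = calls.get(f) {
                queue.extend(callees.iter().copied());
            }
        }
    }
    compiled
}

fn main() {
    // Hypothetical workspace: library crates `a` and `b`, plus a binary.
    // `a::helper` is a public monomorphic function the binary never calls.
    let crate_uses = vec![
        ("a", vec!["Vec::<u8>::push", "a::helper"]),
        ("b", vec!["Vec::<u8>::push", "b::run"]),
        ("bin", vec!["Vec::<u8>::push", "main"]),
    ];

    // Call graph as seen from the binary's entry point.
    let mut calls: HashMap<Instance, Vec<Instance>> = HashMap::new();
    calls.insert("main", vec!["b::run", "Vec::<u8>::push"]);
    calls.insert("b::run", vec!["Vec::<u8>::push"]);

    let per_crate = codegen_per_crate(&crate_uses);
    let on_demand = codegen_on_demand(&calls, "main");
    println!("per-crate codegen: {} function bodies", per_crate);
    println!("on-demand codegen: {} function bodies {:?}", on_demand.len(), on_demand);
}
```

In this made-up workspace the per-crate count is 6 while the on-demand walk compiles 3 bodies: `Vec::<u8>::push` is compiled once instead of three times, and `a::helper` is never compiled because nothing reachable from `main` calls it.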

## Disadvantages
- The leaf crates (executables, staticlibs, dylibs, cdylibs) would take more time to compile because
   1. the machine code of monomorphic functions from upstream crates would not be "cached" anymore, and
   2. since LLVM sees more code at once, some super-linear optimizations would take disproportionately more time (like when one compiles with LTO now).
- People might rely on `#[no_mangle] pub` items being exported from RLIBs and link against them directly. This would no longer be possible, as @nagisa points out.
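
For reference, the kind of item meant here looks like the following; the name `mylib_add` is hypothetical. Today its machine code is stored in the RLIB's object code under the unmangled symbol `mylib_add`, so non-Rust code could be linked against it directly. With MIR-only RLIBs that symbol would only be emitted once a leaf crate is code-generated. Note that the Rust 2024 edition requires spelling the attribute `#[unsafe(no_mangle)]`.

```rust
// Hypothetical library item. Today the RLIB carries object code defining the
// unmangled symbol `mylib_add`; a MIR-only RLIB would carry only its MIR.
#[no_mangle]
pub extern "C" fn mylib_add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Rust callers see an ordinary function either way; what would change is
    // whether the RLIB itself contains machine code for `mylib_add`.
    println!("mylib_add(2, 3) = {}", mylib_add(2, 3));
}
```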

## Non-Advantages
- MIR-only RLIBs would not be platform independent. One might expect them to be, but because of `cfg` switches, MIR itself is not platform independent.
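
A small illustration of why: `cfg` attributes are resolved for the compilation target long before MIR is built, so a crate's MIR already reflects one particular platform. The function `path_separator` and the constant `POINTER_BITS` are hypothetical names chosen for this sketch.

```rust
// Exactly one of these two definitions survives `cfg` expansion, and that
// choice is made for the target being compiled, before MIR exists.
#[cfg(target_os = "windows")]
fn path_separator() -> char {
    '\\'
}

#[cfg(not(target_os = "windows"))]
fn path_separator() -> char {
    '/'
}

// Likewise, only the constant matching the target's pointer width is defined.
#[cfg(target_pointer_width = "16")]
const POINTER_BITS: u32 = 16;
#[cfg(target_pointer_width = "32")]
const POINTER_BITS: u32 = 32;
#[cfg(target_pointer_width = "64")]
const POINTER_BITS: u32 = 64;

fn main() {
    println!("separator: {:?}, pointer bits: {}", path_separator(), POINTER_BITS);
}
```

MIR produced for a 64-bit Linux target contains the `'/'` body and `POINTER_BITS = 64`; the Windows or 32-bit definitions are simply absent, so that MIR cannot be reused for those targets.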

## Mitigation strategies for disadvantages:
1. The problem of caching machine code would be solved in a generalized form by incremental compilation. One has to keep in mind, though, that incremental compilation produces less performant code because it prevents many opportunities for inlining.
2. We could provide an additional, more coarse-grained codegen unit partitioning scheme for incremental compilation (e.g. one CGU per crate) for better runtime performance at the cost of longer compile times.
3. The amount of code LLVM sees at once can already be controlled via `-C codegen-units`, which provides a means of limiting the cost of super-linear optimizations.
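
A toy sketch of the trade-off between partitioning granularities, assuming a drastically simplified notion of an "item" (crate, module, name). The `Item` struct, the `Scheme` variants, and the `partition` and `sample_items` functions are hypothetical and do not mirror rustc's actual partitioning algorithm; `Fixed(n)` only loosely mimics the effect of `-C codegen-units=N` by capping how many units the items are spread over.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified item: where a function was defined.
struct Item {
    krate: &'static str,
    module: &'static str,
    name: &'static str,
}

// Hypothetical partitioning schemes (not rustc's actual algorithm).
#[derive(Clone, Copy)]
enum Scheme {
    // Coarse: one unit per crate, so LLVM sees (and can inline across) more code.
    PerCrate,
    // Fine: one unit per module, so an edit invalidates less cached code.
    PerModule,
    // Cap the unit count, loosely analogous to `-C codegen-units=N`.
    Fixed(usize),
}

fn partition(items: &[Item], scheme: Scheme) -> BTreeMap<String, Vec<&'static str>> {
    let mut units: BTreeMap<String, Vec<&'static str>> = BTreeMap::new();
    for (i, item) in items.iter().enumerate() {
        let key = match scheme {
            Scheme::PerCrate => item.krate.to_string(),
            Scheme::PerModule => format!("{}::{}", item.krate, item.module),
            // Round-robin assignment; `max(1)` guards against a zero unit count.
            Scheme::Fixed(n) => format!("cgu{}", i % n.max(1)),
        };
        units.entry(key).or_default().push(item.name);
    }
    units
}

fn sample_items() -> Vec<Item> {
    vec![
        Item { krate: "a", module: "x", name: "f" },
        Item { krate: "a", module: "x", name: "g" },
        Item { krate: "a", module: "y", name: "h" },
        Item { krate: "b", module: "z", name: "k" },
    ]
}

fn main() {
    let items = sample_items();
    for scheme in [Scheme::PerCrate, Scheme::PerModule, Scheme::Fixed(2)].iter().copied() {
        println!("{:?}", partition(&items, scheme));
    }
}
```

With the sample items, `PerCrate` yields 2 units, `PerModule` yields 3, and `Fixed(2)` yields 2. Fewer, larger units give LLVM more code to optimize and inline at once, at the cost of more recompilation per edit and more time spent in super-linear passes.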

## Open Questions
- I think we support "bundling" native libraries into RLIBs. Might we still need to keep supporting this, even if we no longer store machine code originating from Rust?

Please help collect more data on the viability of MIR-only RLIBs.

cc @rust-lang/core @rust-lang/compiler @rust-lang/tools @rkruppe 

Tracking issue for MIR-only RLIBs #38913
