-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce -Zvirtual-function-elimination
codegen flag
#96285
Conversation
Some changes occured to rustc_codegen_gcc cc @antoyo |
r? @nagisa (rust-highfive has picked a reviewer for you, use r? to override) |
Did you mean |
I think this flag fits better as a |
Is there a reason why we shouldn't (or LLVM shouldn't) enable this optimization by default with full LTO? Is there an expectation that it might miscompile something or otherwise cause us unexpected trouble? |
let trait_ref_self = trait_ref.with_self_ty(cx.tcx, ty); | ||
let trait_ref_self = cx.tcx.erase_regions(trait_ref_self); | ||
let trait_def_id = trait_ref_self.def_id(); | ||
let trait_vis = cx.tcx.visibility(trait_def_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the closest existing concept similar to vcall_visibility
would be reachability, rather than visibility.
Example with an incorrectly optimized vtables showing why visibility is insufficient
trait Foo { fn foo(&self) { println!("foo") } }
trait Bar { fn bar(&self) { println!("bar") } }
trait Baz { fn baz(&self) { println!("baz") } }
impl Foo for usize {}
impl Bar for usize {}
impl Baz for usize {}
pub struct FooBox(Box<dyn Foo>);
pub struct BarBox(Box<dyn Bar>);
pub struct BazBox(Box<dyn Baz>);
pub fn make_foo() -> FooBox { FooBox(Box::new(0)) }
pub fn make_bar() -> BarBox { BarBox(Box::new(0)) }
pub fn make_baz() -> BazBox { BazBox(Box::new(0)) }
#[inline]
pub fn f(a: FooBox) { a.0.foo() }
pub fn g<T>(b: BarBox) { b.0.bar() }
#[inline]
fn h(c: BazBox) { c.0.baz() }
pub const H: fn(BazBox) = h;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I don't really have a solution for this right now. I tried looking at the reachability of the trait_ref
and the members of the traits with the privacy_access_levels
query. But for all of the traits and the functions in the traits (queried with tcx.vtable_entries
) this just returns None
, meaning it is not reachable IIUC.
I guess I would somehow have to check the reachability for the dyn Trait
types. But I couldn't figure out how to do that.
Do you have ideas on how to best address this issue?
Yes, new flags are always added as unstable |
module.module_llvm.llmod(), | ||
llvm::LLVMModFlagBehavior::Error, | ||
"LTOPostLink\0".as_ptr().cast(), | ||
1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 11?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the HasModuleFlag
should be checking for 1
, but also adding this flag here seems super sketchy to me. This is a pretty internal thing to LTO AFAICT.
In that context, it would be great if this code had a comment with context as to why this is being done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the mono repo, I found the value 1.
The LTOGenerator only adds the flag.
https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/LTO/LTOCodeGenerator.cpp#L541
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here is based on the file tschuett linked above. In the LLVM repo that code is responsible to set up LTO. Adding this flag was missing from the Rust code that should emulate the code in LTOCodeGenerator.cpp
.
The 11
argument in the HasModuleFlag
is the size of the string passed to this function, not the value of the flag. The 1
argument in the AddModuleFlag
function is the value of the module flag. LLVM doesn't need a string size here, because it expects a \0
terminated string in this function.
module.module_llvm.llmod(), | ||
llvm::LLVMModFlagBehavior::Error, | ||
"LTOPostLink\0".as_ptr().cast(), | ||
1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the HasModuleFlag
should be checking for 1
, but also adding this flag here seems super sketchy to me. This is a pretty internal thing to LTO AFAICT.
In that context, it would be great if this code had a comment with context as to why this is being done.
Thanks for all the feedback. I think I addressed all the comments (I will clean up / squash the commits before this PR gets merged), except for the reachability issue, which needs a little bit more work/thinking on how to address this.
Clang/LLVM doesn't set this flag by default, so I figured rustc shouldn't set this flag by default either. I don't know if there are any known issues with this optimization in LLVM |
This seems reasonable to me for a first shot at the functionality. Since it is unstable, I'm not super concerned about this being partially broken in some ways as e.g. pointed out by @tmiasko above, although we should definitely make sure the information about this limitation/bug is preserved in e.g. the tracking issue. Another thing I'm not seeing is documentation for this feature the unstable book. Please write something about this feature here, and especially please note the known limitations and/or issues as well. |
I added the documentation, cleaned up the commits and rebased on top of latest master. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Very nice work. |
@bors r+ Thanks! |
📌 Commit 798a2d927f3796c0b21fc6ee2707ce21833bfbd0 has been approved by |
⌛ Testing commit 798a2d927f3796c0b21fc6ee2707ce21833bfbd0 with merge 2c9812125037f96b4ba035183f237dfacc0f8d4a... |
This comment has been minimized.
This comment has been minimized.
Adds the virtual-function-elimination unstable compiler flag and a check that this flag is only used in combination with -Clto. LLVM can only apply this optimization with fat LTO.
To apply the optimization the `Virtual Function Elim` module flag has to be set. To apply this optimization post-link the `LTOPostLink` module flag has to be set.
This function computes a Itanium-like typeid for a trait_ref. This is required for the VFE optimization in LLVM. It is used to map `llvm.type.checked.load` invocations, that is loading the function from a vtable, to the vtables this function could be from. It is important to note that `typeid`s are not unique. So multiple vtables of the same trait can share `typeid`s.
Add the intrinsic declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type) This is used in the VFE optimization when lowering loading functions from vtables to LLVM IR. The `metadata` is used to map the function to all vtables this function could belong to. This ensures that functions from vtables that might be used somewhere won't get removed.
This adds the typeid and `vcall_visibility` metadata to vtables when the -Cvirtual-function-elimination flag is set. The typeid is generated in the same way as for the `llvm.type.checked.load` intrinsic from the trait_ref. The offset that is added to the typeid is always 0. This is because LLVM assumes that vtables are constructed according to the definition in the Itanium ABI. This includes an "address point" of the vtable. In C++ this is the offset in the vtable where information for RTTI is placed. Since there is no RTTI information in Rust's vtables, this "address point" is always 0. This "address point" in combination with the offset passed to the `llvm.type.checked.load` intrinsic determines the final function that should be loaded from the vtable in the `WholeProgramDevirtualization` pass in LLVM. That's why the `llvm.type.checked.load` intrinsics are generated with the typeid of the trait, rather than with that of the function that is called. This matches what `clang` does for C++. The vcall_visibility metadata depends on three factors: 1. LTO level: Currently this is always fat LTO, because LLVM only supports this optimization with fat LTO. 2. Visibility of the trait: If the trait is publicly visible, VFE can only act on its vtables after linking. 3. Number of CGUs: if there is more than one CGU, also vtables with restricted visibility could be seen outside of the CGU, so VFE can only act on them after linking. To reflect this, there are three visibility levels: Public, LinkageUnit, and TranslationUnit.
The offset in the llvm.type.checked.load intrinsic differs on 32 bit platforms
I updated the 32-bit test with: diff --git a/src/test/codegen/virtual-function-elimination-32bit.rs b/src/test/codegen/virtual-function-elimination-32bit.rs
index dd200278bb1..6f963363a99 100644
--- a/src/test/codegen/virtual-function-elimination-32bit.rs
+++ b/src/test/codegen/virtual-function-elimination-32bit.rs
@@ -31,5 +31,5 @@ pub fn main() {
taking_t(&s);
}
-// CHECK: ![[TYPE0]] = !{i64 0, !"[[MANGLED_TYPE0]]"}
+// CHECK: ![[TYPE0]] = !{i32 0, !"[[MANGLED_TYPE0]]"}
// CHECK: ![[VCALL_VIS0]] = !{i64 2} and rebased on latest master (to test locally again). No other changes. @bors r=nagisa |
📌 Commit 195f208 has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (2d1e075): comparison url. Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results
CyclesThis benchmark run did not return any relevant results for this metric. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression Footnotes |
Fixes #68262
This PR adds a codegen flag
-Zvirtual-function-elimination
to enable the VFE optimization in LLVM. To make this work, additonal information has to be added to vtables (!vcall_visibility
metadata and atypeid
of the trait). Furthermore, instead of justload
ing functions, thellvm.type.checked.load
intrinsic has to be used to map functions to vtables.For technical details of the changes, see the commit messages.
I also tested this flag on https://github.com/tock/tock on different boards to verify that this fixes the issue tock/tock#2594. This flag is able to improve the size of the resulting binary by about 8k-9k bytes by removing the unused debug print functions.
Rendered documentation update