Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support whole program devirtualization #68262

Open
tmandry opened this issue Jan 15, 2020 · 14 comments · Fixed by #96285
Open

Support whole program devirtualization #68262

tmandry opened this issue Jan 15, 2020 · 14 comments · Fixed by #96285
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-trait-system Area: Trait system C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@tmandry
Copy link
Member

tmandry commented Jan 15, 2020

When doing LTO, LLVM supports a devirtualization pass that converts some indirect calls to static, inlineable calls. This can unlock a lot of optimizations when dynamic dispatch is used frequently in a program.

I don't know how hard this would be to support in Rust. Clang has a flag -fwhole-program-vtables which we would need to emulate. From there it seems like it might "just work". I'm opening this issue to start collecting more information.

I also found #11915 which mentions this, but not sure if any of the information there is still accurate.

@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-trait-system Area: Trait system C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 15, 2020
@sanxiyn
Copy link
Member

sanxiyn commented Feb 3, 2020

In LLVM terms, this means emitting type metadata and using llvm.type.test.

@flip1995
Copy link
Member

I'd be interested to look into this issue. Since I'm not that familiar with this part of the compiler (but want to get more familiar with it) I first wanted to ask what would be involved in fixing this.

AFAICT to fix this, a flag would have to be added to the compiler and if that flag is set, (some) types would have to be marked with the !vcall_visibility metadata when lowering to LLVM IR. We should be able to query the visibility of items with the TyCtxt::visibility query which we would then use to convert to the corresponding !vcall_visibility like this:

rustc_middle::ty::Visibility !vcall_visibility
Public 0 (Public)
Restricted 2 (Translation Unit)
Invisible 1 (Linkage Unit)

@tmandry could you give me some pointers where to start with this issue? Do you have an estimate how much work would be involved here?

@bjorn3
Copy link
Member

bjorn3 commented Jun 10, 2021

rustc_middle::ty::Visibility !vcall_visibility
[...] [...]
Restricted 2 (Translation Unit)
Invisible 1 (Linkage Unit)

This is not correct. A trait marked as invisible or restricted can still used by a foreign codegen unit or a foreign shared library. For example if it is used by a public generic or #[inline] function. It can also be implemented for downstream types if there is a blanket implementation.

@flip1995
Copy link
Member

flip1995 commented Jun 10, 2021

Ah right of course, so it has to depend on the visibility of the trait and its implementors. So if the trait is public, or if there are any generics involved or blanket impls exist, than it should be marked as !vcall_visibility=0 (pub). But if the trait is not public and only specific implementations exist and the types of those specific impls are restricted, translation unit or linkage unit could be chosen.

@Aaron1011
Copy link
Member

Aaron1011 commented Dec 14, 2021

It looks like we're already getting some form of this for free. The following code:

trait Foo {
    fn my_method(&self) {}
}

impl Foo for bool {
    fn my_method(&self) {
        std::process::abort();
    }
}

#[inline(never)]
pub fn use_it(val: &dyn Foo) {
    val.my_method();
}

pub fn main() {
    use_it(&true);
}

results in the following optimized LLVM IR for use_it:

; playground::use_it
; Function Attrs: noinline noreturn nonlazybind uwtable
define internal fastcc void @_ZN10playground6use_it17h9abb7f0a088ea9f2E() unnamed_addr #4 {
start:
; call std::process::abort
  tail call void @_ZN3std7process5abort17h887d510ab1f66ceaE() #10
  unreachable
}

@bjorn3
Copy link
Member

bjorn3 commented Dec 15, 2021

This only happens when use_it isn't public. Notice how the llvm ir for use_it doesn't have any arguments. I think LLVM propagates the argument into use_it as it is constant, sees a load of a function pointer from an immutable vtable, replaces the load with a constant representing the actual function pointer and then turns the indirect call to the function pointer into a regular direct call.

@hudson-ayers
Copy link
Contributor

Using the code you shared, I still see a vtable in the final binary: https://rust.godbolt.org/z/qzb6o9MKG . Were you using a different version of Rust, or different set of compilation flags to achieve this?

@flip1995
Copy link
Member

flip1995 commented Dec 15, 2021

FYI, I have a semi-working version here: https://github.com/flip1995/rust/tree/pk-vfe

It seems to still remove a bit too many functions or possibly does something wrong when converting the type.checked.load intrinsics back to normal loads. I'm currently investigating this.


If the above example is compiled as a crate-type=lib, then the my_method doesn't get removed. Godbolt compiles everything as a lib AFAIK. I guess as a binary, LLVM just get's rid of the whole vtable of the crate, since it can determine all the types and can get rid of dynamic dispatch and with that can remove the vtable of Foo entirely. (A short investigation revealed that this is most likely done by the GlobalOptPass in LLVM:

Optimize globals that never have their address taken.

@Aaron1011
Copy link
Member

@hudson-ayers I didn't scroll down far enough in the playground output. While the function is no longer making a call through the vtable, the vtable itself is still present in the binary, as you noticed.

@hudson-ayers
Copy link
Contributor

Is it correct this was closed by #96285 ? While dead virtual function elimination is awesome, it is not the same as whole program devirtualization, which can replace actually-called virtual functions with static calls. Supporting the first is not equivalent to supporting the second. And if -Zvirtual-function-elimination actually enables both, then it seems it should have a name that indicates as such

@flip1995
Copy link
Member

This shouldn't have been closed, because it isn't quite finished yet. IIRC (it has been some time since I actively worked on this) it only enables VFE, not WPD. Though WPD is enabled by default (for LTO at least) and VFE is only an extension of the WPD optimization.

@flip1995 flip1995 reopened this Jun 15, 2022
@flip1995
Copy link
Member

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@Ragarnoy
Copy link

Ragarnoy commented May 2, 2024

Is this dead in the water then?

@flip1995
Copy link
Member

flip1995 commented May 2, 2024

I don't think anyone is working on this right now. But in theory it should be possible to enable this optimization in rustc. What's missing is an analysis, that correctly determines the visibility of the vtable in the documented example. This is easier said then done though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-trait-system Area: Trait system C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants