Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prologue of function missing in FatLTO but present in ThinLTO #103619

Closed
MasterAwesome opened this issue Oct 27, 2022 · 9 comments
Closed

Prologue of function missing in FatLTO but present in ThinLTO #103619

MasterAwesome opened this issue Oct 27, 2022 · 9 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. P-high High priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@MasterAwesome
Copy link
Contributor

MasterAwesome commented Oct 27, 2022

Compiling with branch-target-enforcement = 1(-Zbranch-protection=bti) on aarch64-unknown-linux-gnu causes different code to emitted in the prologue of the function for fat LTO and thin LTO in debug mode.

Baseline assembly for function add defined as follows:

#[no_mangle]
pub extern "C" fn add(left: usize, right: usize) -> usize {
    left.wrapping_add(right)
}

compiled into a cdylib.

[DEBUG] llvm-objdump on ThinLTO

0000000000010408 <add>:
   10408: d503245f     	bti	c ; not present in FatLTO
   1040c: d10043ff     	sub	sp, sp, #16
   10410: 8b010008     	add	x8, x0, x1
   10414: f90007e8     	str	x8, [sp, #8]
   10418: 14000001     	b	0x1041c <add+0x14>
   1041c: f94007e0     	ldr	x0, [sp, #8]
   10420: 910043ff     	add	sp, sp, #16
   10424: d65f03c0     	ret

[DEBUG] llvm-objdump on FatLTO

0000000000010408 <add>:
   10408: d10043ff     	sub	sp, sp, #16
   1040c: 8b010008     	add	x8, x0, x1
   10410: f90007e8     	str	x8, [sp, #8]
   10414: 14000001     	b	0x10418 <add+0x10>
   10418: f94007e0     	ldr	x0, [sp, #8]
   1041c: 910043ff     	add	sp, sp, #16
   10420: d65f03c0     	ret

[RELEASE] llvm-objdump on both FatLTO and ThinLTO

0000000000010238 <add>:
   10238: d503245f     	bti	c
   1023c: 8b000020     	add	x0, x1, x0
   10240: d65f03c0     	ret

rustc version with failure first occuring:

rustc 1.65.0-nightly (75b7e52e9 2022-08-13)
binary: rustc
commit-hash: 75b7e52e92c3b00fc891b47f5b2efdff0a2be55a
commit-date: 2022-08-13
host: x86_64-unknown-linux-gnu
release: 1.65.0-nightly
LLVM version: 15.0.0

cargo-bisect-rustc

searched nightlies: from nightly-2022-08-12 to nightly-2022-08-14

regressed nightly: nightly-2022-08-13

searched commit range: 20ffea6...f22819b

regressed commit: e2b52ff

bisected with cargo-bisect-rustc v0.6.4

Host triple: x86_64-unknown-linux-gnu

Reproduce with:

cargo bisect-rustc --start=2022-08-12 --end=2022-08-14 --script=test.sh --target aarch64-unknown-linux-gnu

Might be related to: #102162 as reverting the same patches in that issue solves the problem.

@MasterAwesome MasterAwesome added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Oct 27, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 27, 2022
@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 28, 2022
@inquisitivecrystal
Copy link
Contributor

@MasterAwesome Pardon, but what's the practical implication of this regression? Does it cause some sort of runtime problem, is it a code-size or performance problem, or something else? I'm afraid I can't read assembly.

@saethlin
Copy link
Member

The bti instruction isn't present in the fat LTO case, even though it's explicitly requested via compiler flag. I can't really read this either, just diffing by eye and reading the comment.

The fact that the bti is present in release but not debug is extremely spooky. Would be good to have the opinion of an LLVM expert.

@inquisitivecrystal inquisitivecrystal added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) labels Oct 31, 2022
@MasterAwesome
Copy link
Contributor Author

MasterAwesome commented Oct 31, 2022

@MasterAwesome Pardon, but what's the practical implication of this regression? Does it cause some sort of runtime problem, is it a code-size or performance problem, or something else? I'm afraid I can't read assembly.

The BTI C is required when supporting ARM64 that has BTI enabled and rust function is being called from a location that's expecting the bti c. This could lead to exceptions such as synchronous aborts. Especially since rust is being used in quite low layers and init-ed from a C world. On ARM platforms this is a compiler flaw; when a bug occurs on release mode it makes it really hard to debug it since the debug builds are broken.

In fact we don't know where in LLVM this is breaking yet which means even on release we could theoretically have this issue.

TLDR: It is a runtime problem that can cause a synchronous abort. It was working till LLVM-14; Broke during upgrade to LLVM-15

@apiraino
Copy link
Contributor

apiraino commented Nov 1, 2022

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 1, 2022
@nikic
Copy link
Contributor

nikic commented Nov 1, 2022

Not particular familiar with BTI, but if this works anything like CET I'd expect it to be an "all or nothing" thing. Your full LTO build likely contains a module that does not use BTI, so it's not possible to use CFI for anything.

Does this actually work with thin LTO and not just get automatically disabled at a higher layer due to a missing GNU property note on the object that causes this to be disabled in the full LTO case?

@nikic
Copy link
Contributor

nikic commented Nov 1, 2022

The difference between debug and release here is likely just that debug ends up using code from libstd (which presumably does not have BTI enabled, unless you're using build-std), while it is inlined in the release case.

So at least on the surface, I think everything works as expected here.

@MasterAwesome
Copy link
Contributor Author

MasterAwesome commented Nov 1, 2022

Right, but this is broken even with build-std as mentioned in the other thread as of nightly 2022-08-14.

Not particular familiar with BTI, but if this works anything like CET I'd expect it to be an "all or nothing" thing. Your full LTO build likely contains a module that does not use BTI, so it's not possible to use CFI for anything.

Does this actually work with thin LTO and not just get automatically disabled at a higher layer due to a missing GNU property note on the object that causes this to be disabled in the full LTO case?

correct, there's a module that's not part of std or something that's getting compiled with BTI attribute in the llvm ir off. Not sure what it is because it doesn't have a name just some hash? And this doesn't work in both release and debug refer #102162

I've also pasted the llvm ir in the regressed version where the code is just as straight forward to be inlined on release. And in bigger libraries, like the one I'm testing on an arm platform. This causes a synchronous abort unless I use the nightly-2022-08-12. If this is expected behavior, how do I mitigate that nightly-2022-08-14 has this changed behavior. Maybe the https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/branch-protection.html is missing some documentation.

PS: I'd love to contribute whether it's documentation or a code change. Let me know :)

@MasterAwesome
Copy link
Contributor Author

MasterAwesome commented Dec 19, 2022

Looks like on debug mode, rustc_std_workspace_core has branch-target-enforcement set to 0 but on release mode it doesn't get compiled (probably because the compiler optimized that module out).

Using -Zbuild-std=core sets that to 1 on debug mode and the codegen is correct.

I still think the Min of all branch-target-enforcement is a wrong approach since the application requesting it requires it which should propagate across dependent modules and rebuild if needed. With nightly-2022-08-12 this behaviour was observed and regressed due to the patches mentioned in OP.

@MasterAwesome
Copy link
Contributor Author

MasterAwesome commented Dec 20, 2022

Looks like this is LLVM-15's expected behaviour to accept Min of module flags for BTI/PAC etc.

This results in the -Zbranch-protection=bti requiring -Zbuild-std UNLESS the std and other supporting crates are also built with BTI enabled.

As to why this doesn't show up on release builds on Fat LTO is because the modules required for the min never show up causing the min to be only the crate alone. For debug builds on this particular issue, it causes rustc_std_workspace_core to be built without BTI causing it to fallback to Min which is BTI disabled.

The only thing left is to support LLVM-15's Min option in rustc_codegen_llvm which is done in #105932

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. P-high High priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants