Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation on ARM-M with nightly-2021-04-23 #85351

Closed
cbiffle opened this issue May 15, 2021 · 15 comments · Fixed by #86020
Closed

Miscompilation on ARM-M with nightly-2021-04-23 #85351

cbiffle opened this issue May 15, 2021 · 15 comments · Fixed by #86020
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@cbiffle
Copy link
Contributor

cbiffle commented May 15, 2021

We are seeing a subtle occasional miscompilation on ARM-M using nightly-2021-04-23 in rust-toolchain. It is difficult to elicit and reproduce, since subtle changes to the layout of the code will cause the compiler to make decisions that either do or do not trigger the bug. It appears to have something to do with stack frame maintenance in outlined functions. We are definitely observing it on thumbv8m.main-none-eabihf, but it's subtle enough that we may also be getting it on thumbv7em-none-eabihf and just haven't noticed it yet.

As of somewhat recently (late April?) output at opt-level = "z" has started including outlined functions that look like this (actual example):

000211e6 <OUTLINED_FUNCTION_2>:
   211e6:	f84d ed08 	str.w	lr, [sp, #-8]!
   211ea:	e9cd 5007 	strd	r5, r0, [sp, #28]
   211ee:	4620      	mov	r0, r4
   211f0:	f8ad 6034 	strh.w	r6, [sp, #52]	; 0x34
   211f4:	e9cd 1209 	strd	r1, r2, [sp, #36]	; 0x24
   211f8:	f7fe ff33 	bl	20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
   211fc:	2800      	cmp	r0, #0
   211fe:	f85d eb08 	ldr.w	lr, [sp], #8
   21202:	4770      	bx	lr

Now, note that the instructions at 0x211e6 and 0x211fe are setting up and tearing down a temporary stack frame, respectively. This will become important in a bit.

It appears that the stack frame offsets used in instructions while this temporary stack frame exists are not being updated to reflect its existence. Stack variables updated within the outlined function above are being deposited 8 bytes off where they should be.

I do not currently have a compact repro case, and the code in question has not yet been published (though I could arrange to publish it if it would help, we intend to open source it). Here are two execution traces of programs showing correct behavior vs corrupt behavior. Both traces set up arguments to a syscall, which uses struct return and deposits a struct onto the stack; the routines then shuffle the results around before calling a library function. It is during the shuffling that things go awry.

In this working trace I have called the struct return buffer in the stack frame R and another related-but-separate buffer B. I've omitted instructions that don't contribute by control flow or value-dominating the registers at the end. S refers to the value of the stack pointer on entry to the trace.

   2006e:	4606      	mov	r6, r0  ; r6 = B
   20070:	a801      	add	r0, sp, #4  ; r0 = R = S + 4
   20072:	460d      	mov	r5, r1
   20074:	9000      	str	r0, [sp, #0]
   20076:	4630      	mov	r0, r6
   20078:	2102      	movs	r1, #2
   2007a:	2200      	movs	r2, #0
   2007c:	2300      	movs	r3, #0
   2007e:	f001 f8bc 	bl	211fa <sys_recv_stub>
   20082:	2800      	cmp	r0, #0
    ...
   2009c:	9803      	ldr	r0, [sp, #12] ; r0 = [S + 12] = [R + 8]
    ...
   200a2:	e9dd 1204 	ldrd	r1, r2, [sp, #16]
    ; r1 = [S + 16] = [R + 12]
    ; r2 = [S + 20] = [R + 16]
    ...
   200b0:	f001 f89c 	bl	211ec <OUTLINED_FUNCTION_1>
    ; (following call)
    ...
   211f0:	e9cd 1203 	strd	r1, r2, [sp, #12]
    ; [S + 12] = r1 = [R + 12]
    ; [S + 16] = r2 = [R + 16]
   211f4:	e9cd 6001 	strd	r6, r0, [sp, #4]
    ; [S + 4] = r6 = B
    ; [S + 8] = [R + 8]
   211f8:	4770      	bx	lr
    ; (returns)
   200b4:	a801      	add	r0, sp, #4  ; r0 = S + 4 = R
   200b6:	f000 f875 	bl	201a4 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
    ; function invoked with r0 = R = S + 4
    ; words [S+4], [S+8], [S+12], [S+16] initialized
    ; everything's good

Now, here is the non-working trace with the same sort of annotations. Note that while the function at the end is still called with one argument R (stack frame plus 28), the actual struct being passed is deposited starting 8 bytes lower at stack frame plus 20:

   200e4:	ac07      	add	r4, sp, #28 ; r4 = S + 28 = R
    ...
   20106:	ad06      	add	r5, sp, #24 ; r5 = S + 24 = B
    ...
   2010a:	4628      	mov	r0, r5  ; r0 = S + 24
   2010c:	2102      	movs	r1, #2
   2010e:	2200      	movs	r2, #0
   20110:	2300      	movs	r3, #0
   20112:	9400      	str	r4, [sp, #0]  ; stack arg = r4 = S + 28 = R
   20114:	f001 f876 	bl	21204 <sys_recv_stub>
   20118:	2800      	cmp	r0, #0
    ...
   20136:	e9dd 120a 	ldrd	r1, r2, [sp, #40]
    ; r1 = [S + 40] = [R + 12]
    ; r2 = [S + 44] = [R + 16]
    ...
   20144:	f001 f84f 	bl	211e6 <OUTLINED_FUNCTION_2>
    ; (following call)
   211e6:	f84d ed08 	str.w	lr, [sp, #-8]!  ; sp = S - 8 <---- stack frame adjust
   211ea:	e9cd 5007 	strd	r5, r0, [sp, #28]
    ; [S + 20] = B
    ; [S + 24] = r0 = (known to be zero from CFG, omitted)
   211ee:	4620      	mov	r0, r4  ; r0 = r4 = R, set above, before call
    ...
   211f4:	e9cd 1209 	strd	r1, r2, [sp, #36]
    ; [S + 28] = r1 = [R + 12]
    ; [S + 32] = r2 = [R + 16]
   211f8:	f7fe ff33 	bl	20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
    ; function invoked with r0 = S + 28
    ; actual struct written at: S+20.

Additional notes:

  • We started seeing this after bumping rust-toolchain from nightly-2020-12-29 to nightly-2021-04-23, so the behavior was introduced somewhere between those points. (@luqmana points out that this likely includes the LLVM 11-12 transition.)
  • While we have seen this on ARMv7/8-M, that's because that's the architecture we're using -- it might affect other platforms, not sure.
  • We build mostly at opt-level = "z" but this may or may not be specific to that opt level.
  • This was found by @labbott. Thanks also to @bcantrill for helping reduce the behavior.

Meta

rustc --version --verbose:

rustc 1.53.0-nightly (7f4afdf02 2021-04-22)
binary: rustc
commit-hash: 7f4afdf0255600306bf67432da722c7b5d2cbf82
commit-date: 2021-04-22
host: x86_64-unknown-linux-gnu
release: 1.53.0-nightly
LLVM version: 12.0.0
@cbiffle cbiffle added the C-bug Category: This is a bug. label May 15, 2021
@azerupi
Copy link
Contributor

azerupi commented May 17, 2021

We are having problems with Rust 1.52 on thumbv6m-none-eabi while there were no problems on 1.51. This does indeed coincide with the LLVM 12 upgrade. I'm not sure if it's the same problem, but if it is we can add the following information:

We are usually compiling with

[profile.release]
codegen-units = 1
lto = true
debug = false
opt-level = "s"

But we tried without specifying the opt-level and it didn't make any difference.
The problem is dependent on the code, we have another branch that behaves just fine.

Unfortunately I can't share the code to reproduce.

@cbiffle
Copy link
Contributor Author

cbiffle commented May 17, 2021

@luqmana suggested adding -C llvm-args=--enable-machine-outliner=never to RUSTFLAGS and that fixes our software.

CC: @yroux

@cbiffle cbiffle changed the title Miscompilation on ARM-M with nightly-2021-04-23 using opt-level z Miscompilation on ARM-M with nightly-2021-04-23 May 17, 2021
cbiffle added a commit to oxidecomputer/hubris that referenced this issue May 17, 2021
labbott added a commit to oxidecomputer/hubris that referenced this issue May 17, 2021
The last toolchain bump exposed some issue with outlining, turn
it off for now

See rust-lang/rust#85351
labbott added a commit to oxidecomputer/hubris that referenced this issue May 17, 2021
The last toolchain bump exposed some issue with outlining, turn
it off for now

See rust-lang/rust#85351
@yroux
Copy link

yroux commented May 17, 2021

Thanks for the heads-up and analysis.

I confirm that the issue was introduced with LLVM-12 due to the last developments made on the Machine Outliner. Notice that it is only enabled under -Oz optimization level for 32-bit ARM M-profile and AArch64 targets, unless the --enable-machine-outliner flag is used. So, other targets should be fine and the suggested flag to disable machine outlining is a proper workaround.

To give you a bit more context here, Machine Outlining is a code size optimization which, in a nutshell, is the reverse of inlining (it replaces repeated sequences of instructions by function calls). In our case here, since the extracted peace of code contains a call, the link register needs to be saved on the entry of the block and restored to be able jump back to the call site, the offsets of the instruction which are using the stack are changed accordingly to reflect thios change, but it doesn't take into account that a stack pointer was saved into a register (r4) and used here as well.

I'll fix the issue in LLVM and let you know the status.

@cbiffle
Copy link
Contributor Author

cbiffle commented May 17, 2021

In our case here, since the extracted peace of code contains a call, the link register needs to be saved on the entry of the block and restored to be able jump back to the call site, the offsets of the instruction which are using the stack are changed accordingly to reflect thios change, but it doesn't take into account that a stack pointer was saved into a register (r4) and used here as well.

@yroux, I'm not sure this is correct. The stack pointer relative address saved in r4 is the address of the struct allocated in the stack frame at S + 28 (instruction at 200e4 in the second trace). The instructions starting at 211ea are filling that struct, but they are doing so using immediate 28 offsets, which would be the correct instructions in the absence of outlining, but which target the wrong address post-outlining. This suggests to me that they were not patched by the outliner.

You know this algorithm better than I do, of course, so let me know if I've missed something.

@yroux
Copy link

yroux commented May 17, 2021

@cbiffle no sorry, you are right, I read it too quickly and I shouldn't work that late ;-)
I'll look at the patching logic tomorrow

@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels May 18, 2021
@apiraino
Copy link
Contributor

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-critical +T-compiler

@rustbot rustbot added P-critical Critical priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels May 19, 2021
@pnkfelix
Copy link
Member

@TriageBot ping llvm

@rustbot rustbot added the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label May 20, 2021
@rustbot
Copy link
Collaborator

rustbot commented May 20, 2021

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

@Mark-Simulacrum
Copy link
Member

@rustbot ping llvm

@rustbot
Copy link
Collaborator

rustbot commented May 20, 2021

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

@nagisa
Copy link
Member

nagisa commented May 20, 2021

A couple points: IIRC outliner is a pretty recent addition to LLVM. Also, to work on this it is going to be important to have some code that reproduces this (I'm not seeing any, please tell me if I missed anything)

@nagisa nagisa added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label May 20, 2021
@yroux
Copy link

yroux commented May 20, 2021

Machine Outliner initial support for ARM was added into LLVM-11, but it was improved to handled more cases (such as ld/st stack instructions involved in this issue) and enabled into -Oz for M-profile targets in LLVM-12 release.

I managed to reproduce the issue on a reduce LLVM MIR test, and I'm working on a fix.

@yroux
Copy link

yroux commented May 26, 2021

Issue reported into llvm bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50481
Fix submitted: https://reviews.llvm.org/D103167

I hope to have it part of llvm 12.0.1 which is currently in RC1 state

@yroux
Copy link

yroux commented Jun 9, 2021

Fix commited into mainline as: https://reviews.llvm.org/rG6c78dbd4ca1f

@luqmana
Copy link
Member

luqmana commented Jun 10, 2021

Thanks @yroux! Will you be driving getting it into 12.0.1? I believe the deadline for getting fixes in is Friday.

EDIT: Ah, I see @nikic marked it as a release-12.0.1 blocker on the bug. Thanks!

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 10, 2021
Disable the machine outliner by default

This addresses a codegen-issue that needs to be fixed upstream in LLVM.
While we wait for the fix, we can disable it.

Verified manually that the outliner is no longer run when
`-Copt-level=z` is specified, and also that you can override this with
`-Cllvm-args=-enable-machine-outliner` if you need it anyway.

A regression test is not really feasible in this instance, given that we
do not have any minimal reproducers.

Fixes rust-lang#85351

cc `@pnkfelix`
@bors bors closed this as completed in c63a1c0 Jun 10, 2021
Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this issue Jun 11, 2021
This addresses a codegen-issue that needs to be fixed upstream in LLVM.
While we wait for the fix, we can disable it.

Verified manually that the outliner is no longer run when
`-Copt-level=z` is specified, and also that you can override this with
`-Cllvm-args=-enable-machine-outliner` if you need it anyway.

A regression test is not really feasible in this instance, given that we
do not have any minimal reproducers.

Fixes rust-lang#85351
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 28, 2021
Revert "Disable the machine outliner by default"

The fix commit is already in the fork: rust-lang/llvm-project@6c78dbd4ca1f
Linked:
- rust-lang#85351
- rust-lang#86020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants