Skip to content

Select8/Select16 expansion does not preserve existing basic block fallthrough #123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
carlos4242 opened this issue Dec 22, 2018 · 25 comments
Labels
A-llvm Affects the LLVM AVR backend bug has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem

Comments

@carlos4242
Copy link

I'm wondering if this is a regression on #83 or me using a wrong branch that doesn't have a fix in (i'm losing track) or a genuine new bug.

Anyway, I've built from this commit...

commit 2a1cdeadd3ea8e1eba9cc681037b83f07332763b (HEAD, tag: avr-support-7.0-llvm, origin/rust-llvm-release-8-0-0-v1)
Author: Alex Crichton <alex@alexcrichton.com>
Date:   Mon Jul 2 15:28:26 2018 -0700

    Fix compile on dist-x86_64-linux builder
    
    Apparently glibc is so old it doesn't have the _POSIX_ARG_MAX constant. This
    shouldn't affect anything we use anyway though.
    
    https://travis-ci.org/rust-lang/rust/jobs/399333071

and I'm seeing out of order assembly produced from my IR when optimisations are turned on in llc. I started noticing weird behaviour on my AVR program and finally tracked it to broken MC.

This function in LLVM IR...

define hidden void @_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_(i8, i8) local_unnamed_addr #1 {
entry:
  switch i8 %0, label %9 [
    i8 6, label %2
    i8 7, label %8
  ]

; <label>:2:                                      ; preds = %entry
  %3 = icmp ugt i8 %1, 90
  %4 = icmp ult i8 %1, 5
  %. = select i1 %4, i8 5, i8 %1
  %5 = select i1 %3, i8 90, i8 %.
  store i8 %5, i8* getelementptr inbounds (%Vs5UInt8, %Vs5UInt8* @_Tv4main11delayFactorVs5UInt8, i64 0, i32 0), align 1
  %6 = zext i8 %5 to i32
  %7 = mul nuw nsw i32 %6, 100
  store i32 %7, i32* getelementptr inbounds (%Vs6UInt32, %Vs6UInt32* @_Tv4main7delayUsVs6UInt32, i64 0, i32 0), align 4
  tail call void @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_(i16 34, i8 %5)
  br label %9

; <label>:8:                                      ; preds = %entry
  %not. = icmp ne i8 %1, 0
  %.2 = zext i1 %not. to i8
  store i1 %not., i1* getelementptr inbounds (%Sb, %Sb* @_Tv4main7enabledSb, i64 0, i32 0), align 1
  tail call void @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_(i16 35, i8 %.2)
  br label %9

; <label>:9:                                      ; preds = %8, %2, %entry
  ret void
}

is producing this MC...

00000420 <_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_>:
     420:	1f 93       	push	r17
     422:	87 30       	cpi	r24, 0x07	; 7
     424:	09 f4       	brne	.+2      	; 0x428 <LBB4_1>
     426:	26 c0       	rjmp	.+76     	; 0x474 <LBB4_8>

00000428 <LBB4_1>:
     428:	86 30       	cpi	r24, 0x06	; 6
     42a:	09 f0       	breq	.+2      	; 0x42e <LBB4_2>
     42c:	21 c0       	rjmp	.+66     	; 0x470 <LBB4_7>

0000042e <LBB4_2>:
     42e:	85 e0       	ldi	r24, 0x05	; 5
     430:	65 30       	cpi	r22, 0x05	; 5
     432:	08 f0       	brcs	.+2      	; 0x436 <LBB4_4>
     434:	86 2f       	mov	r24, r22

00000436 <LBB4_4>:
     436:	1a e5       	ldi	r17, 0x5A	; 90
     438:	6b 35       	cpi	r22, 0x5B	; 91
     43a:	08 f4       	brcc	.+2      	; 0x43e <LBB4_6>
     43c:	18 2f       	mov	r17, r24

0000043e <LBB4_6>:
     43e:	10 93 b8 01 	sts	0x01B8, r17
     442:	61 2f       	mov	r22, r17
     444:	77 27       	eor	r23, r23
     446:	24 e6       	ldi	r18, 0x64	; 100
     448:	30 e0       	ldi	r19, 0x00	; 0
     44a:	80 e0       	ldi	r24, 0x00	; 0
     44c:	90 e0       	ldi	r25, 0x00	; 0
     44e:	48 2f       	mov	r20, r24
     450:	59 2f       	mov	r21, r25
     452:	0e 94 33 16 	call	0x2c66	; 0x2c66 <__mulsi3>
     456:	90 93 bf 01 	sts	0x01BF, r25
     45a:	80 93 be 01 	sts	0x01BE, r24
     45e:	70 93 bd 01 	sts	0x01BD, r23
     462:	60 93 bc 01 	sts	0x01BC, r22
     466:	82 e2       	ldi	r24, 0x22	; 34
     468:	90 e0       	ldi	r25, 0x00	; 0
     46a:	61 2f       	mov	r22, r17
     46c:	0e 94 fd 02 	call	0x5fa	; 0x5fa <_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_>

00000470 <LBB4_7>:
     470:	1f 91       	pop	r17
     472:	08 95       	ret

00000474 <LBB4_8>:
     474:	21 e0       	ldi	r18, 0x01	; 1
     476:	60 30       	cpi	r22, 0x00	; 0
     478:	09 f4       	brne	.+2      	; 0x47c <LBB4_10>
     47a:	20 e0       	ldi	r18, 0x00	; 0

0000047c <LBB4_10>:
     47c:	82 2f       	mov	r24, r18
     47e:	81 70       	andi	r24, 0x01	; 1
     480:	80 93 c0 01 	sts	0x01C0, r24
     484:	83 e2       	ldi	r24, 0x23	; 35
     486:	90 e0       	ldi	r25, 0x00	; 0
     488:	62 2f       	mov	r22, r18
     48a:	0e 94 fd 02 	call	0x5fa	; 0x5fa <_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_>

; *** falls through to the next method ***

0000048e <_TF4main11updateDelayFVs5UInt8T_>:
     48e:	1f 93       	push	r17
     490:	95 e0       	...

It looks like the BB LBB4_7 should be at the end.

Any suggestions?

@carlos4242
Copy link
Author

@carlos4242
Copy link
Author

Before I lost my laptop, I found the step that was causing the broken reordering. The pass in question was "Control Flow Optimizer". After that pass, the code is in the wrong order, with the last block falling through into the next function. Unfortunately I lost the debug output with the laptop but the instruction order is definitely fine before this step and definitely broken after this step. And it stays broken all the way down into machine code.

What I can't quite understand is how this pass can cause broken ordering because it does not seem to have target specific code?

@carlos4242
Copy link
Author

carlos4242 commented Jan 1, 2019

Putting more colour in this, if I compile using the same build of llc but for target x86 then it works fine. So it's pretty definitely an AVR thing. :-/

Look at the code in __TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_.

x86 code is ordered sensibly after branch folding...

llc -O3 -march=x86 -filetype=asm -o - mc-block-order-regression.ll

	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 9
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:                               ## %entry
	subl	$12, %esp
	.cfi_def_cfa_offset 16
	movl	$__TToF4mainU_FTVs5UInt8S0__T_, (%esp)
	calll	__TF3AVR36i2cSlaveSetupRegisterReceiveCallbackFT8callbackcTVs5UInt8S0__T__T_
	.p2align	4, 0x90
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
	jmp	LBB0_1
	.cfi_endproc
                                        ## -- End function
	.private_extern	__TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_ ## -- Begin function _TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
	.globl	__TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
	.p2align	4, 0x90
__TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_: ## @_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
	.cfi_startproc
## %bb.0:                               ## %entry
	subl	$12, %esp
	.cfi_def_cfa_offset 16
	movb	20(%esp), %al
	movb	16(%esp), %cl
	cmpb	$7, %cl
	je	LBB1_7
## %bb.1:                               ## %entry
	cmpb	$6, %cl
	jne	LBB1_9
## %bb.2:
	cmpb	$5, %al
	movb	$5, %cl
	jb	LBB1_4
## %bb.3:
	movl	%eax, %ecx
LBB1_4:
	cmpb	$90, %al
	movb	$90, %al
	ja	LBB1_6
## %bb.5:
	movl	%ecx, %eax
LBB1_6:
	movb	%al, __Tv4main11delayFactorVs5UInt8
	movzbl	%al, %eax
	imull	$100, %eax, %ecx
	movl	%ecx, __Tv4main7delayUsVs6UInt32
	movl	%eax, 4(%esp)
	movl	$34, (%esp)
	jmp	LBB1_8
LBB1_7:
	xorl	%ecx, %ecx
	testb	%al, %al
	setne	%cl
	setne	__Tv4main7enabledSb
	movl	%ecx, 4(%esp)
	movl	$35, (%esp)
LBB1_8:
	calll	__TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_
LBB1_9:
	addl	$12, %esp
	retl
	.cfi_endproc
                                        ## -- End function
	.private_extern	__TF4main11updateDelayFVs5UInt8T_ ## -- Begin function _TF4main11updateDelayFVs5UInt8T_
	.globl	__TF4main11updateDelayFVs5UInt8T_
	.p2align	4, 0x90
__TF4main11updateDelayFVs5UInt8T_:      ## @_TF4main11updateDelayFVs5UInt8T_
	.cfi_startproc
## %bb.0:                               ## %entry
	movb	4(%esp), %al
	movb	%al, __Tv4main11delayFactorVs5UInt8
	retl
	.cfi_endproc
                                        ## -- End function
	.private_extern	__TToF4mainU_FTVs5UInt8S0__T_ ## -- Begin function _TToF4mainU_FTVs5UInt8S0__T_
	.globl	__TToF4mainU_FTVs5UInt8S0__T_
	.weak_definition	__TToF4mainU_FTVs5UInt8S0__T_
	.p2align	4, 0x90
__TToF4mainU_FTVs5UInt8S0__T_:          ## @_TToF4mainU_FTVs5UInt8S0__T_
	.cfi_startproc
## %bb.0:                               ## %entry
	subl	$12, %esp
	.cfi_def_cfa_offset 16
	movl	20(%esp), %eax
	movl	%eax, 4(%esp)
	movl	16(%esp), %eax
	movl	%eax, (%esp)
	calll	__TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
	addl	$12, %esp
	retl
	.cfi_endproc
                                        ## -- End function
	.private_extern	__Tv4main11delayFactorVs5UInt8 ## @_Tv4main11delayFactorVs5UInt8
	.globl	__Tv4main11delayFactorVs5UInt8
.zerofill __DATA,__common,__Tv4main11delayFactorVs5UInt8,1,0
	.private_extern	__Tv4main7delayUsVs6UInt32 ## @_Tv4main7delayUsVs6UInt32
	.globl	__Tv4main7delayUsVs6UInt32
.zerofill __DATA,__common,__Tv4main7delayUsVs6UInt32,4,2
	.private_extern	__Tv4main7enabledSb ## @_Tv4main7enabledSb
	.globl	__Tv4main7enabledSb
.zerofill __DATA,__common,__Tv4main7enabledSb,1,0

.subsections_via_symbols

But AVR code is miscompiled...

llc -O3 -march=avr -filetype=asm -o - mc-block-order-regression.ll

	.text
	.file	"mc-block-order-regression.ll"
	.globl	main                    ; -- Begin function main
	.p2align	1
	.type	main,@function
main:                                   ; @main
; %bb.0:                                ; %entry
	ldi	r24, pm_lo8(_TToF4mainU_FTVs5UInt8S0__T_)
	ldi	r25, pm_hi8(_TToF4mainU_FTVs5UInt8S0__T_)
	call	_TF3AVR36i2cSlaveSetupRegisterReceiveCallbackFT8callbackcTVs5UInt8S0__T__T_
LBB0_1:                                 ; =>This Inner Loop Header: Depth=1
	rjmp	LBB0_1
.Lfunc_end0:
	.size	main, .Lfunc_end0-main
                                        ; -- End function
	.hidden	_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_ ; -- Begin function _TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
	.globl	_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
	.p2align	1
	.type	_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_,@function
_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_: ; @_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
; %bb.0:                                ; %entry
	push	r17
	cpi	r24, 7
	brne	LBB1_1
	rjmp	LBB1_8
LBB1_1:                                 ; %entry
	cpi	r24, 6
	breq	LBB1_2
	rjmp	LBB1_7
LBB1_2:
	ldi	r24, 5
	cpi	r22, 5
	brlo	LBB1_4
; %bb.3:
	mov	r24, r22
LBB1_4:
	ldi	r17, 90
	cpi	r22, 91
	brsh	LBB1_6
; %bb.5:
	mov	r17, r24
LBB1_6:
	sts	_Tv4main11delayFactorVs5UInt8, r17
	mov	r22, r17
	clr	r23
	ldi	r18, 100
	ldi	r19, 0
	ldi	r24, 0
	ldi	r25, 0
	mov	r20, r24
	mov	r21, r25
	call	__mulsi3
	sts	_Tv4main7delayUsVs6UInt32+3, r25
	sts	_Tv4main7delayUsVs6UInt32+2, r24
	sts	_Tv4main7delayUsVs6UInt32+1, r23
	sts	_Tv4main7delayUsVs6UInt32, r22
	ldi	r24, 34
	ldi	r25, 0
	mov	r22, r17
	call	_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_
LBB1_7:
	pop	r17
	ret
LBB1_8:
	ldi	r18, 1
	cpi	r22, 0
	brne	LBB1_10
; %bb.9:
	ldi	r18, 0
LBB1_10:
	mov	r24, r18
	andi	r24, 1
	sts	_Tv4main7enabledSb, r24
	ldi	r24, 35
	ldi	r25, 0
	mov	r22, r18
	call	_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_
.Lfunc_end1:
	.size	_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_, .Lfunc_end1-_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
                                        ; -- End function
	.hidden	_TF4main11updateDelayFVs5UInt8T_ ; -- Begin function _TF4main11updateDelayFVs5UInt8T_
	.globl	_TF4main11updateDelayFVs5UInt8T_
	.p2align	1
	.type	_TF4main11updateDelayFVs5UInt8T_,@function
_TF4main11updateDelayFVs5UInt8T_:       ; @_TF4main11updateDelayFVs5UInt8T_
; %bb.0:                                ; %entry
	sts	_Tv4main11delayFactorVs5UInt8, r24
	ret
.Lfunc_end2:
	.size	_TF4main11updateDelayFVs5UInt8T_, .Lfunc_end2-_TF4main11updateDelayFVs5UInt8T_
                                        ; -- End function
	.hidden	_TToF4mainU_FTVs5UInt8S0__T_ ; -- Begin function _TToF4mainU_FTVs5UInt8S0__T_
	.weak	_TToF4mainU_FTVs5UInt8S0__T_
	.p2align	1
	.type	_TToF4mainU_FTVs5UInt8S0__T_,@function
_TToF4mainU_FTVs5UInt8S0__T_:           ; @_TToF4mainU_FTVs5UInt8S0__T_
; %bb.0:                                ; %entry
	call	_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
	ret
.Lfunc_end3:
	.size	_TToF4mainU_FTVs5UInt8S0__T_, .Lfunc_end3-_TToF4mainU_FTVs5UInt8S0__T_
                                        ; -- End function
	.hidden	_Tv4main11delayFactorVs5UInt8 ; @_Tv4main11delayFactorVs5UInt8
	.type	_Tv4main11delayFactorVs5UInt8,@object
	.section	.bss,"aw",@nobits
	.globl	_Tv4main11delayFactorVs5UInt8
_Tv4main11delayFactorVs5UInt8:
	.zero	1
	.size	_Tv4main11delayFactorVs5UInt8, 1

	.hidden	_Tv4main7delayUsVs6UInt32 ; @_Tv4main7delayUsVs6UInt32
	.type	_Tv4main7delayUsVs6UInt32,@object
	.globl	_Tv4main7delayUsVs6UInt32
	.p2align	2
_Tv4main7delayUsVs6UInt32:
	.zero	4
	.size	_Tv4main7delayUsVs6UInt32, 4

	.hidden	_Tv4main7enabledSb      ; @_Tv4main7enabledSb
	.type	_Tv4main7enabledSb,@object
	.globl	_Tv4main7enabledSb
_Tv4main7enabledSb:
	.zero	1
	.size	_Tv4main7enabledSb, 1


	; Declaring this symbol tells the CRT that it should
	;copy all variables from program memory to RAM on startup
	.globl	__do_copy_data
	; Declaring this symbol tells the CRT that it should
	;clear the zeroed data section on startup
	.globl	__do_clear_bss

@carlos4242
Copy link
Author

@brainlag, @TimNN, @dylanmckay can anyone give any ideas how to track this down?

I've done a full debug/pass log on this bug: https://gist.github.com/carlos4242/bb47ee95ba4b3638a58be92316c80c88

I'm struggling to debug this because I'm not sure what pass is at fault. Earlier I thought it was the "Control Flow Optimiser" but I'm not so sure now. For a start, I couldn't figure out why this issue is only affecting AVR (I did an output to X86_64 and it's fine). I think that pass is rearranging the machine basic blocks but the issue is that one of the blocks should have a branch at the end to jump to the function epilog.

Looking about 1/3 of the way down, I can see this...

# *** IR Dump After AVR DAG->DAG Instruction Selection ***:
# Machine code for function _TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_: IsSSA, TracksLiveness
Function Live Ins: $r24 in %0, $r22 in %1

bb.0.entry:
  successors: %bb.2(0x2aaaaaab), %bb.4(0x55555555); %bb.2(33.33%), %bb.4(66.67%)
  liveins: $r24, $r22
  %1:ld8 = COPY $r22
  %0:ld8 = COPY $r24
  CPIRdK %0:ld8, 7, implicit-def $sreg
  BREQk %bb.2, implicit $sreg
  RJMPk %bb.4

bb.4.entry:
; predecessors: %bb.0
  successors: %bb.1(0x40000001), %bb.3(0x3fffffff); %bb.1(50.00%), %bb.3(50.00%)

  CPIRdK %0:ld8, 6, implicit-def $sreg
  BRNEk %bb.3, implicit $sreg
  RJMPk %bb.1

bb.1 (%ir-block.2):
; predecessors: %bb.4
  successors: %bb.3(0x80000000); %bb.3(200.00%)

  %7:ld8 = LDIRdK 5
  CPIRdK %1:ld8, 5, implicit-def $sreg
  %8:gpr8 = Select8 killed %7:ld8, %1:ld8, 5, implicit $sreg
  %9:ld8 = LDIRdK 90
  CPIRdK %1:ld8, 91, implicit-def $sreg
  %10:gpr8 = Select8 killed %9:ld8, killed %8:gpr8, 4, implicit $sreg
  STSKRr @_Tv4main11delayFactorVs5UInt8, %10:gpr8 :: (store 1 into `i8* getelementptr inbounds (%Vs5UInt8, %Vs5UInt8* @_Tv4main11delayFactorVs5UInt8, i64 0, i32 0)`)
  %11:dregs = ZEXT %10:gpr8, implicit-def dead $sreg
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %12:dldregs = LDIWRdK 100
  %13:dldregs = LDIWRdK 0
  $r23r22 = COPY %11:dregs
  $r25r24 = COPY %13:dldregs
  $r19r18 = COPY %12:dldregs
  $r21r20 = COPY %13:dldregs
  CALLk &__mulsi3, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r23r22, implicit $r25r24, implicit $r19r18, implicit $r21r20, implicit-def $sp, implicit-def $r23r22, implicit-def $r25r24
  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %14:dregs = COPY $r23r22
  %15:dregs = COPY $r25r24
  STSWKRr @_Tv4main7delayUsVs6UInt32 + 2, %15:dregs :: (store 2 into `i32* getelementptr inbounds (%Vs6UInt32, %Vs6UInt32* @_Tv4main7delayUsVs6UInt32, i64 0, i32 0)` + 2)
  STSWKRr @_Tv4main7delayUsVs6UInt32, %14:dregs :: (store 2 into `i32* getelementptr inbounds (%Vs6UInt32, %Vs6UInt32* @_Tv4main7delayUsVs6UInt32, i64 0, i32 0)`, align 4)
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %16:dldregs = LDIWRdK 34
  $r25r24 = COPY %16:dldregs
  $r22 = COPY %10:gpr8
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r22, implicit-def $sp
  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  RJMPk %bb.3

bb.2 (%ir-block.8):
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(200.00%)

  %2:ld8 = LDIRdK 0
  %3:ld8 = LDIRdK 1
  CPIRdK %1:ld8, 0, implicit-def $sreg
  %4:ld8 = Select8 killed %3:ld8, killed %2:ld8, 1, implicit $sreg
  %5:ld8 = ANDIRdK %4:ld8, 1, implicit-def dead $sreg
  STSKRr @_Tv4main7enabledSb, killed %5:ld8 :: (store 1 into `i1* getelementptr inbounds (%Sb, %Sb* @_Tv4main7enabledSb, i64 0, i32 0)`)
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %6:dldregs = LDIWRdK 35
  $r25r24 = COPY %6:dldregs
  $r22 = COPY %4:ld8
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r22, implicit-def $sp
  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp

bb.3 (%ir-block.9):
; predecessors: %bb.4, %bb.2, %bb.1

  RET

# End machine code for function _TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_.

...comparing the basic blocks, e.g. bb.1 and bb.2, I can see that both are similar but bb.1 ends with a RJMP and with bb.2 it is missing. I think the issue is after the basic blocks get rearranged, late in optimisation, bb.2 is put at the end of the machine code, after bb.3, the function exit block and because it doesn't have a branch/jump, execution drops out of the end of the function and on into whatever code is placed after the function after compiling/linking.

From a naive point of view, I'd say that either legalisation or instruction selection for bb.2 has gone wrong and has removed the trailing jump/branch.

Anyone else have any ideas?

@carlos4242
Copy link
Author

Confirmed on llvm branch avr-rust-2018-12-18

@carlos4242
Copy link
Author

Update for the benefit of @DoubleHyhpen_gitlab... the original Swift code that this LLVM came from...

import AVR
typealias IntegerLiteralType = UInt8

SetupSerial(baudRate: 9600)
print(message: Starting3)

// configuration constants
let triacPin = 4
let storedBrightnessLocation: UInt16 = 34
let storedOnOffLocation: UInt16 = 35
let brightnessi2cRegister = 6
let onOffi2cRegister = 7
let rotaryPin1 = 6
let rotaryPin2 = 8
let centerPin = 7

print(message: Another0)


// defaults until EEPROM is read
var delayFactor = 90
var delayUs: UInt32 = 9000
var enabled = false

// setup triac pin
pinMode(pin: triacPin, mode: OUTPUT)

func boolForUInt8(_ value: UInt8) -> Bool {
  if value > 0 {
    return true
  } else {
    return false
  }
}


func uint8ForBool(_ value: Bool) -> UInt8 {
  if value {
    return 1
  }

  return 0
}

func i2cUpdate(register: UInt8, value: UInt8) {
  if register == brightnessi2cRegister {
    updateDelay(value)
  } else if register == onOffi2cRegister {
    updateEnabled(boolForUInt8(value))
  }
}

func i2cRead(register: UInt8) -> UInt8 {
  if register == brightnessi2cRegister {
    return delayFactor
  } else if register == onOffi2cRegister {
    return uint8ForBool(enabled)
  }

  return 0
}

// the update function, calcs and storage
func updateDelay(_ newDelayFactorIn: UInt8) {
  var newDelayFactor = newDelayFactorIn

  if newDelayFactor > 90 {
    newDelayFactor = 90
  }

  if newDelayFactor < 5 {
    newDelayFactor = 5
  }

  delayFactor = newDelayFactor

  delayUs = UInt32(newDelayFactor) &* 100

  writeEEPROM(address: storedBrightnessLocation, value: newDelayFactor)
}

func updateEnabled(_ newEnabled: Bool) {
  enabled = newEnabled
  writeEEPROM(address: storedOnOffLocation, value: uint8ForBool(newEnabled))
}


// initialise brightness from EEPROM
updateDelay(readEEPROM(address: storedBrightnessLocation))
enabled = boolForUInt8(readEEPROM(address: storedOnOffLocation))



// the core of the program, a delayed triac fire a set time after zero cross
setupPin2InterruptCallback(edgeType: RISING_EDGE) {
  setupTimerSingleShotInterruptCallback(microseconds: delayUs) {
    digitalWrite(pin: triacPin, value: HIGH)
    delay(microseconds: 200)
    digitalWrite(pin: triacPin, value: LOW)
  }
}

// Use I2C to communicate to the pi for remote control
i2cSlaveSetupRegisterReceiveCallback { register, value in
  i2cUpdate(register: register, value: value)
}

i2cSlaveSetupRegisterSendCallback { register -> UInt8 in
  return i2cRead(register: register)
}

setupI2CSlave(address: 0x23)

func incrementBrightness() {
  print(message: Brightness0)
  updateDelay(delayFactor &+ 5)
}

func decrementBrightness() {
  print(message: Brightness1)
  updateDelay(delayFactor &- 5)  

}

func centerPinPressed() {
  print(message: Center2)
//  updateEnabled(!enabled)
}

//var lastPinState: PinsState = (false, false, false)
//
//pinMode(pin: rotaryPin1, mode: INPUT)
//digitalWrite(pin: rotaryPin1, value: HIGH)
//pinMode(pin: rotaryPin2, mode: INPUT)
//digitalWrite(pin: rotaryPin2, value: HIGH)
//pinMode(pin: centerPin, mode: INPUT)
//digitalWrite(pin: centerPin, value: HIGH)

print(message: Started1)

while (true) {

//  checkRotaryEncoder(
//    pin1: rotaryPin1,
//    pin2: rotaryPin2,
//    centerPin: centerPin,
//    lastPinState: &lastPinState,
//    clockwise: incrementBrightness,
//    counterclockwise: decrementBrightness,
//    centerPinPressed: centerPinPressed)

}

The overall program runs a mains dimmer switch for a light in my front room! It's I2C controlled by my raspberry pi, which is running homebridge.js.

The function that is miscompiling is i2cUpdate. It is compiled down to the IR shown at the start of this issue by the swift compiler, which also includes inlining the updateDelay and updateEnabled functions. Hope this helps.

@dylanmckay
Copy link
Member

This happens at -O1 but not at -O0.

@dylanmckay
Copy link
Member

I've written a script that takes an LLVM IR test file and generates 89 different permutations, each disabling a different pass in the LLVM pipeline.

I've ran the test with each of these 89 passes disabled, and the issue still occurs with each of them disabled.

Here is the script.

#! /usr/bin/env ruby

require 'fileutils'

file_path = ARGV[0] or raise "no LL test file given on command line"
file_contents = File.read(file_path)

possible_passes_raw = <<END
-disable-2addr-hack
-disable-a15-sd-optimization
-disable-adv-copy-opt
-disable-basicaa
-disable-block-placement
-disable-bpf-peephole
-disable-branch-fold
-disable-cgp
-disable-cgp-branch-opts
-disable-cgp-ext-ld-promotion
-disable-cgp-gc-opts
-disable-cgp-select2branch
-disable-cgp-store-extract
-disable-cleanups
-disable-complex-addr-modes
-disable-const64
-disable-constant-hoisting
-disable-copyprop
-disable-debug-info-print
-disable-demotion
-disable-dfa-sched
-disable-early-ifcvt
-disable-early-taildup
-disable-gisel-legality-check
-disable-hcp
-disable-hsdr
-disable-icp
-disable-ifcvt-diamond
-disable-ifcvt-forked-diamond
-disable-ifcvt-simple
-disable-ifcvt-simple-false
-disable-ifcvt-triangle
-disable-ifcvt-triangle-false
-disable-ifcvt-triangle-false-rev
-disable-ifcvt-triangle-rev
-disable-inlined-alloca-merging
-disable-interleaved-load-combine
-disable-lanai-mem-alu-combiner
-disable-lftr
-disable-libcalls-shrinkwrap
-disable-licm-promotion
-disable-lsr
-disable-machine-cse
-disable-machine-dce
-disable-machine-licm
-disable-machine-sink
-disable-memcpy-idiom
-disable-memmove-idiom
-disable-memop-opt
-disable-merge-into-combines
-disable-mergeicmps
-disable-mr-partial-inlining
-disable-non-allocatable-phys-copy-opt
-disable-nounwind-inference
-disable-ondemand-mds-loading
-disable-packetizer
-disable-partial-inlining
-disable-partial-libcall-inlining
-disable-peephole
-disable-phi-elim-edge-splitting
-disable-post-ra
-disable-postra-machine-licm
-disable-postra-machine-sink
-disable-preheader-prot
-disable-preinline
-disable-promote-alloca-to-lds
-disable-promote-alloca-to-vector
-disable-sched-critical-path
-disable-sched-cycles
-disable-sched-hazard
-disable-sched-height
-disable-sched-live-uses
-disable-sched-physreg-join
-disable-sched-reg-pressure
-disable-sched-stalls
-disable-sched-vrcycle
-disable-separate-const-offset-from-gep
-disable-shifter-op
-disable-simplify-libcalls
-disable-spill-fusing
-disable-spill-hoist
-disable-ssc
-disable-store-widen
-disable-symbolication
-disable-tail-calls
-disable-tail-duplicate
-disable-vecdbl-nv-stores
-disable-verify
-disable-vp
END

possible_passes = possible_passes_raw.lines.map(&:chomp)

new_dir_name = "#{File.basename(file_path)}-without-passes"
new_dir = File.join(File.dirname(file_path), new_dir_name)

FileUtils.mkdir_p(new_dir)

possible_passes.each do |possible_pass|
  pass_name = possible_pass.gsub("-disable-", "")

  test_without_pass = file_contents.gsub("$DISABLE_PASSES", possible_pass)

  if test_without_pass == file_contents
    raise "file does not contain a '$DISABLE_PASSES' for substitution"
  end

  test_path = File.join(new_dir, "disable-#{pass_name}.ll")
  File.write(test_path, test_without_pass)
end

The bug is not caused by any of the passes mentioned.

@dylanmckay
Copy link
Member

Here is the output with control flow optimiztion disabled (under the branch folding pass)

_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_: ; @_TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_
; %bb.0:                                ; %entry
	push	r17
	cpi	r24, 7
	brne	LBB0_1
	rjmp	LBB0_7
LBB0_1:                                 ; %entry
	cpi	r24, 6
	breq	LBB0_2
	rjmp	LBB0_10
LBB0_2:
	ldi	r24, 5
	cpi	r22, 5
	brlo	LBB0_4
; %bb.3:
	mov	r24, r22
LBB0_4:
	ldi	r17, 90
	cpi	r22, 91
	brsh	LBB0_6
; %bb.5:
	mov	r17, r24
LBB0_6:
	sts	_Tv4main11delayFactorVs5UInt8, r17
	mov	r22, r17
	clr	r23
	ldi	r18, 100
	ldi	r19, 0
	ldi	r20, 0
	ldi	r21, 0
	movw	r24, r20
	call	__mulsi3
	sts	_Tv4main7delayUsVs6UInt32+3, r25
	sts	_Tv4main7delayUsVs6UInt32+2, r24
	sts	_Tv4main7delayUsVs6UInt32+1, r23
	sts	_Tv4main7delayUsVs6UInt32, r22
	ldi	r24, 34
	ldi	r25, 0
	mov	r22, r17
	call	_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_
	pop	r17
	ret
LBB0_7:
	ldi	r18, 1
	cpi	r22, 0
	brne	LBB0_9
; %bb.8:
	ldi	r18, 0
LBB0_9:
	mov	r24, r18
	andi	r24, 1
	sts	_Tv4main7enabledSb, r24
	ldi	r24, 35
	ldi	r25, 0
	mov	r22, r18
	call	_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_
	pop	r17
	ret
LBB0_10:
	pop	r17
	ret

@dylanmckay
Copy link
Member

It's definitely occurring in the branch folding pass.

@dylanmckay
Copy link
Member

dylanmckay commented Jan 20, 2019

The incorrect fallthrough is being created somewhere within this block in BranchFolding.cpp.

I figured this out by progressively guarding all of the different optimization logic blocks in the pass with if(false) until the file came out correctly.

@dylanmckay
Copy link
Member

Here, the branch folding pass is trying to minimize the number of instructions (particularly, branch instructions) by reorganizing basic blocks so that basic blocks fallthrough to each other, rather than requiring explicitly branch instructions.

In this particular case, LLVM notices that %bb.10, the terminating basic block with the RET instruction, does not fall through to any basic block, nor do any of it's predecessor basic blocks fall through to it. It has two predecessors; %bb.1 and %bb.4.

LLVM is deciding to move %bb.10 directly beneath %bb.4 so that %bb.4 call simply fall through to %bb.10 without an explicit branch. %bb.1 still requires an unconditional branch to %bb.10.

%bb.10:

bb.10 (%ir-block.9):
; predecessors: %bb.1, %bb.4

  $r17 = POPRd implicit-def $sp, implicit $sp
  RET
$13 = void

%bb.4:

bb.4 (%ir-block.2):
; predecessors: %bb.3, %bb.5
  successors: %bb.10(0x80000000); %bb.10(100.00%)
  liveins: $r17
  STSKRr @_Tv4main11delayFactorVs5UInt8, $r17 :: (store 1 into `i8* getelementptr inbounds (%Vs5UInt8, %Vs5UInt8* @_Tv4main11delayFactorVs5UInt8, i64 0, i32 0)`)
  $r23r22 = ZEXT $r17, implicit-def dead $sreg
  $r19r18 = LDIWRdK 100
  $r21r20 = LDIWRdK 0
  $r25r24 = COPY $r21r20
  CALLk &__mulsi3, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r23r22, implicit $r25r24, implicit $r19r18, implicit $r21r20, implicit-def $sp, implicit-def $r23r22, implicit-def $r25r24
  STSWKRr @_Tv4main7delayUsVs6UInt32 + 2, killed $r25r24 :: (store 2 into `i32* getelementptr inbounds (%Vs6UInt32, %Vs6UInt32* @_Tv4main7delayUsVs6UInt32, i64 0, i32 0)` + 2)
  STSWKRr @_Tv4main7delayUsVs6UInt32, killed $r23r22 :: (store 2 into `i32* getelementptr inbounds (%Vs6UInt32, %Vs6UInt32* @_Tv4main7delayUsVs6UInt32, i64 0, i32 0)`, align 4)
  $r25r24 = LDIWRdK 34
  $r22 = COPY killed $r17
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r22, implicit-def $sp
  RJMPk %bb.10
$14 = void

@dylanmckay
Copy link
Member

I think the problem is here.

This is an IR dump right at the start of the BranchFolding pass, before any branch folding actually occurs.

bb.8 (%ir-block.8):
; predecessors: %bb.7, %bb.9
  successors: %bb.10(0x80000000); %bb.10(100.00%)
  liveins: $r18
  $r24 = COPY $r18
  $r24 = ANDIRdK killed $r24(tied-def 0), 1, implicit-def dead $sreg
  STSKRr @_Tv4main7enabledSb, killed $r24 :: (store 1 into `i1* getelementptr inbounds (%Sb, %Sb* @_Tv4main7enabledSb, i64 0, i32 0)`)
  $r25r24 = LDIWRdK 35
  $r22 = COPY killed $r18
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r22, implicit-def $sp

bb.9 (%ir-block.8):
; predecessors: %bb.7
  successors: %bb.8(0x80000000); %bb.8(100.00%)

  $r18 = LDIRdK 0
  RJMPk %bb.8

bb.10 (%ir-block.9):
; predecessors: %bb.1, %bb.4, %bb.8

  $r17 = POPRd implicit-def $sp, implicit $sp
  RET

Look at the block %bb.8, the one that is left dangling due to our bug after the pass runs. It has %bb.10 as a successor, meaning that %bb.8 should branch to %bb.10. %bb.10 also correctly has %bb.8 as a predecessor.

This seems invalid. %bb.8, at the start of the pass, falls through to %bb.9, which unconditionally branches back to %bb.8 in an infinite loop. %bb.9 does not have %bb.8 as a predecessor, nor does %bb.8 has %bb.9 as a successor. For these two blocks to be marked as predecessors and successors, at some point they must have actually branched to each other. It seems as if the predecessor and successor lists have become out of sync with the underlying instructions.

bb.9 (%ir-block.8):
; predecessors: %bb.7
  successors: %bb.8(0x80000000); %bb.8(100.00%)

  $r18 = LDIRdK 0
  RJMPk %bb.8

bb.10 (%ir-block.9):
; predecessors: %bb.1, %bb.4

  $r17 = POPRd implicit-def $sp, implicit $sp
  RET

bb.8 (%ir-block.8):
; predecessors: %bb.7, %bb.9
  liveins: $r18
  $r24 = COPY $r18
  $r24 = ANDIRdK killed $r24(tied-def 0), 1, implicit-def dead $sreg
  STSKRr @_Tv4main7enabledSb, killed $r24 :: (store 1 into `i1* getelementptr inbounds (%Sb, %Sb* @_Tv4main7enabledSb, i64 0, i32 0)`)
  $r25r24 = LDIWRdK 35
  $r22 = COPY killed $r18
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $
r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r22, implicit-def $sp

# End machine code for function _TF4main9i2cUpdateFT8registerVs5UInt85valueS0__T_.

@dylanmckay
Copy link
Member

Earlier on in the pass pipeline, the call did branch to the terminating block via fallthrough. The predecessor and successor lists matched the underlying instructions and basic blocks.

bb.2 (%ir-block.8):
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  %2:ld8 = LDIRdK 0
  %3:ld8 = LDIRdK 1
  CPIRdK %1:ld8, 0, implicit-def $sreg
  %4:ld8 = Select8 killed %3:ld8, killed %2:ld8, 1, implicit $sreg
  %5:ld8 = ANDIRdK %4:ld8(tied-def 0), 1, implicit-def dead $sreg
  STSKRr @_Tv4main7enabledSb, killed %5:ld8 :: (store 1 into `i1* getelementptr inbounds (%Sb, %Sb* @_Tv4main7enabledSb, i64 0, i32 0)
`)
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp                                                
  %6:dldregs = LDIWRdK 35
  $r25r24 = COPY %6:dldregs
  $r22 = COPY %4:ld8
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $
r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r2
2, implicit-def $sp
  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp                                                  

bb.3 (%ir-block.9):
; predecessors: %bb.4, %bb.2, %bb.1

  RET

@dylanmckay
Copy link
Member

IR dump before "Expand ISel Pseudo-instructions" pass.

bb.2 (%ir-block.8):
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  %2:ld8 = LDIRdK 0
  %3:ld8 = LDIRdK 1
  CPIRdK %1:ld8, 0, implicit-def $sreg
  %4:ld8 = Select8 killed %3:ld8, killed %2:ld8, 1, implicit $sreg
  %5:ld8 = ANDIRdK %4:ld8(tied-def 0), 1, implicit-def dead $sreg
  STSKRr @_Tv4main7enabledSb, killed %5:ld8 :: (store 1 into `i1* getelementptr inbounds (%Sb, %Sb* @_Tv4main7enabledSb, i64 0, i32 0)
`)
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %6:dldregs = LDIWRdK 35
  $r25r24 = COPY %6:dldregs
  $r22 = COPY %4:ld8
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $
r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r2
2, implicit-def $sp
  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp

bb.3 (%ir-block.9):
; predecessors: %bb.4, %bb.2, %bb.1

  RET

Looks legit, from the store to _Tv4main7enabledSb to the EEPROM write call then fallthrough to RET, correctly described by the predecessor and successor lists.

Then after the Expand ISel pseudos pass

bb.9 (%ir-block.8):
; predecessors: %bb.2, %bb.10
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  %4:ld8 = PHI %3:ld8, %bb.2, %2:ld8, %bb.10
  %5:ld8 = ANDIRdK %4:ld8(tied-def 0), 1, implicit-def dead $sreg
  STSKRr @_Tv4main7enabledSb, killed %5:ld8 :: (store 1 into `i1* getelementptr inbounds (%Sb, %Sb* @_Tv4main7enabledSb, i64 0, i32 0)
`)
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %6:dldregs = LDIWRdK 35
  $r25r24 = COPY %6:dldregs
  $r22 = COPY %4:ld8
  CALLk @_TF3AVR11writeEEPROMFT7addressVs6UInt165valueVs5UInt8_T_, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $
r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r25r24, implicit $r2
2, implicit-def $sp
  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp

bb.10 (%ir-block.8):
; predecessors: %bb.2
  successors: %bb.9(0x80000000); %bb.9(100.00%)

  RJMPk %bb.9

bb.3 (%ir-block.9):
; predecessors: %bb.4, %bb.7, %bb.9

  RET

At this point, %bb.9 still has the terminating block %bb.3 in the successor list, except it doesn't fall through to the terminating block anymore, it unconditionally falls through to %bb.10 and then goes to infinite loop back to itself.

This bug is likely caused by the Expand ISel pseudo instructions pass, which calls into AVR-specific expansion logic, corrupting the predecessor and successor lists, making them no longer reflect the realized control flow.

@dylanmckay
Copy link
Member

Bug is probably in AVRTargetLowering::EmitInstrWithCustomInserter.

The branch folding pass is just operating on invalid assumptions.

@dylanmckay
Copy link
Member

The only three instructions expanded by this function are:

  • %8:gpr8 = Select8 killed %7:ld8, %1:ld8, 5, implicit $sreg
  • %10:gpr8 = Select8 killed %9:ld8, killed %8:gpr8, 4, implicit $sreg
  • %4:ld8 = Select8 killed %3:ld8, killed %2:ld8, 1, implicit $sreg

@dylanmckay
Copy link
Member

I believe the expansion issue is occurring in the third one, %4:ld8 = Select8 killed %3:ld8, killed %2:ld8, 1, implicit $sreg

@dylanmckay
Copy link
Member

Fix incoming.

dylanmckay added a commit to dylanmckay/llvm that referenced this issue Jan 20, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.
@dylanmckay
Copy link
Member

Here's a fix, try it out @carlos4242, let me know if it fixes the bug and I'll upstream it.

From 30136884e213e8641eb1da332de462399fa6e920 Mon Sep 17 00:00:00 2001
From: Dylan McKay <me@dylanmckay.io>
Date: Sun, 20 Jan 2019 23:20:48 +1300
Subject: [PATCH] [AVR] Insert unconditional branch when inserting MBBs between
 blocks with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug https://github.com/avr-rust/rust/issues/123.
---
 lib/Target/AVR/AVRISelLowering.cpp     |  9 ++++
 test/CodeGen/AVR/avr-rust-issue-123.ll | 65 ++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 test/CodeGen/AVR/avr-rust-issue-123.ll

diff --git a/lib/Target/AVR/AVRISelLowering.cpp b/lib/Target/AVR/AVRISelLowering.cpp
index 3116eb2a5032..e95a41472bff 100644
--- a/lib/Target/AVR/AVRISelLowering.cpp
+++ b/lib/Target/AVR/AVRISelLowering.cpp
@@ -1634,6 +1634,15 @@ AVRTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
 
   MachineFunction *MF = MBB->getParent();
   const BasicBlock *LLVM_BB = MBB->getBasicBlock();
+  MachineBasicBlock *FallThrough = MBB->getFallThrough();
+
+  // If the current basic block falls through to another basic block,
+  // we must insert an unconditional branch to the fallthrough destination
+  // if we are to insert basic blocks at the prior fallthrough point.
+  if (FallThrough != nullptr) {
+    BuildMI(MBB, dl, TII.get(AVR::RJMPk)).addMBB(FallThrough);
+  }
+
   MachineBasicBlock *trueMBB = MF->CreateMachineBasicBlock(LLVM_BB);
   MachineBasicBlock *falseMBB = MF->CreateMachineBasicBlock(LLVM_BB);
 
diff --git a/test/CodeGen/AVR/avr-rust-issue-123.ll b/test/CodeGen/AVR/avr-rust-issue-123.ll
new file mode 100644
index 000000000000..19e3f9a13fb3
--- /dev/null
+++ b/test/CodeGen/AVR/avr-rust-issue-123.ll
@@ -0,0 +1,65 @@
+; RUN: llc -O1 < %s -march=avr | FileCheck %s
+
+; This test ensures that the Select8/Select16 expansion
+; pass inserts an unconditional branch to the previous adjacent
+; basic block when inserting new basic blocks when the
+; prior block has a fallthrough.
+;
+; Before this bug was fixed, Select8/Select16 expansion
+; would leave a dangling fallthrough to an undefined block.
+;
+; The BranchFolding pass would later rearrange the basic
+; blocks based on predecessor/successor list assumptions
+; which were made incorrect due to the invalid Select
+; expansion.
+
+; More information in
+; https://github.com/avr-rust/rust/issues/123.
+
+%UInt8 = type <{ i8 }>
+%UInt32 = type <{ i32 }>
+%Sb = type <{ i1 }>
+
+@delayFactor = hidden global %UInt8 zeroinitializer, align 1
+@delay = hidden global %UInt32 zeroinitializer, align 4
+@flag = hidden global %Sb zeroinitializer, align 1
+
+declare void @eeprom_write(i16, i8)
+
+define hidden void @update_register(i8 %arg, i8 %arg1) {
+entry:
+  switch i8 %arg, label %bb7 [
+    i8 6, label %bb
+    i8 7, label %bb6
+  ]
+
+bb:                                               ; preds = %entry
+  %tmp = icmp ugt i8 %arg1, 90
+  %tmp2 = icmp ult i8 %arg1, 5
+  %. = select i1 %tmp2, i8 5, i8 %arg1
+  %tmp3 = select i1 %tmp, i8 90, i8 %.
+  store i8 %tmp3, i8* getelementptr inbounds (%UInt8, %UInt8* @delayFactor, i64 0, i32 0), align 1
+  %tmp4 = zext i8 %tmp3 to i32
+  %tmp5 = mul nuw nsw i32 %tmp4, 100
+  store i32 %tmp5, i32* getelementptr inbounds (%UInt32, %UInt32* @delay, i64 0, i32 0), align 4
+  tail call void @eeprom_write(i16 34, i8 %tmp3)
+  br label %bb7
+
+bb6:                                              ; preds = %entry
+  %not. = icmp ne i8 %arg1, 0
+  %.2 = zext i1 %not. to i8
+  store i1 %not., i1* getelementptr inbounds (%Sb, %Sb* @flag, i64 0, i32 0), align 1
+
+  ; CHECK: LBB0_{{[0-9]+}}:
+  ; CHECK: call eeprom_write
+  ; CHECK-NEXT: LBB0_{{[0-9]+}}
+  ; CHECK-NEXT: pop r{{[0-9]+}}
+  ; CHECK-NEXT: ret
+
+  tail call void @eeprom_write(i16 35, i8 %.2)
+  br label %bb7
+
+bb7:                                              ; preds = %bb6, %bb, %entry
+  ret void
+}
+

@dylanmckay dylanmckay added bug A-llvm Affects the LLVM AVR backend has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem labels Jan 20, 2019
@dylanmckay dylanmckay changed the title optimised code order - regression? Select8/Select16 expansion does not preserve existing basic block fallthrough Jan 20, 2019
@carlos4242
Copy link
Author

This looks good to me. I've had a think through how the code works, and it looks correct. Plus I tested it on my original sample and it produces exactly the right code. Cool. :)

Tested and approved as far as I'm concerned.

@carlos4242
Copy link
Author

oh, one caveat... the unit test isn't strong enough, I just tested it on the old compiler, pre-patch and it passed... I'll beef it up and send you a patch on just the unit test

bogner pushed a commit to bogner/llvm-zipper-prototype that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351718 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-git-migration pushed a commit to llvm/llvm-project that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

llvm-svn: 351718
@dylanmckay
Copy link
Member

dylanmckay commented Jan 21, 2019

Good catch, thanks!

I committed and then immediately reverted it. I will recommit once the test is improved.

earl pushed a commit to earl/llvm-mirror that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351718 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-git-migration pushed a commit to llvm/llvm-project that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

llvm-svn: 351721
bogner pushed a commit to bogner/llvm-zipper-prototype that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351721 91177308-0d34-0410-b5e6-96231b3b80d8
@dylanmckay
Copy link
Member

I've taken the updated testcase from here and committed the patch upstream in r351721 (llvm/llvm-project@5c23410).

Cherry-picked the upstream patch in cea79f7.

earl pushed a commit to earl/llvm-mirror that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351721 91177308-0d34-0410-b5e6-96231b3b80d8
dylanmckay pushed a commit to dylanmckay/llvm that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351718 91177308-0d34-0410-b5e6-96231b3b80d8
dylanmckay pushed a commit to dylanmckay/llvm that referenced this issue Jan 21, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351721 91177308-0d34-0410-b5e6-96231b3b80d8
@carlos4242
Copy link
Author

Awesome. :)

dylanmckay pushed a commit to avr-rust/llvm that referenced this issue Jan 26, 2019
…with fallthrough

This updates the AVR Select8/Select16 expansion code so that, when
inserting the two basic blocks for true and false conditions, any
existing fallthrough on the previous block is preserved.

Prior to this patch, if the block before the Select pseudo fell through
to the subsequent block, two new basic blocks would be inserted at the
prior fallthrough point, changing the fallthrough destination.

The predecessor or successor lists were not updated, causing the
BranchFolding pass at -O1 and above the rearrange basic blocks, causing
an infinite loop. Not to mention the unconditional fallthrough to the
true block is incorrect in of itself.

This patch modifies the Select8/16 expansion so that, if inserting true
and false basic blocks at a fallthrough point, the implicit branch is
preserved by means of an explicit, unconditional branch to the previous
fallthrough destination.

Thanks to Carl Peto for reporting this bug.

This fixes avr-rust bug avr-rust/rust-legacy-fork#123.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351721 91177308-0d34-0410-b5e6-96231b3b80d8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-llvm Affects the LLVM AVR backend bug has-llvm-commit This issue should be fixed in upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem
Projects
None yet
Development

No branches or pull requests

2 participants