Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remat applied to updated value #3006

Closed
llvmbot opened this issue Aug 5, 2008 · 5 comments
Closed

Remat applied to updated value #3006

llvmbot opened this issue Aug 5, 2008 · 5 comments
Labels
bugzilla Issues migrated from bugzilla llvm:regalloc

Comments

@llvmbot
Copy link
Member

llvmbot commented Aug 5, 2008

Bugzilla Link 2634
Resolution FIXED
Resolved on Oct 27, 2008 12:39
Version 2.3
OS Linux
Reporter LLVM Bugzilla Contributor
CC @asl,@sunfishcode

Extended Description

In SPEC CPU 2006 leslie3d, a value is spilled, split and one of the splits is spilled again. The value happens to be rematable and when the split interval is spilled two-address rewriting causes the enregistered value to be changed. But live interval analysis has outdated information and thinks it can insert a remat of the old value just after the new value is written to the register.

@llvmbot
Copy link
Member Author

llvmbot commented Aug 5, 2008

Sent to llvmdev:

Ok, I've been tracking down a remat bug for over a week. I think I finally
understand what is going on. This happens in specfp2006 leslie3d
compiled with our frontend and optimizer. Unfortunately, getting a testcase
is going to be impossible due to restrictions on SPEC redistribution and
the complexity of the sequence of events leading to the bug.

The bug is extremely subtle and nasty. Therefore this explanation is very
long and complicated. I'm looking for a check on my analysis and suggestions
of how best to fix the problem.

This is with 2.3 plus a few extra patches (from upstream) that fix bugs
in the regalloc/spill area. I believe the bug exists in TOT, however.
There's nothing I see in the logs that indicates anything has been changed
to fix this problem.

My comments delimited with #####. The rest is llvm debug output (lots
of stuff added by me so you won't get the same output with upstream
llvm -debug).

There's a sequence of instructions that looks like this before linear scan:

2800 %reg1591 = SUB64rr %reg1591, %reg1589, %EFLAGS<imp-def,dead> ;
srcLine 0
2808 %reg1591 = IMUL64rr %reg1591, %reg1055, %EFLAGS<imp-def,dead> ;
srcLine 0
2816 %reg1591 = ADD64rr %reg1591, %reg1579, %EFLAGS<imp-def,dead> ;
srcLine 0
2820 %reg1198 = LEA64r %reg1591, 1, %reg1574, 0 ; srcLine 0

%reg1591 gets spilled by linear scan with all of the uses in the sequence
above reused by the "spltting" code in LiveIntervals. Note especially that
%reg1591 is (correctly) identified at rematable (for the SUB64rr):

                    spilling(a): %reg1591,3.3557 = [2178,2802:4)[2802,2810:3)[2810,2818:2)

[2818,2966:1)[2966,7840:0) 0@2966 1@2818-(2966) 2@2810-(2818) 3@2802-(2810)
4@2178-(2802)
Set %reg2559 rematerialized with def: %reg1591 = MOV64ri64i32 32 ;
srcLine 0
Assigned remat id 262151 to %reg2559

+[2180,2182:0) Added new interval: %reg2559,0 = [2180,2182:0) 0@2180
+[2182,2526:0) Added interval split reuse: %reg2559,1000 = [2180,2526:0)
0@2180
+[2526,2802:0) +[2802,2803:1) Added interval split reuse: %reg2559,2000 =
[2180,2802:0)[2802,2803:1) 0@2180 1@2802
+[2803,2810:1) +[2810,2811:2) Added interval split reuse: %reg2559,4000 =
[2180,2802:0)[2802,2810:1)[2810,2811:2) 0@2180 1@2802 2@2810
+[2811,2818:2) +[2818,2819:3) Added interval split reuse: %reg2559,6000 =
[2180,2802:0)[2802,2810:1)[2810,2818:2)[2818,2819:3) 0@2180 1@2802 2@2810
3@2818
+[2819,2822:3) Added interval split reuse: %reg2559,800 = [2180,2802:0)
[2802,2810:1)[2810,2818:2)[2818,2822:3) 0@2180 1@2802 2@2810 3@2818
+[2822,2966:3) +[2966,2967:4) Added interval split reuse: %reg2559,1800 =
[2180,2802:0)[2802,2810:1)[2810,2818:2)[2818,2966:3)[2966,2967:4) 0@2180
1@2802 2@2810 3@2818 4@2966
+[2967,3226:4) Added interval split reuse: %reg2559,3800 = [2180,2802:0)
[2802,2810:1)[2810,2818:2)[2818,2966:3)[2966,3226:4) 0@2180 1@2802 2@2810
3@2818 4@2966

It turns out that %reg1559 is ALSO spilled (remember it was reused so is not a
short interval). This time, there is no splitting reuse:

                    spilling(a): %reg2559,0.458891 = [2180,2802:0)[2802,2810:1)[2810,2818:2)

[2818,2966:3)[2966,3226:4) 0@2180 1@2802 2@2810 3@2818 4@2966
adding intervals for spills for interval: %reg2559,0.458891 =
[2180,2802:0)[2802,2810:1)[2810,2818:2)[2818,2966:3)[2966,3226:4) 0@2180
1@2802 2@2810 3@2818 4@2966
Set %reg2561 rematerialized with def: %reg1591 = MOV64ri64i32 32 ;
srcLine 0
Assigned remat id 262152 to %reg2561
+[2180,2182:0) Added new interval: %reg2561,0 = [2180,2182:0) 0@2180
Set %reg2562 rematerialized with def: %reg1591 = MOV64ri64i32 32 ;
srcLine 0
Assigned remat id 262152 to %reg2562
+[2524,2526:0) Added new interval: %reg2562,0 = [2524,2526:0) 0@2524
Set %reg2563 rematerialized with def: %reg1591 = MOV64ri64i32 32 ;
srcLine 0
Assigned remat id 262152 to %reg2563
+[2800,2802:0) +[2802,2803:1) Added new interval: %reg2563,0 =
[2800,2802:0)[2802,2803:1) 0@2800 1@2802
+[2808,2810:0) +[2810,2811:1) Added new interval: %reg2564,0 =
[2808,2810:0)[2810,2811:1) 0@2808 1@2810
Mapped %reg2559 and folded instruction: %reg2559 = ADD64rr %reg2559,
%reg1579, %EFLAGS<imp-def,dead> ; srcLine 0

into: ADD64mr <fi#165>, 1, %reg0, 0, %reg1579, %mreg23<imp-def,dead> ;
srcLine 0

Virt folded mapped NewMI 0x9405f70: ADD64mr <fi#165>, 1, %reg0, 0, %reg1579,
%mreg23<imp-def,dead> ; srcLine 0
to %reg2559

+[2820,2822:0) Added new interval: %reg2565,0 = [2820,2822:0) 0@2820
Mapped %reg2559 and folded instruction: %reg2559 = ADD64rr %reg2559,
%reg1599, %EFLAGS<imp-def,dead> ; srcLine 0

into: ADD64mr <fi#165>, 1, %reg0, 0, %reg1599, %mreg23<imp-def,dead> ;
srcLine 0

Virt folded mapped NewMI 0x98af530: ADD64mr <fi#165>, 1, %reg0, 0,
%reg1599, %mreg23<imp-def,dead> ; srcLine 0
to %reg2559
Mapped %reg2559 and folded instruction: %reg1191 = MOV64rr %reg2559 ;
srcLine 0

into: %reg1191 = MOV64rm <fi#165>, 1, %reg0, 0 ; srcLine 0

Virt folded mapped NewMI 0x92459e0: %reg1191 = MOV64rm <fi#165>, 1,
%reg0, 0 ; srcLine 0
to %reg2559

The remats identified above are all correct. Note that NO remats are defined
for %reg2564 or %reg2565. Note the fold as well. This becomes important
later.

%reg1579 (the other operand of the ADD64mr) is also spilled with a bunch of
split reuses:

                    spilling(a): %reg1579,5.85252 = [2714,2718:5)[2718,2726:4)[2726,2766:3)

[2766,2926:2)[2926,2934:1)[2934,7840:0) 0@2934 1@2926-(2934) 2@2766-(2926)
3@2726-(2766) 4@2718-(2726) 5@2714-(2718)
adding intervals for spills for interval: %reg1579,5.85252 = [2714,2718:5)
[2718,2726:4)[2726,2766:3)[2766,2926:2)[2926,2934:1)[2934,7840:0) 0@2934
1@2926-(2934) 2@2766-(2926) 3@2726-(2766) 4@2718-(2726) 5@2714-(2718)
assigned stack slot 184 [offset -1]
+[2714,2715:0) Added new interval: %reg2613,0 = [2714,2715:0) 0@2714
+[2715,2718:0) +[2718,2719:1) Added interval split reuse: %reg2613,1000 =
[2714,2718:0)[2718,2719:1) 0@2714 1@2718
+[2719,2726:1) +[2726,2727:2) Added interval split reuse: %reg2613,300 =
[2714,2718:0)[2718,2726:1)[2726,2727:2) 0@2714 1@2718 2@2726
+[2727,2730:2) Added interval split reuse: %reg2613,2300 = [2714,2718:0)
[2718,2726:1)[2726,2730:2) 0@2714 1@2718 2@2726
+[2730,2766:2) +[2766,2767:3) Added interval split reuse: %reg2613,3300 =
[2714,2718:0)[2718,2726:1)[2726,2766:2)[2766,2767:3) 0@2714 1@2718 2@2726
3@2766
+[2767,2818:3) Added interval split reuse: %reg2613,5300 = [2714,2718:0)
[2718,2726:1)[2726,2766:2)[2766,2818:3) 0@2714 1@2718 2@2726 3@2766
+[2818,2842:3) Added interval split reuse: %reg2613,6300 = [2714,2718:0)
[2718,2726:1)[2726,2766:2)[2766,2842:3) 0@2714 1@2718 2@2726 3@2766
+[2842,2878:3) Added interval split reuse: %reg2613,7300 = [2714,2718:0)
[2718,2726:1)[2726,2766:2)[2766,2878:3) 0@2714 1@2718 2@2726 3@2766
+[2878,2926:3) +[2926,2927:4) Added interval split reuse: %reg2613,8300 =
[2714,2718:0)[2718,2726:1)[2726,2766:2)[2766,2926:3)[2926,2927:4) 0@2714
1@2718 2@2726 3@2766 4@2926
+[2927,2934:4) +[2934,2935:5) Added interval split reuse: %reg2613,10300 =
[2714,2718:0)[2718,2726:1)[2726,2766:2)[2766,2926:3)[2926,2934:4)
[2934,2935:5) 0@2714 1@2718 2@2726 3@2766 4@2926 5@2934
+[6232,6234:0) Added new interval: %reg2614,0 = [6232,6234:0) 0@6232
+[7260,7262:0) Added new interval: %reg2615,0 = [7260,7262:0) 0@7260

%reg2613 (split reuses of the spilled %reg1579) is ALSO spilled:

                    spilling(a): %reg2613,5.56561 = [2714,2718:0)[2718,2726:1)[2726,2766:2)

[2766,2926:3)[2926,2934:4)[2934,2935:5) 0@2714 1@2718 2@2726 3@2766 4@2926
5@2934
adding intervals for spills for interval: %reg2613,5.56561 = [2714,2718:0)
[2718,2726:1)[2726,2766:2)[2766,2926:3)[2926,2934:4)[2934,2935:5) 0@2714
1@2718 2@2726 3@2766 4@2926 5@2934
Mapped %reg2613 and folded instruction: %reg2613 = MOV64rr %reg1607 ;
srcLine 0

into: MOV64mr <fi#184>, 1, %reg0, 0, %reg1607 ; srcLine 0

Virt folded mapped NewMI 0x9883480: MOV64mr <fi#184>, 1, %reg0, 0, %reg1607 ;
srcLine 0
to %reg2613
Mapped %reg2613 and folded instruction: %reg2613 = SUB64rr %reg2613,
%reg1577, %EFLAGS<imp-def,dead> ; srcLine 0

into: SUB64mr <fi#184>, 1, %reg0, 0, %reg1577, %mreg23<imp-def,dead> ;
srcLine 0

Virt folded mapped NewMI 0x93f7c90: SUB64mr <fi#184>, 1, %reg0, 0,
%reg1577, %mreg23<imp-def,dead> ; srcLine 0
to %reg2613
+[2724,2726:0) +[2726,2727:1) Added new interval: %reg2616,0 =
[2724,2726:0)[2726,2727:1) 0@2724 1@2726
+[2728,2730:0) Added new interval: %reg2617,0 = [2728,2730:0) 0@2728
Mapped %reg2613 and folded instruction: %reg2613 = ADD64rr %reg2613,
%reg1584, %EFLAGS<imp-def,dead> ; srcLine 0

into: ADD64mr <fi#184>, 1, %reg0, 0, %reg1584, %mreg23<imp-def,dead> ;
srcLine 0

Virt folded mapped NewMI 0x96dded0: ADD64mr <fi#184>, 1, %reg0, 0, %reg1584,
%mreg23<imp-def,dead> ; srcLine 0
to %reg2613
+[2816,2818:0) Added new interval: %reg2618,0 = [2816,2818:0) 0@2816
+[2840,2842:0) Added new interval: %reg2619,0 = [2840,2842:0) 0@2840
+[2876,2878:0) Added new interval: %reg2620,0 = [2876,2878:0) 0@2876
+[2924,2926:0) +[2926,2927:1) Added new interval: %reg2621,0 =
[2924,2926:0)[2926,2927:1) 0@2924 1@2926
Mapped %reg2613 and folded instruction: %reg2613 = ADD64rr %reg2613,
%reg1599, %EFLAGS<imp-def,dead> ; srcLine 0

into: ADD64mr <fi#184>, 1, %reg0, 0, %reg1599, %mreg23<imp-def,dead> ;
srcLine 0

Virt folded mapped NewMI 0x92450c0: ADD64mr <fi#184>, 1, %reg0, 0, %reg1599,
%mreg23<imp-def,dead> ; srcLine 0
to %reg2613

%reg1618 is what we care about here.

The final register map looks like this:

[%reg2561 -> R12]
[%reg2562 -> R14]
[%reg2563 -> R12]
[%reg2564 -> R12]
[%reg2565 -> R12]
[%reg2618 -> R12]

[%reg2561 -> fi#165]
[%reg2562 -> fi#165]
[%reg2563 -> fi#165]
[%reg2564 -> fi#165]
[%reg2565 -> fi#165]
[%reg2618 -> fi#184]

Now LocalSpiller comes along to do its work. The code before the spiller runs
looks like this (instructions in [] are added by me to show what a naive
spiller would do, just to show that stack slots line up correctly):

[%reg2563 = remat $32]
2800 %reg2563 = SUB64rr %reg2563, %reg1589, %EFLAGS<imp-def,dead> ;
srcLine 0
[fi#165 = %reg2563]
[%reg2564 = fi#165]
2808 %reg2564 = IMUL64rm %reg2564, <fi#113>, 1, %reg0, 0,
%EFLAGS<imp-def,dead> ; srcLine 0
[fi#165 = %reg2564]
[%reg2618 = fi#184]
2816 ADD64mr <fi#165>, 1, %reg0, 0, %reg2618, %EFLAGS<imp-def,dead> ; srcLine
0
[%reg2565 = fi#165]
2820 %reg2626 = LEA64r %reg2565, 1, %reg1574, 0 ; srcLine 0

Now for the fun part. The spiller processes each instruction in our sequence
in order. First the SUB64:

Processing instruction 0x98662d0: %reg2563 = SUB64rr %reg2563, %reg1589,
%EFLAGS<imp-def,dead> ; srcLine 0

Processing operand %reg2563
Processing operand %reg2563
Processing operand %reg1589
Processing operand %EFLAGS<imp-def,dead>
DoReMat = 1, SSorRMId = 262152
PhysReg R12 clobbered, invalidating SS#170
Remembering RM#8 in physreg R12
%R12 = MOV64ri64i32 32 ; srcLine 0
%reg2563 = SUB64rr %R12, %RBX, %EFLAGS<imp-def,dead> ; srcLine 0
Processing def operand %reg2563
StackSlot for def = 165
Store: MOV64mr <fi#165>, 1, %reg0, 0, %R12 ; srcLine 0
PhysReg R12 clobbered, invalidating RM#8
Remembering SS#165 in physreg R12

Good, it got the remat for %reg2563. It goes on to process the spill of
%reg2563 after the SUB64:

Processing instruction 0x99fdf10: MOV64mr <fi#165>, 1, %reg0, 0, %R12 ;
srcLine 0

Processing operand <fi#165>
Processing operand 1
Processing operand %reg0
Processing operand 0
Processing operand %R12
MOV64mr <fi#165>, 1, %reg0, 0, %R12 ; srcLine 0

Ok, pretty straightforward. Let's ho on to the IMUL64:

Processing instruction 0x947cf40: %reg2564 = IMUL64rm %reg2564, <fi#113>,
1, %reg0, 0, %EFLAGS<imp-def,dead> ; srcLine 0

Processing operand %reg2564
Processing operand %reg2564
Processing operand <fi#113>
Processing operand 1
Processing operand %reg0
Processing operand 0
Processing operand %EFLAGS<imp-def,dead>
DoReMat = 0, SSorRMId = 165
Got PhysReg = 51
Reusing SS#165 from physreg R12 for vreg2564 instead of reloading into physreg
R12
%reg2564 = IMUL64rm %R12, <fi#113>, 1, %reg0, 0,
%EFLAGS<imp-def,dead> ; srcLine 0
Folded vreg: 2395 MR: 1 - StackSlot: 113
Processing def operand %reg2564
StackSlot for def = 165
Store: MOV64mr <fi#165>, 1, %reg0, 0, %R12 ; srcLine 0
Removed dead store: MOV64mr <fi#165>, 1, %reg0, 0, %R12 ; srcLine 0
Remembering SS#165 in physreg R12

Again, this is good. There is NO remat for %reg2564. Now process the
spill of %reg2564 after the IMUL64:

Processing instruction 0x99fe0c0: MOV64mr <fi#165>, 1, %reg0, 0, %R12 ;
srcLine 0

Processing operand <fi#165>
Processing operand 1
Processing operand %reg0
Processing operand 0
Processing operand %R12
MOV64mr <fi#165>, 1, %reg0, 0, %R12 ; srcLine 0

Again, piece of cake. Now the ADD64:

Processing instruction 0x9405f70: ADD64mr <fi#165>, 1, %reg0, 0, %reg2618,
%EFLAGS<imp-def,dead> ; srcLine 0

Unfolded instruction: ADD64mr <fi#165>, 1, %reg0, 0, %reg2618,
%EFLAGS<imp-def,dead> ; srcLine 0

To: %reg2559 = ADD64rr %reg2559, %reg2618, %mreg23<imp-def,dead> ;
srcLine 0

Virt folded mapped MI 0x99fdf90: %reg2559 = ADD64rm %reg2559, <fi#184>,
1, %reg0, 0, %mreg23<imp-def,dead> ; srcLine 0
to %reg2618
Mapped %reg2618 to folded instructiion: %reg2559 = ADD64rm %reg2559,
<fi#184>, 1, %reg0, 0, %mreg23<imp-def,dead> ; srcLine 0

Final folded: %reg2559 = ADD64rm %reg2559, <fi#184>, 1, %reg0, 0,
%EFLAGS<imp-def,dead> ; srcLine 0

Processing operand %reg2559
Processing operand %reg2559
Processing operand <fi#184>
Processing operand 1
Processing operand %reg0
Processing operand 0
Processing operand %EFLAGS<imp-def,dead>
DoReMat = 1, SSorRMId = 262151
PhysReg R12 clobbered, invalidating SS#165
Remembering RM#7 in physreg R12
%R12 = MOV64ri64i32 32 ; srcLine 0
%reg2559 = ADD64rm %R12, <fi#184>, 1, %reg0, 0, %EFLAGS<imp-def,dead> ;
srcLine 0
Folded vreg: 2618 MR: 1 - StackSlot: 184

UH OH! When LocalSpiller unfolded the ADD64 it saw %reg2559. It then looked
up %reg2559 in the VirtRegMat remat table and found $32. So it inserted a
remat of $32 into R12 just before the ADD64.

This is wrong because this value of %reg2559 is different from the value
of %reg2559 before the SUB64. Remember, this came from the spill of %reg1591
where there were a bunch of split reuses which where then spilled into
%reg2563-%reg2565.

I believe one (not the only one!) problem is that VirtRegMap's remat map is
not updated when %reg2559 is spilled. %reg2563 gets a remat entry (as it
should) but there is nothing that removes %reg2559 from the remat map when
it is spilled.

Probably this is a very rare thing because it's rarer to spill registers that
were results of earlier spills (it only happens if they are split-reused) and
it's even rarer that the original register just happens to be rematter and
EVEN RARER that the instruction using the result of the second spill
happens to get folded by LiveIntervals and THEN unfolded by LocalSpiller.

I believe that there's also a bug in unfolding. Rather than returning
%reg2559 as the target reg when the ADD64mr is unfolded it should
return %reg2565. Since there's no remat entry for %reg2565 we wouldn't
insert the bogus remat.

Either one of these fixes would make this particular bug go away. My feeling
is that we need to fix BOTH.

Does this all sound plausible? Are the proposed fixes the correct way to go?
Any suggestions on how to implement them or other changes?

                                                 -Dave

@llvmbot
Copy link
Member Author

llvmbot commented Aug 5, 2008

I have a fix for this. I am packaging it up and will apply ASAP.

@lattner
Copy link
Collaborator

lattner commented Oct 16, 2008

David, ping?

@llvmbot
Copy link
Member Author

llvmbot commented Oct 23, 2008

I've ported the fix to trunk and am testing now.

@llvmbot
Copy link
Member Author

llvmbot commented Oct 27, 2008

Fixed with rev. 58255.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
troelsbjerre pushed a commit to troelsbjerre/llvm-project that referenced this issue Jan 10, 2024
Support stripping indirectly referenced DILocations from !llvm.loop m…
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:regalloc
Projects
None yet
Development

No branches or pull requests

2 participants