Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reg,pass: refactor allocation of aliased registers #121

Merged
merged 25 commits into from
Jan 23, 2020

Conversation

mmcloughlin
Copy link
Owner

Issue #100 demonstrated that register allocation for aliased registers is
fundamentally broken. The root of the issue is that currently accesses to the
same virtual register with different masks are treated as different registers.
This PR takes a different approach:

  • Liveness analysis is masked: we now properly consider which parts of a register are live
  • Register allocation produces a mapping from virtual to physical ID, and aliasing is applied later

@klauspost
Copy link
Contributor

This is super awesome. All my code "compiles" now and the power feels... overwhelming.

Still have no idea if my code works, but at least now I can generate it and start debugging 👍

klauspost added a commit to klauspost/compress that referenced this pull request Jan 19, 2020
@codecov-io
Copy link

codecov-io commented Jan 20, 2020

Codecov Report

Merging #121 into master will increase coverage by 0.01%.
The diff coverage is 91.03%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
+ Coverage    77.6%   77.62%   +0.01%     
==========================================
  Files          55       55              
  Lines       33326    33376      +50     
==========================================
+ Hits        25864    25909      +45     
  Misses       7354     7354              
- Partials      108      113       +5
Flag Coverage Δ
#integration 6.4% <81.6%> (+0.18%) ⬆️
#unittests 75.58% <50.94%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pass/pass.go 61.53% <ø> (ø) ⬆️
ir/ir.go 90.47% <ø> (ø) ⬆️
operand/types.go 95.77% <100%> (ø) ⬆️
build/global.go 80.43% <100%> (+0.88%) ⬆️
reg/collection.go 100% <100%> (+6.25%) ⬆️
reg/x86.go 100% <100%> (ø) ⬆️
reg/types.go 84.25% <81.25%> (-6.33%) ⬇️
pass/reg.go 84.37% <87.5%> (-3.13%) ⬇️
reg/set.go 87.27% <93.18%> (+5.22%) ⬆️
pass/alloc.go 89.79% <95.34%> (-0.21%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 126469f...8c19550. Read the comment docs.

@mmcloughlin mmcloughlin marked this pull request as ready for review January 20, 2020 00:12
@klauspost
Copy link
Contributor

It seems like something is still a bit weird. I have two pieces of code that after some changes that shouldn't cause allocation changes are now causing issues.

Case 1

This is probably the easiest to understand. In this case it simply skips over the piece of code that are inlined below. Adding the branch should not make any difference on allocation, but it seems to do so.

Case 2

This skips a whole bunch of code if taken, but none of the registers allocated in that piece of code (rep, left, right) is referenced in where it is jumping to. The registers at the destination should already be carried over because the rest of the code can end up there anyway.

It can easily be a problem in my code, which is why it would be great to have more info than "failed to allocate registers".

@klauspost
Copy link
Contributor

klauspost commented Jan 20, 2020

FWIW, this commit is where it 'broke'. No other changes than size-related AFAICT.

I am looking at changing parts of it so I can see where it breaks.

Edit: ok, made a "mimimal" reproducer:

These changes breaks the allocator. Before this change things are fine, but when the changes are applied it runs out of registers. There is nothing in the change that by my understanding should change register allocation since we are just using 'As32()' to get the 32 bit equivalent.

@mmcloughlin
Copy link
Owner Author

@klauspost Sorry there's another problem, and thanks for your effort debugging this!

There's something I've been aware of but didn't think was going to be an immediate problem, specifically the behavior of 32-bit operands in 64-bit mode. From the Intel Manual:

When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose register:

  • 64-bit operands generate a 64-bit result in the destination general-purpose register.
  • 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.
  • 8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits.

This means that 32-bit instructions will actually cause a write to the full 64-bit register. However the current implementation in avo does not handle this correctly: it sees a write to a 32-bit register and doesn't account for the zero-extension. So if a 32-bit write is followed by a 64-bit read of the same register, liveness analysis would consider there to be a dependency on the high 32-bits of the register, when in fact there is no such dependency.

The fact that your problem manifests when you change sizes from 64-bit to 32-bit makes me think the above issue might be to blame. However this is just a guess at this point, I'll have to dig into the details (I'll have time later today).

mmcloughlin added a commit that referenced this pull request Jan 20, 2020
@klauspost
Copy link
Contributor

Not completely sure I understand what it does. Kinda just assumed a GP64() lived as long as it was referenced (or an AsXX(), referring to the same register).

Anyway, if you don't crack it, a workaround would be great. I can at least run parts of the code now, and I got a reasonable setup with delve, so I can actually debug it somewhat, so making progress.

@mmcloughlin
Copy link
Owner Author

Okay, using my experimental debug printer (https://github.com/mmcloughlin/avo/compare/regalloc-debug) we see

function {
	name       = "encodeBlockAsm"
	attributes = 0x0000 0
	doc        = []string{"encodeBlockAsm encodes a non-empty src to a guaranteed-large-enough dst.", "It assumes that the varint-encoded length of the decompressed bytes has already been written.", ""}
	signature  = (dst []byte, src []byte) int
	local size = 65568
	nodes {
		instruction {
			addr        = 0x0xc00010a000
			opcode      = "MOVQ"
			terminal    = false
			branch      = false
			conditional = false
			operands {
				0: $0x00000200
				1: AX
			}
			inputs {
			}
			outputs {
				0: <virtual:0:1:8>
			}
			pred {
			}
			succ {
				0x0xc00010a0c0
			}
			livein {
				000b0101: 08
				00220101: 08
				001a0101: 08
				00250101: 08
				005e0101: 08
			}
			liveout {
				00000101: 0f
				000b0101: 08
				00220101: 08
				001a0101: 08
				00250101: 08
				005e0101: 08
			}
		}
		...

Note the livein and liveout sections show a map from register ID to live mask. The 0x08 masks here indicate that avo thinks the high 32 bits of those registers are live, lending some credence to the theory in my last comment.

@klauspost
Copy link
Contributor

ok, I managed to work around it by not converting registers as much.

@mmcloughlin
Copy link
Owner Author

ok, I managed to work around it by not converting registers as much.

Good to hear. I'm going to try to work on a proper fix for this.

Not completely sure I understand what it does. Kinda just assumed a GP64() lived as long as it was referenced (or an AsXX(), referring to the same register).

Just to clarify what the problem is, consider the following:

...
x := GP64()
MOVW(U16(42), x.As16())
ADDQ(x, ...)
...

The move instruction only writes the low 16 bits, and the add consume the full 64 bits. Therefore avo liveness analysis would consider the high 48 bits to be live prior to the move instruction (all the way to the beginning of the function if there's no other write to x). This is actually correct, since a 16-bit operation will only write to the low 16 bits of the register. But what about the same thing in 32 bit?

...
x := GP64()
MOVL(U32(42), x.As32())
ADDQ(x, ...)
...

This should not exhibit the same problem as the 16 bit version, because in x86-64 the 32-bit write caused by MOVL will be zero extended to a 64-bit write. Hence it's just a 64-bit write followed by a 64-bit read, perfectly fine. The problem is that avo at the moment does not account for the zero-extension, and it sees a 32-bit write followed by a 64-bit read, and thinks that the upper 32 bits of x is live prior to the move instruction.

Ultimately this means avo thinks there are more live registers than there actually are, and in your case this is bad enough it prevented register allocation. If I have avo account for the zero-extension, the problem should be resolved. I hope 😅

mmcloughlin added a commit that referenced this pull request Jan 20, 2020
Test to ensure that `avo` correctly accounts for zero-extension of
32-bit operands to 64 bits.

Updates #121
mmcloughlin added a commit that referenced this pull request Jan 21, 2020
mmcloughlin added a commit that referenced this pull request Jan 21, 2020
Test to ensure that `avo` correctly accounts for zero-extension of
32-bit operands to 64 bits.

Updates #121
mmcloughlin added a commit that referenced this pull request Jan 21, 2020
@mmcloughlin
Copy link
Owner Author

@klauspost I have added an additional pass that handles the 32-bit zero extension issue I described. With this change the code you referenced in #121 (comment) doesn't fail allocation.

Please let me know how you get on with this new version. I actually think your change in klauspost/compress@214d2e2 is probably preferable anyway, not just as a workaround for this problem.

Once again, thanks for working through this with me 😄

Requires all virtual registers to specify a mask rather than just a
size. Currently avo allows specifying an 8-bit register, which could
resolve to a low or high byte register. This complicates liveness
analysis and register allocation for what ends up being a very niche use
case.

Removes the `reg.Width` type since this is no longer used anywhere.

Updates #100
Provides register IDs. These will be used to help unify handling of
physical/virtual registers in liveness and allocation phases.

Updates #100
For liveness and allocation we actually want to know which parts of each
register are live at each time. The `reg.Set` structure was not correct.
This diff replaces it with `reg.MaskSet`, mapping a register ID to the
active mask.

Updates #100
Refactors the register allocator to be based on IDs, with masks applied
later.  This is just the result of hacking and slashing; it builds but
has not been tested at all.

Updates #100
Refactored allocator works on the ./examples/... generators.

Updates #100
Small example that triggers the aliasing problem. The refactored
allocator does not have the same problem.

Updates #100
Create a synthentic test case to verify correct handling of masking in
liveness analysis and the allocator.

Updates #100
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants