Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pass: unexpected register allocation failure #100

Closed
mmcloughlin opened this issue Dec 16, 2019 · 9 comments
Closed

pass: unexpected register allocation failure #100

mmcloughlin opened this issue Dec 16, 2019 · 9 comments
Labels
bug Something isn't working

Comments

@mmcloughlin
Copy link
Owner

@klauspost reported an unxepected register allocation failure in the following slack thread:

https://gophers.slack.com/archives/C6WDZJ70S/p1576430114006900?thread_ts=1576361104.001800&cid=C6WDZJ70S

The following code illustrates the problem (look for the FIXME comments)

https://github.com/klauspost/compress/pull/186/files#diff-60d6e6209d2fc7873851f07b946a5dc7R230

After an initial analysis using the experimental debug printer #6 it appears we are getting duplicate entries in the live register sets, for example <virtual:20:1:8> below.

...
		instruction {
			addr        = 0x0xc00010ed20
			opcode      = "CMPB"
			terminal    = false
			branch      = false
			conditional = false
			operands {
				0: CL
				1: DL
			}
			inputs {
				0: <virtual:22:1:1>
				1: <virtual:23:1:1>
			}
			outputs {
			}
			pred {
				0x0xc00010ec80
			}
			succ {
				0x0xc00010edc0
			}
			livein {
				size = 14
				<virtual:30:1:1>
				<virtual:20:1:8>
				<virtual:21:1:8>
				FP
				<virtual:20:1:8>
				<virtual:22:1:1>
				<virtual:30:1:4>
				SP
				<virtual:19:1:8>
				<virtual:29:1:2>
				<virtual:23:1:1>
				<virtual:29:1:4>
				<virtual:6:1:8>
				<virtual:29:1:1>
			}
			liveout {
				size = 12
				<virtual:19:1:8>
				<virtual:21:1:8>
				<virtual:20:1:8>
				<virtual:29:1:1>
				<virtual:29:1:4>
				<virtual:29:1:2>
				<virtual:6:1:8>
				<virtual:30:1:1>
				<virtual:30:1:4>
				<virtual:20:1:8>
				SP
				FP
			}
		}
...

Related #43 #65

@mmcloughlin mmcloughlin added the bug Something isn't working label Dec 16, 2019
mmcloughlin added a commit that referenced this issue Dec 18, 2019
Problem does appear to be related to register aliasing.
@mmcloughlin
Copy link
Owner Author

In the below, the t.As64() call causes this avo program to fail register allocation.

func main() {
	TEXT("Issue100", NOSPLIT, "func() uint64")
	x := GP64()
	XORQ(x, x)
	for i := 1; i <= 100; i++ {
		t := GP64()
		MOVQ(U32(i), t)
		ADDQ(t.As64(), x)
	}
	Store(x, ReturnIndex(0))
	RET()
	Generate()
}

klauspost added a commit to klauspost/avo that referenced this issue Jan 2, 2020
To help debugging keep track of where registers are allocated.

After a bit back and forth I ended up with this, which shows sorted physical id and where the virtual register is allocated in the calling code:

```
failed to allocate registers. 38 virtual:
00: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:81 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
00: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:81 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
01: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:83 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
02: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:102 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
02: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:102 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
03: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:102 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
03: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:102 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
05: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:102 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
05: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:102 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
06: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:115 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
06: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:115 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
06: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:115 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr8)
07: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:134 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
07: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:134 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
08: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:160 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
08: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:160 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
09: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:176 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
09: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:176 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
10: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:176 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
10: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:176 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
10: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:176 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr64)
11: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:186 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
11: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:186 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr16)
11: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:186 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
11: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:186 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr64)
11: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:186 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr8)
12: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:191 main.genEncodeBlockAsm() (type:reg.gpv, kind:gpr64)
12: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:191 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr64)
13: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:381 main.emitLiteral() (type:reg.gpv, kind:gpr64)
13: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:381 main.emitLiteral() (type:reg.virtual, kind:gpr16)
13: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:381 main.emitLiteral() (type:reg.virtual, kind:gpr32)
13: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:381 main.emitLiteral() (type:reg.virtual, kind:gpr8)
14: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:382 main.emitLiteral() (type:reg.gpv, kind:gpr64)
14: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:382 main.emitLiteral() (type:reg.virtual, kind:gpr32)
14: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:382 main.emitLiteral() (type:reg.virtual, kind:gpr8)
15: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:247 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr16)
15: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:247 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr32)
15: e:/gopath/src/github.com/klauspost/compress/s2/gen.go:247 main.genEncodeBlockAsm() (type:reg.virtual, kind:gpr8)
```

When mmcloughlin#100 is fixed this will allow easier tracking of what avo considers allocated.

If nothing else, it could also help debugging mmcloughlin#100.

Using the package path is the most reasonable approach I could find to determine what to store.
@klauspost
Copy link
Contributor

@mmcloughlin I don't want to pressure, but it would help me a lot if you have any sort of outlook on this issue.

This is a hard blocker for me since I can't see a way to reasonably work around it.

@mmcloughlin
Copy link
Owner Author

@mmcloughlin I don't want to pressure, but it would help me a lot if you have any sort of outlook on this issue.

This is a hard blocker for me since I can't see a way to reasonably work around it.

Apologies for this. I haven't had a lot of time to devote to this with some recent travel. I'll should have some time this week.

Previously it had been possible to work around this by lifting some "local" registers to containing scopes. Is that no longer possible? It's not elegant I know.

@klauspost
Copy link
Contributor

Previously it had been possible to work around this by lifting some "local" registers to containing scopes.

Not really. It would give false dependencies on these registers and they wouldn't be seen as free when they aren't used. The parser fairly enough just checks for references, not if the previous value is actually used.

This means that functions in-between cannot use the physical register. I tried to do this by never reusing registers unless I need to re-use the data.

For instance, if I need more registers I will spill values to the stack, I can store the register on stack, never use the virtual register again, and create a new virtual register when I reload the value.

If I move the virtual register out in scope and re-use it, the parser will never free the register and I am unable to gain anything from spilling it to the stack.

@klauspost
Copy link
Contributor

klauspost commented Jan 13, 2020

It does seem that this is the intent, but there is just something that prevents .AsXX to properly de-reference the register, as nicely shown by your reproducer.

For example. If I go for the "allocate up front", I will keep the regs alive between uses which is not my intention.

tmp := GP64()
{
   MOVQ(U8(2), tmp)
   // do stuff with tmp...
}
{
   //... some unrelated code....
}
{
   XORQ(tmp, tmp)
   // do stuff with tmp...
}

Now tmp is kept alive across the entire section. If I allocate a tmp in the first and last block, the compiler should be able to reuse the register inbetween. Does that make sense?

Instead I would like to:

{
   tmp := GP64()
   MOVQ(U8(2), tmp)
   // do stuff with tmp...
}
{
   //... some unrelated code....
}
{
   tmp := GP64()
   XORQ(tmp, tmp)
   // do stuff with tmp...
}

This makes it clear that tmp is unused between the two blocks.

@mmcloughlin
Copy link
Owner Author

Thanks for the additional information. I definitely understand why you are using local registers, I was just hoping that might be a short-term fix for you.

I think I have a decent idea of where the problem is, but it's going to be a bit more of a far-reaching refactor than I had hoped. That was the main reason for introducing additional integration testing based on third-party packages #103. This will allow me to make the fix with greater confidence.

mmcloughlin added a commit that referenced this issue Jan 18, 2020
Requires all virtual registers to specify a mask rather than just a
size. Currently avo allows specifying an 8-bit register, which could
resolve to a low or high byte register. This complicates liveness
analysis and register allocation for what ends up being a very niche use
case.

Removes the `reg.Width` type since this is no longer used anywhere.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 18, 2020
Provides register IDs. These will be used to help unify handling of
physical/virtual registers in liveness and allocation phases.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 18, 2020
For liveness and allocation we actually want to know which parts of each
register are live at each time. The `reg.Set` structure was not correct.
This diff replaces it with `reg.MaskSet`, mapping a register ID to the
active mask.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 19, 2020
Refactors the register allocator to be based on IDs, with masks applied
later.  This is just the result of hacking and slashing; it builds but
has not been tested at all.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 19, 2020
Refactored allocator works on the ./examples/... generators.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 19, 2020
Small example that triggers the aliasing problem. The refactored
allocator does not have the same problem.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 19, 2020
Create a synthentic test case to verify correct handling of masking in
liveness analysis and the allocator.

Updates #100
@mmcloughlin
Copy link
Owner Author

I made some decent progress on this today. After hacking and slashing for a while I have a refactored allocator that handles aliased registers fundamentally differently. I should have done it this way the first time, but I kind of "bolted on" register aliasing late in the game and hence ended up with this flaw.

At the moment the code works on ./examples/... and ./tests/..., as well as the reproducer I gave earlier in this issue (#100 (comment)) and another synthetic test I made up (e221927). I'm going to test it more thoroughly tomorrow. Unit tests are also broken and there's other cleanup required, but barring a major problem I could land this tomorrow.

cc @klauspost

mmcloughlin added a commit that referenced this issue Jan 20, 2020
Requires all virtual registers to specify a mask rather than just a
size. Currently avo allows specifying an 8-bit register, which could
resolve to a low or high byte register. This complicates liveness
analysis and register allocation for what ends up being a very niche use
case.

Removes the `reg.Width` type since this is no longer used anywhere.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 20, 2020
Provides register IDs. These will be used to help unify handling of
physical/virtual registers in liveness and allocation phases.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 20, 2020
For liveness and allocation we actually want to know which parts of each
register are live at each time. The `reg.Set` structure was not correct.
This diff replaces it with `reg.MaskSet`, mapping a register ID to the
active mask.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 20, 2020
Refactors the register allocator to be based on IDs, with masks applied
later.  This is just the result of hacking and slashing; it builds but
has not been tested at all.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 20, 2020
Refactored allocator works on the ./examples/... generators.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 20, 2020
Small example that triggers the aliasing problem. The refactored
allocator does not have the same problem.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 20, 2020
Create a synthentic test case to verify correct handling of masking in
liveness analysis and the allocator.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 20, 2020
mmcloughlin added a commit that referenced this issue Jan 23, 2020
Requires all virtual registers to specify a mask rather than just a
size. Currently avo allows specifying an 8-bit register, which could
resolve to a low or high byte register. This complicates liveness
analysis and register allocation for what ends up being a very niche use
case.

Removes the `reg.Width` type since this is no longer used anywhere.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 23, 2020
Provides register IDs. These will be used to help unify handling of
physical/virtual registers in liveness and allocation phases.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 23, 2020
For liveness and allocation we actually want to know which parts of each
register are live at each time. The `reg.Set` structure was not correct.
This diff replaces it with `reg.MaskSet`, mapping a register ID to the
active mask.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 23, 2020
Refactors the register allocator to be based on IDs, with masks applied
later.  This is just the result of hacking and slashing; it builds but
has not been tested at all.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 23, 2020
Refactored allocator works on the ./examples/... generators.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 23, 2020
Small example that triggers the aliasing problem. The refactored
allocator does not have the same problem.

Updates #100
mmcloughlin added a commit that referenced this issue Jan 23, 2020
Create a synthentic test case to verify correct handling of masking in
liveness analysis and the allocator.

Updates #100
@mmcloughlin
Copy link
Owner Author

@klauspost I optimistically closed this after landing #121, but obviously feel free to re-open this or file another issue if you are still seeing problems.

@klauspost
Copy link
Contributor

Awesome, thanks!

mmcloughlin added a commit that referenced this issue Jan 28, 2020
Adds a regression test based on klauspost/compress#186. This necessitated some related changes:

* Mark "RET" as a terminal instruction
* printer refactor to maintain compatibility with asmfmt
* Tweaks to other regression tests to ensure they are run correctly in CI

Updates #100 #65 #8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants