WIP: riscv64: Use the zero register directly #7162

alexcrichton · 2023-10-05T19:39:33Z

This commit is an attempt to use the zero register more often in the riscv64 backend. Currently materializing an iconst 0 value moves the zero register into a general purpose register, and then the general purpose register is used. This means that an extra register is used when it doesn't need to be, for example.

Naively fixing this by returning (zero_reg) during constant emission does not work. One reason this doesn't work is that it introduces vcode aliases to the physical register 0, and during OperandCollector collection the physical register makes its way into normal constraints rather than being a fixed nonallocatable constraint. This is fixable by moving the "resolve alias" phase earlier, for example at the beginning of reg_use rather than lower down in add_operand.

Even after applying such a fix, however, that solution still does not work. This produces situations such as when a 0 constant is returned from a function that the zero register is attempted to be placed into the first return register. These two physical register constraints naturally conflict and nothing is around to generate a move instruction. This happens within OperandCollector meaning it's not easy to generate more instructions at that time.

The "fix", which is incomplete, that this PR implements is to introduce a pseudo-instruction which has a fixed register definition of the physical register zero. This instruction emits no code and is intended to provide a location for regalloc2 to split bundles by automatically inserting a move in the above situation. To get this working however it required removing an assertion in the OperandCollector and ends up violating an invariant of regalloc2 where a virtual register is assigned to a non-allocatable register. This ends up generating instructions which move general purpose registers into the zero register on riscv64, which is unlikely to be what regalloc intended.

This commit is an attempt to use the `zero` register more often in the riscv64 backend. Currently materializing an `iconst 0` value moves the `zero` register into a general purpose register, and then the general purpose register is used. This means that an extra register is used when it doesn't need to be, for example. Naively fixing this by returning `(zero_reg)` during constant emission does not work. One reason this doesn't work is that it introduces vcode aliases to the physical register 0, and during `OperandCollector` collection the physical register makes its way into normal constraints rather than being a fixed nonallocatable constraint. This is fixable by moving the "resolve alias" phase earlier, for example at the beginning of `reg_use` rather than lower down in `add_operand`. Even after applying such a fix, however, that solution still does not work. This produces situations such as when a 0 constant is returned from a function that the zero register is attempted to be placed into the first return register. These two physical register constraints naturally conflict and nothing is around to generate a move instruction. This happens within `OperandCollector` meaning it's not easy to generate more instructions at that time. The "fix", which is incomplete, that this PR implements is to introduce a pseudo-instruction which has a fixed register definition of the physical register zero. This instruction emits no code and is intended to provide a location for regalloc2 to split bundles by automatically inserting a move in the above situation. To get this working however it required removing an assertion in the `OperandCollector` and ends up violating an invariant of regalloc2 where a virtual register is assigned to a non-allocatable register. This ends up generating instructions which move general purpose registers into the zero register on riscv64, which is unlikely to be what regalloc intended.

alexcrichton · 2023-10-05T19:43:19Z

I'm opening this after some discussion with @elliottt today about this. We weren't really sure how best to proceed, if at all, with getting this to work with regalloc2. It seems that some sort of new constraint or new logic is required in regalloc2 no matter what, but we weren't quite sure how to modify it. I think @elliottt had a better idea about a possible shape, but he can probably describe it better than I.

@cfallin if you've got time (this isn't urgent by any means) we're curious to get your take on this shape of an approach with a pseudo-instruction that defines a value which we can place a special operand constraint on if necessary. Also cc @afonso360 as you're likely interested in this.

One way I thought of thinking about this problem is that for each platform there are sort of read-only physical registers in addition to the allocatable registers. The zero register on riscv64 is one example but the stack pointer and frame pointer are another on most platforms (since we typically don't want regalloc2 to allocate any virtual registers to those destinations). Something about being able to specify this seems like what we'd want with regalloc2 to (a) allow this in the first place (despite there not actually being a panic today) and (b) prevent merging bundles because bundles shouldn't in general be merged with the pseudo-instruction defining the zero register for example. (@elliottt had more thoughts on this too)

cfallin · 2023-10-05T19:45:40Z

The "fix", which is incomplete, that this PR implements is to introduce a pseudo-instruction which has a fixed register definition of the physical register zero. This instruction emits no code and is intended to provide a location for regalloc2 to split bundles by automatically inserting a move in the above situation. To get this working however it required removing an assertion in the OperandCollector and ends up violating an invariant of regalloc2 where a virtual register is assigned to a non-allocatable register. This ends up generating instructions which move general purpose registers into the zero register on riscv64, which is unlikely to be what regalloc intended.

Eek, yeah, this approach is going to break RA2 completely: there is a strong divide between registers that are allocatable and those that aren't, and using a fixed constraint on a normal vreg will try to put the whole bundle in the fixed register. The zero register can't be used that way, so as you've seen, breakage results.

We actually had a bunch of discussion over in RA2 issues about this a year or so ago; it resulted in a new kind of operand, the "fixed nonallocatable" constraint: you want Operand::fixed_nonallocatable(). That should solve the issue without changing anything about the broader lowering algorithm (and please don't, it could have a bunch of other implications we'd have to think carefully about!).

cfallin · 2023-10-05T19:47:13Z

See e.g. this thread where Amanieu ran into these same problems with the zero register, and reported some nice speedups in his application when using the new constraint he added.

afonso360 · 2023-10-05T19:48:25Z

Just wanted to say that this is an amazing coincidence since I was just about to write an issue on this very topic! I don't have any ideas on how to go about this, it was mostly a request for help.

alexcrichton · 2023-10-05T20:03:06Z

Yes sorry to be clear I'm not proposing changing any foundations or anything like that, I'm trying to understand the foundations myself and figure out how best to fit a solution into them.

I'm not sure how to use Operand::fixed_nonallocatable though? I thought that was sort of the first thing I tried which was to use the (zero_reg) directly in lowering (imm _ty 0). That ended up causing issues though where effectively at the end of the function Inst::Rets would create a constarint that a vreg, which was aliased to the zero register, was requierd to be in the first physical return register. This didn't seem resolvable by regalloc2 in talking with @elliottt which is when we came up with the idea of a pseudo-instruction, but with the pseudo-instruction I'm not sure how to use Operand::fixed_nonallocatable. Do you have an idea in mind though of what the register allocation operands would look like for this pseudo-instruction?

elliottt · 2023-10-05T20:10:56Z

The problem was that we wanted to return a value of the zero register for the lowering of a constant, but the fixed nonallocatable constraint only works for uses not defs. The solution we've used in the past is to make special instructions for interacting with nonallocatable registers (MovToPReg/MovFromPReg in the x64 backend for example), but those end up turning into the moves that Alex is trying to avoid in the first place.

For argument 0 immediates the use of fixed nonallocatable constraint is great, but when it would be desirable to have that value propagate further is where we just don't have a good story with RA2.

cfallin · 2023-10-05T21:11:45Z

Indeed, it works for uses and not defs -- which implies that we can't in isolation lower an (iconst 0) to a zero reg, but that makes sense because an use of the zero reg isn't actually storing the zero anywhere first. In other words, this decomposition -- "(i) lower an immediate 0, (ii) use it somewhere" is what implies the need for a normal reg.

Now of course it's a standard idiom on ISAs with a zero reg to use it as an arg; so what we need to do is to reflect that structure in the lowering as well, and handle this as part of the arg, when we lower the instruction that uses it.

In other words, we would want to match (e.g.) (iadd x (iconst 0)) and then directly emit an instruction that uses a fixed-nonallocatable Operand in its use-slot. That would imply a new Inst format, because it has different regalloc metadata. (EDIT: or the same format if you do the thing you mention in the regalloc metadata collector; but here the value is given directly and not via an alias, so it should work out.)

cfallin · 2023-10-05T21:27:05Z

One further thought: I think the clearest argument to me for why the above (do it as part of matching, not as a separate step) is necessary is exactly the case you're running into: returning 0 as a return value needs an instruction to produce the zero value, while using it as an argument does not. In that sense it's more like an immediate (add r1, r2, #2); the zero register is a like a weird encoding for a constant #0. We fundamentally need to "sink" it into consumers to have this distinction and generate an inst in the first case but not the second.

alexcrichton · 2023-10-06T14:18:54Z

Yeah that works, and it's already what riscv64 and aarch64 are doing I believe. They don't even have special instruction variants, the (zero_reg) is just used in the normal Reg slot as other normal temporaries are used. My hope was that given the commonality of an architecture-specific zero register between these two we wouldn't have to go through all relevant rules and add lowering variants for them. Given how non-special it is in the ISA (e.g. for add it's not a new encoding, it's "just" that the register is set to 0) I would ideally hope that it wouldn't need to be that special in the lowering rules.

That all being said my hope is that there'd be an idea that would be relatively easy/non-invasive to integrate into regalloc2. If there isn't though then alas, but not much can be done about that.

This commit is inspired by discussion on bytecodealliance#8695 which made me remember the discussion around bytecodealliance#7162 historically. In lieu of a deeper fix for the issue of "why can't `iconst 0` use `(zero_reg)`" it's still possible to add special-cases to rules throughout the backend so this commit does that for generating zero-value floats.

* riscv64: Special-case `f32const 0` and `f64const 0` This commit is inspired by discussion on #8695 which made me remember the discussion around #7162 historically. In lieu of a deeper fix for the issue of "why can't `iconst 0` use `(zero_reg)`" it's still possible to add special-cases to rules throughout the backend so this commit does that for generating zero-value floats. * Fix tests * Run all tests on CI prtest:full

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. labels Oct 5, 2023

alexcrichton closed this Oct 10, 2023

alexcrichton deleted the rv64-use-zero-reg branch October 10, 2023 10:33

afonso360 mentioned this pull request May 28, 2024

riscv64: Initial support for the ZiCond Extension #8695

Merged

alexcrichton mentioned this pull request May 28, 2024

riscv64: Special-case f32const 0 and f64const 0 #8701

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: riscv64: Use the zero register directly #7162

WIP: riscv64: Use the zero register directly #7162

alexcrichton commented Oct 5, 2023

alexcrichton commented Oct 5, 2023

cfallin commented Oct 5, 2023

cfallin commented Oct 5, 2023

afonso360 commented Oct 5, 2023 •

edited

Loading

alexcrichton commented Oct 5, 2023

elliottt commented Oct 5, 2023 •

edited

Loading

cfallin commented Oct 5, 2023 •

edited

Loading

cfallin commented Oct 5, 2023

alexcrichton commented Oct 6, 2023

WIP: riscv64: Use the zero register directly #7162

WIP: riscv64: Use the zero register directly #7162

Conversation

alexcrichton commented Oct 5, 2023

alexcrichton commented Oct 5, 2023

cfallin commented Oct 5, 2023

cfallin commented Oct 5, 2023

afonso360 commented Oct 5, 2023 • edited Loading

alexcrichton commented Oct 5, 2023

elliottt commented Oct 5, 2023 • edited Loading

cfallin commented Oct 5, 2023 • edited Loading

cfallin commented Oct 5, 2023

alexcrichton commented Oct 6, 2023

afonso360 commented Oct 5, 2023 •

edited

Loading

elliottt commented Oct 5, 2023 •

edited

Loading

cfallin commented Oct 5, 2023 •

edited

Loading