Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: riscv64: Use the zero register directly #7162

Closed

Conversation

alexcrichton
Copy link
Member

This commit is an attempt to use the zero register more often in the riscv64 backend. Currently materializing an iconst 0 value moves the zero register into a general purpose register, and then the general purpose register is used. This means that an extra register is used when it doesn't need to be, for example.

Naively fixing this by returning (zero_reg) during constant emission does not work. One reason this doesn't work is that it introduces vcode aliases to the physical register 0, and during OperandCollector collection the physical register makes its way into normal constraints rather than being a fixed nonallocatable constraint. This is fixable by moving the "resolve alias" phase earlier, for example at the beginning of reg_use rather than lower down in add_operand.

Even after applying such a fix, however, that solution still does not work. This produces situations such as when a 0 constant is returned from a function that the zero register is attempted to be placed into the first return register. These two physical register constraints naturally conflict and nothing is around to generate a move instruction. This happens within OperandCollector meaning it's not easy to generate more instructions at that time.

The "fix", which is incomplete, that this PR implements is to introduce a pseudo-instruction which has a fixed register definition of the physical register zero. This instruction emits no code and is intended to provide a location for regalloc2 to split bundles by automatically inserting a move in the above situation. To get this working however it required removing an assertion in the OperandCollector and ends up violating an invariant of regalloc2 where a virtual register is assigned to a non-allocatable register. This ends up generating instructions which move general purpose registers into the zero register on riscv64, which is unlikely to be what regalloc intended.

This commit is an attempt to use the `zero` register more often in the
riscv64 backend. Currently materializing an `iconst 0` value moves the
`zero` register into a general purpose register, and then the general
purpose register is used. This means that an extra register is used when
it doesn't need to be, for example.

Naively fixing this by returning `(zero_reg)` during constant emission
does not work. One reason this doesn't work is that it introduces vcode
aliases to the physical register 0, and during `OperandCollector`
collection the physical register makes its way into normal constraints
rather than being a fixed nonallocatable constraint. This is fixable by
moving the "resolve alias" phase earlier, for example at the beginning
of `reg_use` rather than lower down in `add_operand`.

Even after applying such a fix, however, that solution still does not
work. This produces situations such as when a 0 constant is returned
from a function that the zero register is attempted to be placed into
the first return register. These two physical register constraints
naturally conflict and nothing is around to generate a move instruction.
This happens within `OperandCollector` meaning it's not easy to generate
more instructions at that time.

The "fix", which is incomplete, that this PR implements is to introduce
a pseudo-instruction which has a fixed register definition of the
physical register zero. This instruction emits no code and is intended
to provide a location for regalloc2 to split bundles by automatically
inserting a move in the above situation. To get this working however it
required removing an assertion in the `OperandCollector` and ends up
violating an invariant of regalloc2 where a virtual register is assigned
to a non-allocatable register. This ends up generating instructions
which move general purpose registers into the zero register on riscv64,
which is unlikely to be what regalloc intended.
@alexcrichton
Copy link
Member Author

I'm opening this after some discussion with @elliottt today about this. We weren't really sure how best to proceed, if at all, with getting this to work with regalloc2. It seems that some sort of new constraint or new logic is required in regalloc2 no matter what, but we weren't quite sure how to modify it. I think @elliottt had a better idea about a possible shape, but he can probably describe it better than I.

@cfallin if you've got time (this isn't urgent by any means) we're curious to get your take on this shape of an approach with a pseudo-instruction that defines a value which we can place a special operand constraint on if necessary. Also cc @afonso360 as you're likely interested in this.

One way I thought of thinking about this problem is that for each platform there are sort of read-only physical registers in addition to the allocatable registers. The zero register on riscv64 is one example but the stack pointer and frame pointer are another on most platforms (since we typically don't want regalloc2 to allocate any virtual registers to those destinations). Something about being able to specify this seems like what we'd want with regalloc2 to (a) allow this in the first place (despite there not actually being a panic today) and (b) prevent merging bundles because bundles shouldn't in general be merged with the pseudo-instruction defining the zero register for example. (@elliottt had more thoughts on this too)

@cfallin
Copy link
Member

cfallin commented Oct 5, 2023

The "fix", which is incomplete, that this PR implements is to introduce a pseudo-instruction which has a fixed register definition of the physical register zero. This instruction emits no code and is intended to provide a location for regalloc2 to split bundles by automatically inserting a move in the above situation. To get this working however it required removing an assertion in the OperandCollector and ends up violating an invariant of regalloc2 where a virtual register is assigned to a non-allocatable register. This ends up generating instructions which move general purpose registers into the zero register on riscv64, which is unlikely to be what regalloc intended.

Eek, yeah, this approach is going to break RA2 completely: there is a strong divide between registers that are allocatable and those that aren't, and using a fixed constraint on a normal vreg will try to put the whole bundle in the fixed register. The zero register can't be used that way, so as you've seen, breakage results.

We actually had a bunch of discussion over in RA2 issues about this a year or so ago; it resulted in a new kind of operand, the "fixed nonallocatable" constraint: you want Operand::fixed_nonallocatable(). That should solve the issue without changing anything about the broader lowering algorithm (and please don't, it could have a bunch of other implications we'd have to think carefully about!).

@cfallin
Copy link
Member

cfallin commented Oct 5, 2023

See e.g. this thread where Amanieu ran into these same problems with the zero register, and reported some nice speedups in his application when using the new constraint he added.

@afonso360
Copy link
Contributor

afonso360 commented Oct 5, 2023

Just wanted to say that this is an amazing coincidence since I was just about to write an issue on this very topic! I don't have any ideas on how to go about this, it was mostly a request for help.

@alexcrichton
Copy link
Member Author

Yes sorry to be clear I'm not proposing changing any foundations or anything like that, I'm trying to understand the foundations myself and figure out how best to fit a solution into them.

I'm not sure how to use Operand::fixed_nonallocatable though? I thought that was sort of the first thing I tried which was to use the (zero_reg) directly in lowering (imm _ty 0). That ended up causing issues though where effectively at the end of the function Inst::Rets would create a constarint that a vreg, which was aliased to the zero register, was requierd to be in the first physical return register. This didn't seem resolvable by regalloc2 in talking with @elliottt which is when we came up with the idea of a pseudo-instruction, but with the pseudo-instruction I'm not sure how to use Operand::fixed_nonallocatable. Do you have an idea in mind though of what the register allocation operands would look like for this pseudo-instruction?

@elliottt
Copy link
Member

elliottt commented Oct 5, 2023

The problem was that we wanted to return a value of the zero register for the lowering of a constant, but the fixed nonallocatable constraint only works for uses not defs. The solution we've used in the past is to make special instructions for interacting with nonallocatable registers (MovToPReg/MovFromPReg in the x64 backend for example), but those end up turning into the moves that Alex is trying to avoid in the first place.

For argument 0 immediates the use of fixed nonallocatable constraint is great, but when it would be desirable to have that value propagate further is where we just don't have a good story with RA2.

@cfallin
Copy link
Member

cfallin commented Oct 5, 2023

Indeed, it works for uses and not defs -- which implies that we can't in isolation lower an (iconst 0) to a zero reg, but that makes sense because an use of the zero reg isn't actually storing the zero anywhere first. In other words, this decomposition -- "(i) lower an immediate 0, (ii) use it somewhere" is what implies the need for a normal reg.

Now of course it's a standard idiom on ISAs with a zero reg to use it as an arg; so what we need to do is to reflect that structure in the lowering as well, and handle this as part of the arg, when we lower the instruction that uses it.

In other words, we would want to match (e.g.) (iadd x (iconst 0)) and then directly emit an instruction that uses a fixed-nonallocatable Operand in its use-slot. That would imply a new Inst format, because it has different regalloc metadata. (EDIT: or the same format if you do the thing you mention in the regalloc metadata collector; but here the value is given directly and not via an alias, so it should work out.)

@cfallin
Copy link
Member

cfallin commented Oct 5, 2023

One further thought: I think the clearest argument to me for why the above (do it as part of matching, not as a separate step) is necessary is exactly the case you're running into: returning 0 as a return value needs an instruction to produce the zero value, while using it as an argument does not. In that sense it's more like an immediate (add r1, r2, #2); the zero register is a like a weird encoding for a constant #0. We fundamentally need to "sink" it into consumers to have this distinction and generate an inst in the first case but not the second.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. labels Oct 5, 2023
@alexcrichton
Copy link
Member Author

Yeah that works, and it's already what riscv64 and aarch64 are doing I believe. They don't even have special instruction variants, the (zero_reg) is just used in the normal Reg slot as other normal temporaries are used. My hope was that given the commonality of an architecture-specific zero register between these two we wouldn't have to go through all relevant rules and add lowering variants for them. Given how non-special it is in the ISA (e.g. for add it's not a new encoding, it's "just" that the register is set to 0) I would ideally hope that it wouldn't need to be that special in the lowering rules.

That all being said my hope is that there'd be an idea that would be relatively easy/non-invasive to integrate into regalloc2. If there isn't though then alas, but not much can be done about that.

@alexcrichton alexcrichton deleted the rv64-use-zero-reg branch October 10, 2023 10:33
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 28, 2024
This commit is inspired by discussion on bytecodealliance#8695 which made me remember
the discussion around bytecodealliance#7162 historically. In lieu of a deeper fix for
the issue of "why can't `iconst 0` use `(zero_reg)`" it's still possible
to add special-cases to rules throughout the backend so this commit does
that for generating zero-value floats.
github-merge-queue bot pushed a commit that referenced this pull request May 30, 2024
* riscv64: Special-case `f32const 0` and `f64const 0`

This commit is inspired by discussion on #8695 which made me remember
the discussion around #7162 historically. In lieu of a deeper fix for
the issue of "why can't `iconst 0` use `(zero_reg)`" it's still possible
to add special-cases to rules throughout the backend so this commit does
that for generating zero-value floats.

* Fix tests

* Run all tests on CI

prtest:full
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants