-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: riscv64: Use the zero register directly #7162
Conversation
This commit is an attempt to use the `zero` register more often in the riscv64 backend. Currently materializing an `iconst 0` value moves the `zero` register into a general purpose register, and then the general purpose register is used. This means that an extra register is used when it doesn't need to be, for example. Naively fixing this by returning `(zero_reg)` during constant emission does not work. One reason this doesn't work is that it introduces vcode aliases to the physical register 0, and during `OperandCollector` collection the physical register makes its way into normal constraints rather than being a fixed nonallocatable constraint. This is fixable by moving the "resolve alias" phase earlier, for example at the beginning of `reg_use` rather than lower down in `add_operand`. Even after applying such a fix, however, that solution still does not work. This produces situations such as when a 0 constant is returned from a function that the zero register is attempted to be placed into the first return register. These two physical register constraints naturally conflict and nothing is around to generate a move instruction. This happens within `OperandCollector` meaning it's not easy to generate more instructions at that time. The "fix", which is incomplete, that this PR implements is to introduce a pseudo-instruction which has a fixed register definition of the physical register zero. This instruction emits no code and is intended to provide a location for regalloc2 to split bundles by automatically inserting a move in the above situation. To get this working however it required removing an assertion in the `OperandCollector` and ends up violating an invariant of regalloc2 where a virtual register is assigned to a non-allocatable register. This ends up generating instructions which move general purpose registers into the zero register on riscv64, which is unlikely to be what regalloc intended.
I'm opening this after some discussion with @elliottt today about this. We weren't really sure how best to proceed, if at all, with getting this to work with regalloc2. It seems that some sort of new constraint or new logic is required in regalloc2 no matter what, but we weren't quite sure how to modify it. I think @elliottt had a better idea about a possible shape, but he can probably describe it better than I. @cfallin if you've got time (this isn't urgent by any means) we're curious to get your take on this shape of an approach with a pseudo-instruction that defines a value which we can place a special operand constraint on if necessary. Also cc @afonso360 as you're likely interested in this. One way I thought of thinking about this problem is that for each platform there are sort of read-only physical registers in addition to the allocatable registers. The zero register on riscv64 is one example but the stack pointer and frame pointer are another on most platforms (since we typically don't want regalloc2 to allocate any virtual registers to those destinations). Something about being able to specify this seems like what we'd want with regalloc2 to (a) allow this in the first place (despite there not actually being a panic today) and (b) prevent merging bundles because bundles shouldn't in general be merged with the pseudo-instruction defining the zero register for example. (@elliottt had more thoughts on this too) |
Eek, yeah, this approach is going to break RA2 completely: there is a strong divide between registers that are allocatable and those that aren't, and using a fixed constraint on a normal vreg will try to put the whole bundle in the fixed register. The zero register can't be used that way, so as you've seen, breakage results. We actually had a bunch of discussion over in RA2 issues about this a year or so ago; it resulted in a new kind of operand, the "fixed nonallocatable" constraint: you want |
See e.g. this thread where Amanieu ran into these same problems with the zero register, and reported some nice speedups in his application when using the new constraint he added. |
Just wanted to say that this is an amazing coincidence since I was just about to write an issue on this very topic! I don't have any ideas on how to go about this, it was mostly a request for help. |
Yes sorry to be clear I'm not proposing changing any foundations or anything like that, I'm trying to understand the foundations myself and figure out how best to fit a solution into them. I'm not sure how to use |
The problem was that we wanted to return a value of the zero register for the lowering of a constant, but the fixed nonallocatable constraint only works for uses not defs. The solution we've used in the past is to make special instructions for interacting with nonallocatable registers (MovToPReg/MovFromPReg in the x64 backend for example), but those end up turning into the moves that Alex is trying to avoid in the first place. For argument |
Indeed, it works for uses and not defs -- which implies that we can't in isolation lower an Now of course it's a standard idiom on ISAs with a zero reg to use it as an arg; so what we need to do is to reflect that structure in the lowering as well, and handle this as part of the arg, when we lower the instruction that uses it. In other words, we would want to match (e.g.) |
One further thought: I think the clearest argument to me for why the above (do it as part of matching, not as a separate step) is necessary is exactly the case you're running into: returning |
Yeah that works, and it's already what riscv64 and aarch64 are doing I believe. They don't even have special instruction variants, the That all being said my hope is that there'd be an idea that would be relatively easy/non-invasive to integrate into regalloc2. If there isn't though then alas, but not much can be done about that. |
This commit is inspired by discussion on bytecodealliance#8695 which made me remember the discussion around bytecodealliance#7162 historically. In lieu of a deeper fix for the issue of "why can't `iconst 0` use `(zero_reg)`" it's still possible to add special-cases to rules throughout the backend so this commit does that for generating zero-value floats.
* riscv64: Special-case `f32const 0` and `f64const 0` This commit is inspired by discussion on #8695 which made me remember the discussion around #7162 historically. In lieu of a deeper fix for the issue of "why can't `iconst 0` use `(zero_reg)`" it's still possible to add special-cases to rules throughout the backend so this commit does that for generating zero-value floats. * Fix tests * Run all tests on CI prtest:full
This commit is an attempt to use the
zero
register more often in the riscv64 backend. Currently materializing aniconst 0
value moves thezero
register into a general purpose register, and then the general purpose register is used. This means that an extra register is used when it doesn't need to be, for example.Naively fixing this by returning
(zero_reg)
during constant emission does not work. One reason this doesn't work is that it introduces vcode aliases to the physical register 0, and duringOperandCollector
collection the physical register makes its way into normal constraints rather than being a fixed nonallocatable constraint. This is fixable by moving the "resolve alias" phase earlier, for example at the beginning ofreg_use
rather than lower down inadd_operand
.Even after applying such a fix, however, that solution still does not work. This produces situations such as when a 0 constant is returned from a function that the zero register is attempted to be placed into the first return register. These two physical register constraints naturally conflict and nothing is around to generate a move instruction. This happens within
OperandCollector
meaning it's not easy to generate more instructions at that time.The "fix", which is incomplete, that this PR implements is to introduce a pseudo-instruction which has a fixed register definition of the physical register zero. This instruction emits no code and is intended to provide a location for regalloc2 to split bundles by automatically inserting a move in the above situation. To get this working however it required removing an assertion in the
OperandCollector
and ends up violating an invariant of regalloc2 where a virtual register is assigned to a non-allocatable register. This ends up generating instructions which move general purpose registers into the zero register on riscv64, which is unlikely to be what regalloc intended.