Lack of vsetvli after function call for whole register move #114518
Comments
@llvm/issue-subscribers-backend-risc-v Author: Kito Cheng (kito-cheng)
Unfortunately, whole register move instructions depend on `vtype`*1, which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any valid `vsetvli` instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program.
However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within `printf`. Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023.

I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls. An alternative approach is to address this issue on the toolchain side by requiring at least one valid `vsetvli` instruction before any whole register move. This might be an ugly workaround, but it's probably the simplest way to resolve the issue. I also realized this might be a better solution since the psABI specifies that `VTYPE` is NOT preserved across function calls. This means we can't guarantee that VILL is not 1 at the function entry, so placing a `vsetvli` instruction right after the function call may be necessary.

Testcase:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));
vint32m1_t foo(vint32m1_t a) {
bar(); // We never know whether bar will make a system call internally or not.
return a;
}

Generated asm with `clang -target riscv64-unknown-elf -S -O3 -march=rv64gcv`:

...
.type foo,@function
.variant_cc foo
foo: # @foo
# %bb.0: # %entry
addi sp, sp, -16
sd ra, 8(sp) # 8-byte Folded Spill
vmv1r.v v24, v8
call bar
vmv1r.v v8, v24
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
...

And the compiler could emit code like below to fix this issue:

...
.type foo,@function
.variant_cc foo
foo: # @foo
# %bb.0: # %entry
addi sp, sp, -16
sd ra, 8(sp) # 8-byte Folded Spill
vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
vmv1r.v v24, v8
call bar
vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
vmv1r.v v8, v24
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
...

NOTE: We have hit this issue within our internal spec run.

*1 That change[1] was made AFTER 1.0...

[1] riscvarchive/riscv-v-spec@856fe5b
[2] torvalds/linux@9657e9b
cc: @topperc @BeMg @asb @preames @lukel97

Also cc'ing some non-LLVM folks since this is the same situation for GCC: @palmer-dabbelt @JeffreyALaw

As for the GCC case, GCC will use the stack rather than a callee-saved register here, so I used an inline asm trick to force GCC to use one:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));
vint32m1_t foo(vint32m1_t a, vint32m1_t b) {
register vint32m1_t x asm("v24") = b;
bar();
asm ("#xx %0"::"vr"(x) );
return x;
}
I may have missed something here: why would whole register moves depend on `vtype`? I don't see the necessity of depending on vtype... maybe I misunderstand the spec, or the spec is just being vague here.
Commit log from riscvarchive/riscv-v-spec@856fe5b says:
Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.
I kinda agree with you, but I guess SiFive is not the only one implementing those semantics... also, that's arguably a spec-conforming implementation.
The change riscvarchive/riscv-v-spec@856fe5b was committed after the ratification of RVV 1.0, so it is not a mandatory requirement but a supplementary explanation (and I think it does not follow intuition). I confirmed that the Spacemit X60 on the K1 doesn't follow this:
I haven't checked the XuanTie C908 on the K230 yet, but I'm 99.99% sure it is the same, as it is an "old" core.
The summary of the change above matches my understanding; I'd previously written this up here: https://github.com/preames/public-notes/blob/master/riscv-spec-minutia.rst#id8 And yes, this change is extremely problematic. We can work around it in SW, but ouch.
Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has received minor updates to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst
Thanks for the summary! I think there should be an option where we revert the change and reword that paragraph so that it clearly doesn't depend on vtype.
The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.
I dug through the commit history. For
For
My understanding is that it was clearly intended that way. IMO, there does exist some vagueness/contradiction in the spec, and it should be clarified clearly/formally based on the semantics of whole register moves.
Confirmed that the C908 on the K230 will also not trap on whole register moves in this case.
Much discussion about the ISA spec in this thread surrounds a particular non-normative note. Ultimately, non-normative text isn't relevant to this discussion, because, well, it isn't normative. The reason that text was changed post-ratification is that (a) it is valid to change non-normative text at any time, since by definition it doesn't affect the normative content, and (b) in this case, it contradicted the normative text, and the normative text always wins.

As Craig points out, the ratified normative text in the spec says that the instructions do depend on SEW (for e.g. vstart), hence they depend on vtype. This definition was in place long before ratification. Given the opportunity, I'm sure we'd revisit this ISA choice, but it is what it is. The fact that multiple implementations don't trap these instructions doesn't change the story. The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.
Please be specific here: which part of the spec says this?
The Whole-Register Moves section says that the instructions' behavior depends on SEW. The Vector Type Illegal section says that an attempt to execute an instruction that depends on vtype when vill=1 raises an illegal-instruction exception. (I had written earlier that the behavior is reserved, but my recollection was wrong; the spec makes the stronger statement that an exception must be raised.)
Even the ratified spec says:

I'm not a native English speaker; is "as if" here suggesting that this is optional?
“As if” isn’t a phrase used to suggest optionality; it’s a phrase used to establish an equivalence.
In a similar vein to 856fe5b, this note states that compilers don't need to "know or change vtype" for whole vector register moves. There's some truth to this in that compilers can largely ignore SEW and LMUL, but ultimately they do depend on vtype and the current wording might be misleading. A compiler may in fact need to change vtype to clear vill, see: llvm/llvm-project#114518
QEMU believes whole register moves depend on vsew and will generate SIGILL if vtype.vill is set.
This is not evidence to support this mistake; I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem. AFAIK, several cores have had to change their implementations because of this change recently. At least, XiangShan has already been misled because of this apparent mistake: OpenXiangShan/NEMU#511. The impact on software would be much bigger. Or, let me ask this another way: what benefit would we get if whole register moves depend on `vtype`?
The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.
Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available at retail doesn't mean they can be ignored.
How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?
…on vill This is a compromise for llvm#114518. We may also add a new extension `Zvnotrapvmvnr` or whatever, which doesn't add new instructions but guarantees that these instructions won't trap on vill, to fix this mistake. Not all of us want to pay for the mistake.
Sorry, I don't mean to make this discussion argumentative; my apologies.
Well, I think it is because we can make sure vstart won't be non-zero. Of course, this may not match the spec.

Fault Vmv1r_vMicro::execute(ExecContext *xc,
trace::InstRecord *traceData) const {
// TODO: Check register alignment.
// TODO: If vd is equal to vs2 the instruction is an architectural NOP.
MISA misa = xc->readMiscReg(MISCREG_ISA);
STATUS status = xc->readMiscReg(MISCREG_STATUS);
if (!misa.rvv || status.vs == VPUStatus::OFF) {
return std::make_shared<IllegalInstFault>("RVV is disabled or VPU is off",
machInst);
}
status.vs = VPUStatus::DIRTY;
xc->setMiscReg(MISCREG_STATUS, status);
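// Note: there is no check of vtype.vill (or vstart) before the copy below,
// which is why this implementation does not raise an illegal-instruction
// trap for whole register moves when vill=1.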
/* Vars for Vs2*/ /* End vars for Vs2 */
uint64_t VlenbBits = 0;
RiscvISAInst::PCState __parserAutoPCState;
set(__parserAutoPCState, xc->pcState());
auto &tmp_d0 =
*(RiscvISAInst::VecRegContainer *)xc->getWritableRegOperand(this, 0);
auto Vd = tmp_d0.as<uint64_t>();
RiscvISAInst::VecRegContainer tmp_s0;
xc->getRegOperand(this, 0, &tmp_s0);
auto Vs2 = tmp_s0.as<uint64_t>();
VlenbBits = __parserAutoPCState.vlenb();
uint32_t vlen = VlenbBits * 8;
for (size_t i = 0; i < (vlen / 64); i++) {
Vd[i] = Vs2[i];
}
if (traceData) {
traceData->setData(vecRegClass, &tmp_d0);
};
return NoFault;
}
Regarding "Option 1 - Change the ABI", have you/we considered the possibility of requiring the ABI to preserve VILL=0 except for syscalls? I.e., effectively mandate that after a syscall you need to do a |
That would mean that existing software/libraries doing direct syscalls (maybe incl. glibc?) would cease to follow the ABI; not much worse than existing kernels doing so, I'd think. And handling that in userspace would be very messy, requiring conditionally running the `vsetvli`.
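(To make that option concrete, here is a minimal hand-written sketch, not taken from any proposal; the label and the syscall number are illustrative, and it assumes an rv64gcv toolchain.)

```asm
# Hypothetical sketch: a raw write(2) wrapper under an ABI rule that requires
# userspace to re-establish VILL=0 after every syscall.
my_write:                                  # args already in a0-a2
        li       a7, 64                    # __NR_write on riscv64 (assumption)
        ecall                              # a patched kernel may leave vill=1
        vsetivli x0, 0, e8, m1, ta, ma     # restore a valid vtype so a later
                                           # whole register move cannot trap
        ret
```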
Implementations are all over the place on whole register moves trapping under VILL, so let's just forbid that case in the psABI.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

We've started seeing a bunch of fallout from the "whole register moves depend on TYPE" ISA change, and there's discussion all over the place:

* There's a GCC bug <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544>
* Also an LLVM bug <llvm/llvm-project#114518>
* QEMU changed behavior in 4eff52cd46 ("target/riscv: Add vill check for whole vector register move instructions")
* Philip has a writeup on some of the options in his notes <https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst>.
* This has also come up in most of the meetings I've been in this week.

It seems like there's no general consensus on what we do here -- some discussions say we're going to change the psABI (and presumably then the uABI), some say we're not. I don't personally care a ton if we make the ABI change or not, we just need to decide so we can figure out where the bugs are -- there's going to be fallout either way, but we can't really get things fixed until we decide one way or the other. As far as I can tell both paths are valid:

* If we make these ABI changes then most code that predates the ISA change continues to function correctly after the ISA change. We just need to track down anything that sets VILL and fix it, but we should be able to do that incrementally (maybe even just with a trap handler). Right now I think that's just the kernel, but I'm not 100% sure there. Looks like the first round of HW doesn't trap, though, so we should be safe for a bit.
* If we don't make these ABI changes then we'll have to fix the compilers and go rebuild everything to match the ISA change. I think the GCC change should be pretty straight-forward, I don't know about the LLVM side of things. I'm not sure what we'd do with the kernel here: we could say the VILL traps are just latent userspace bugs, or we could say we're breaking userspace -- kind of a grey area, so probably more of an LKML question.

I don't think one option is clearly simpler than the other, it's just a question of where we push the bugs.
…on (#118283) This is an alternative to #117866 that works by demanding a valid vtype instead of using a separate pass. The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block. To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible. Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else. This is one part of a fix for #114518. We still need to check if there's other cases where vector copies/whole register moves that are inserted after vsetvli insertion.
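(A rough, hypothetical illustration of the coalescing described above, not taken from the patch or its tests: since the copy only demands that vtype be valid, a vsetvli that a later instruction in the block needs anyway can cover the copy, and the extra one can be dropped.)

```asm
# Before coalescing (sketch): a vsetvli inserted solely to clear vill
        vsetivli x0, 0, e8, m1, ta, ma     # only here so the copy cannot trap
        vmv1r.v  v24, v8
        vsetvli  x0, a0, e32, m1, ta, ma   # demanded by the following add
        vadd.vv  v8, v8, v9

# After coalescing (sketch): any valid vtype satisfies the copy, so the
# existing e32 vsetvli covers it and the extra vsetivli goes away
        vsetvli  x0, a0, e32, m1, ta, ma
        vmv1r.v  v24, v8
        vadd.vv  v8, v8, v9
```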