Lack of vsetvli after function call for whole register move #114518
Comments
@llvm/issue-subscribers-backend-risc-v Author: Kito Cheng (kito-cheng)
Unfortunately, whole register move instructions depend on `vtype`*1, which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any valid `vsetvli` instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program.
However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within `printf`. Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023.

I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls. An alternative approach is to address this issue on the toolchain side by requiring at least one valid `vsetvli` instruction before any whole register move. This might be an ugly workaround, but it's probably the simplest way to resolve the issue. I also realized this might be a better solution since the psABI specifies that `VTYPE` is NOT preserved across function calls. This means we can't guarantee that VILL is not 1 at the function entry, so placing a `vsetvli` instruction right after the function call may be necessary.

Testcase:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));
vint32m1_t foo(vint32m1_t a) {
bar(); // We never know whether bar will make a system call internally or not.
return a;
}

Generated asm with `clang -target riscv64-unknown-elf -S -O3 -march=rv64gcv`:

...
.type foo,@function
.variant_cc foo
foo: # @foo
# %bb.0: # %entry
addi sp, sp, -16
sd ra, 8(sp) # 8-byte Folded Spill
vmv1r.v v24, v8
call bar
vmv1r.v v8, v24
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
...

And the compiler could emit code like below to fix this issue:

...
.type foo,@function
.variant_cc foo
foo: # @foo
# %bb.0: # %entry
addi sp, sp, -16
sd ra, 8(sp) # 8-byte Folded Spill
vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
vmv1r.v v24, v8
call bar
vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here
vmv1r.v v8, v24
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
...

NOTE: We have hit this issue within our internal spec run.

*1 That change[1] was made AFTER 1.0...

[1] riscvarchive/riscv-v-spec@856fe5b
[2] torvalds/linux@9657e9b
cc: @topperc @BeMg @asb @preames @lukel97

Also cc'ing some non-LLVM folks since this is the same situation for GCC: @palmer-dabbelt @JeffreyALaw

As for the GCC case, GCC will use the stack rather than a callee-saved register here, so I used an inline asm trick to force GCC to use one:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));
vint32m1_t foo(vint32m1_t a, vint32m1_t b) {
register vint32m1_t x asm("v24") = b;
bar();
asm ("#xx %0"::"vr"(x) );
return x;
}
I may have missed something here: why would whole register moves depend on `vtype`? I don't see the necessity of depending on vtype... maybe I misunderstand the spec, or the spec is just being vague here.
Commit log from riscvarchive/riscv-v-spec@856fe5b says:
Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.
I kinda agree with you, but I guess SiFive is not the only one implementing those semantics... also, that's arguably a spec-conforming implementation.
The change riscvarchive/riscv-v-spec@856fe5b was committed after the ratification of RVV 1.0, so it is not a mandatory requirement but a supplementary explanation (and I think it does not follow intuition). I confirmed that the Spacemit X60 on the K1 doesn't follow this:
I haven't checked the XuanTie C908 on the K230 yet, but I'm 99.99% sure it is the same, as it is an "old" core.
The summary of the change above matches my understanding; I'd previously written this up here: https://github.com/preames/public-notes/blob/master/riscv-spec-minutia.rst#id8 And yes, this change is extremely problematic. We can work around it in SW, but ouch.
Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has received minor updates to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst
Thanks for the summary! I think there should be an option where we revert the change and reword that paragraph so that it clearly doesn't depend on vtype.
The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.
I dug through the commit history. For
For
My understanding is that it was clearly intended that way. IMO, there does exist some vagueness/contradiction in the spec, and it should be clarified clearly/formally based on the semantics of whole register moves.
Confirmed that the C908 on the K230 will also not trap on whole register moves in this case.
Much discussion about the ISA spec in this thread surrounds a particular non-normative note. Ultimately, non-normative text isn't relevant to this discussion, because, well, it isn't normative. The reason that text was changed post-ratification is that (a) it is valid to change non-normative text at any time, since by definition it doesn't affect the normative content, and (b) in this case, it contradicted the normative text, and the normative text always wins.

As Craig points out, the ratified normative text in the spec says that the instructions do depend on SEW (for e.g. vstart), hence they depend on vtype. This definition was in place long before ratification. Given the opportunity, I'm sure we'd revisit this ISA choice, but it is what it is. The fact that multiple implementations don't trap these instructions doesn't change the story. The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.
Please be specific here: which part of the spec says this?
The Whole-Register Moves section says that the instructions' behavior depends on SEW. The Vector Type Illegal section says that an attempt to execute an instruction that depends on vtype when vill=1 raises an illegal-instruction exception. (I had written earlier that the behavior is reserved, but my recollection was wrong; the spec makes the stronger statement that an exception must be raised.)
Even the ratified spec says:

I'm not a native English speaker; is "as if" here suggesting that this is optional?
“As if” isn’t a phrase used to suggest optionality; it’s a phrase used to establish an equivalence.
In a similar vein to 856fe5b, this note states that compilers don't need to "know or change vtype" for whole vector register moves. There's some truth to this in that compilers can largely ignore SEW and LMUL, but ultimately they do depend on vtype and the current wording might be misleading. A compiler may in fact need to change vtype to clear vill, see: llvm/llvm-project#114518
QEMU believes whole register moves depend on vsew and will generate SIGILL if vtype.vill is set.
This is not evidence to support this mistake; I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem. AFAIK, several cores have had to change their implementations because of this change recently. At least, XiangShan has already been misled because of this apparent mistake: OpenXiangShan/NEMU#511. The impact on software would be much bigger. Or, let me ask this another way: what benefit would we get if whole register moves depend on `vtype`?
The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.
Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available at retail doesn't mean they can be ignored.
How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?
…on vill This is a compromise for llvm#114518. We may also add a new extension `Zvnotrapvmvnr` or whatever, which doesn't add new instructions but guarantees that these instructions won't trap on vill, to fix this mistake. Not all of us want to pay for the mistake.
Sorry, I don't mean to make this discussion argumentative; my apologies.
Well, I think it is because we can make sure vstart won't be non-zero. Of course, this may not match the spec.

Fault Vmv1r_vMicro::execute(ExecContext *xc,
trace::InstRecord *traceData) const {
// TODO: Check register alignment.
// TODO: If vd is equal to vs2 the instruction is an architectural NOP.
MISA misa = xc->readMiscReg(MISCREG_ISA);
STATUS status = xc->readMiscReg(MISCREG_STATUS);
if (!misa.rvv || status.vs == VPUStatus::OFF) {
return std::make_shared<IllegalInstFault>("RVV is disabled or VPU is off",
machInst);
}
status.vs = VPUStatus::DIRTY;
xc->setMiscReg(MISCREG_STATUS, status);
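// Note: there is no check of vtype.vill (or vstart) before the copy below,
// which is why this implementation does not raise an illegal-instruction
// trap for whole register moves when vill=1.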
/* Vars for Vs2*/ /* End vars for Vs2 */
uint64_t VlenbBits = 0;
RiscvISAInst::PCState __parserAutoPCState;
set(__parserAutoPCState, xc->pcState());
auto &tmp_d0 =
*(RiscvISAInst::VecRegContainer *)xc->getWritableRegOperand(this, 0);
auto Vd = tmp_d0.as<uint64_t>();
RiscvISAInst::VecRegContainer tmp_s0;
xc->getRegOperand(this, 0, &tmp_s0);
auto Vs2 = tmp_s0.as<uint64_t>();
VlenbBits = __parserAutoPCState.vlenb();
uint32_t vlen = VlenbBits * 8;
for (size_t i = 0; i < (vlen / 64); i++) {
Vd[i] = Vs2[i];
}
if (traceData) {
traceData->setData(vecRegClass, &tmp_d0);
};
return NoFault;
}
Regarding "Option 1 - Change the ABI", have you/we considered the possibility of requiring the ABI to preserve VILL=0 except for syscalls? I.e., effectively mandate that after a syscall you need to do a |
That would mean that existing software/libraries doing direct syscalls (maybe incl. glibc?) would cease to follow the ABI; not much worse than existing kernels doing so, I'd think. And handling that in userspace would be very messy, requiring conditionally running the `vsetvli`.
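(To make that option concrete, here is a minimal hand-written sketch, not taken from any proposal; the label and the syscall number are illustrative, and it assumes an rv64gcv toolchain.)

```asm
# Hypothetical sketch: a raw write(2) wrapper under an ABI rule that requires
# userspace to re-establish VILL=0 after every syscall.
my_write:                                  # args already in a0-a2
        li       a7, 64                    # __NR_write on riscv64 (assumption)
        ecall                              # a patched kernel may leave vill=1
        vsetivli x0, 0, e8, m1, ta, ma     # restore a valid vtype so a later
                                           # whole register move cannot trap
        ret
```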
Implementations are all over the place on whole register moves trapping under VILL, so let's just forbid that case in the psABI.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

We've started seeing a bunch of fallout from the "whole register moves depend on TYPE" ISA change, and there's discussion all over the place:

* There's a GCC bug <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544>
* Also an LLVM bug <llvm/llvm-project#114518>
* QEMU changed behavior in 4eff52cd46 ("target/riscv: Add vill check for whole vector register move instructions")
* Philip has a writeup on some of the options in his notes <https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst>.
* This has also come up in most of the meetings I've been in this week.

It seems like there's no general consensus on what we do here -- some discussions say we're going to change the psABI (and presumably then the uABI), some say we're not. I don't personally care a ton if we make the ABI change or not, we just need to decide so we can figure out where the bugs are -- there's going to be fallout either way, but we can't really get things fixed until we decide one way or the other. As far as I can tell both paths are valid:

* If we make these ABI changes then most code that predates the ISA change continues to function correctly after the ISA change. We just need to track down anything that sets VILL and fix it, but we should be able to do that incrementally (maybe even just with a trap handler). Right now I think that's just the kernel, but I'm not 100% sure there. Looks like the first round of HW doesn't trap, though, so we should be safe for a bit.
* If we don't make these ABI changes then we'll have to fix the compilers and go rebuild everything to match the ISA change. I think the GCC change should be pretty straight-forward, I don't know about the LLVM side of things. I'm not sure what we'd do with the kernel here: we could say the VILL traps are just latent userspace bugs, or we could say we're breaking userspace -- kind of a grey area, so probably more of an LKML question.

I don't think one option is clearly simpler than the other, it's just a question of where we push the bugs.
…on (#118283) This is an alternative to #117866 that works by demanding a valid vtype instead of using a separate pass. The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block. To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible. Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else. This is one part of a fix for #114518. We still need to check if there's other cases where vector copies/whole register moves that are inserted after vsetvli insertion.
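(A rough, hypothetical illustration of the coalescing described above, not taken from the patch or its tests: since the copy only demands that vtype be valid, a vsetvli that a later instruction in the block needs anyway can cover the copy, and the extra one can be dropped.)

```asm
# Before coalescing (sketch): a vsetvli inserted solely to clear vill
        vsetivli x0, 0, e8, m1, ta, ma     # only here so the copy cannot trap
        vmv1r.v  v24, v8
        vsetvli  x0, a0, e32, m1, ta, ma   # demanded by the following add
        vadd.vv  v8, v8, v9

# After coalescing (sketch): any valid vtype satisfies the copy, so the
# existing e32 vsetvli covers it and the extra vsetivli goes away
        vsetvli  x0, a0, e32, m1, ta, ma
        vmv1r.v  v24, v8
        vadd.vv  v8, v8, v9
```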