Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of vsetvli after function call for whole register move #114518

Open
kito-cheng opened this issue Nov 1, 2024 · 26 comments
Open

Lack of vsetvli after function call for whole register move #114518

kito-cheng opened this issue Nov 1, 2024 · 26 comments

Comments

@kito-cheng
Copy link
Member

Unfortunately, whole register move instructions depend on vtype*1, which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any valid vsetvli instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program.

However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within printf. Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023.

I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls.

An alternative approach is to address this issue on the toolchain side by requiring at least one valid vsetvli instruction before any whole register move. This might be an ugly workaround, but it’s probably the simplest way to resolve the issue. I also realized this might be a better solution since the psABI specifies that VTYPE is NOT preserved across function calls. This means we can’t guarantee that VILL is not 1 at the function entry, so placing a vsetvli instruction right after the function call may be necessary.

Testcase:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));

vint32m1_t foo(vint32m1_t a) {
    bar();  // We never know bar will call system call inside or not.
    return a;
}

Generated asm with clang -target riscv64-unknown-elf -S -O3 -march=rv64gcv:

...
        .type   foo,@function
        .variant_cc     foo
foo:                                    # @foo
# %bb.0:                                # %entry
        addi    sp, sp, -16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
        vmv1r.v v24, v8
        call    bar
        vmv1r.v v8, v24
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 16
        ret
...

And the compiler could emits code like below to fix this issue:

...
        .type   foo,@function
        .variant_cc     foo
foo:                                    # @foo
# %bb.0:                                # %entry
        addi    sp, sp, -16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
        vsetivli x0, 0, e8, m1, ta, ma   # Need vsetvli to make VILL=0 here
        vmv1r.v v24, v8
        call    bar
        vsetivli x0, 0, e8, m1, ta, ma  # Need vsetvli to make VILL=0 here
        vmv1r.v v8, v24
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 16
        ret
...

NOTE: We have hit this issue within our internal spec run.

*1 That change[1] is made AFTER 1.0...
[1] riscvarchive/riscv-v-spec@856fe5b
[2] torvalds/linux@9657e9b

@llvmbot
Copy link
Member

llvmbot commented Nov 1, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Kito Cheng (kito-cheng)

Unfortunately, whole register move instructions depend on `vtype`*1, which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any valid `vsetvli` instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program.

However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within printf. Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023.

I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls.

An alternative approach is to address this issue on the toolchain side by requiring at least one valid vsetvli instruction before any whole register move. This might be an ugly workaround, but it’s probably the simplest way to resolve the issue. I also realized this might be a better solution since the psABI specifies that VTYPE is NOT preserved across function calls. This means we can’t guarantee that VILL is not 1 at the function entry, so placing a vsetvli instruction right after the function call may be necessary.

Testcase:

#include &lt;riscv_vector.h&gt;
void bar() __attribute__((riscv_vector_cc));

vint32m1_t foo(vint32m1_t a) {
    bar();  // We never know bar will call system call inside or not.
    return a;
}

Generated asm with clang -target riscv64-unknown-elf -S -O3 -march=rv64gcv:

...
        .type   foo,@<!-- -->function
        .variant_cc     foo
foo:                                    # @<!-- -->foo
# %bb.0:                                # %entry
        addi    sp, sp, -16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
        vmv1r.v v24, v8
        call    bar
        vmv1r.v v8, v24
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 16
        ret
...

And the compiler could emits code like below to fix this issue:

...
        .type   foo,@<!-- -->function
        .variant_cc     foo
foo:                                    # @<!-- -->foo
# %bb.0:                                # %entry
        addi    sp, sp, -16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
        vsetivli x0, 0, e8, m1, ta, ma   # Need vsetvli to make VILL=0 here
        vmv1r.v v24, v8
        call    bar
        vsetivli x0, 0, e8, m1, ta, ma  # Need vsetvli to make VILL=0 here
        vmv1r.v v8, v24
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 16
        ret
...

NOTE: We have hit this issue within our internal spec run.

*1 That change[1] is made AFTER 1.0...
[1] riscvarchive/riscv-v-spec@856fe5b
[2] torvalds/linux@9657e9b

@kito-cheng
Copy link
Member Author

kito-cheng commented Nov 1, 2024

cc: @topperc @BeMg @asb @preames @lukel97

Also cc some non LLVM folks since this is same situation for GCC @palmer-dabbelt @JeffreyALaw

The case for GCC, GCC will using stack rather than callee save reg in this case, so I use some inline asm trick to force GCC to use that:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));

vint32m1_t foo(vint32m1_t a, vint32m1_t b) {
    register vint32m1_t x asm("v24") = b;
    bar();
    asm ("#xx %0"::"vr"(x) );
    return x;
}

@wangpc-pp
Copy link
Contributor

wangpc-pp commented Nov 1, 2024

I may have missed something here, why would vmv<N>r.v depend on vtype? I didn't notice this before actually.
Here is a note from the SPEC:

Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.

I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.

@kito-cheng
Copy link
Member Author

I may have missed something here, why would vmv<N>r.v depend on vtype? I didn't notice this before actually. Here is a note from the SPEC:

Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.

I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.

Commit log from riscvarchive/riscv-v-spec@856fe5b say: The normative text says that vmv<nr>r.v "operates as though EEW=SEW", meaning that it _does_ depend on vtype.

@wangpc-pp
Copy link
Contributor

I may have missed something here, why would vmv<N>r.v depend on vtype? I didn't notice this before actually. Here is a note from the SPEC:

Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.

I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.

Commit log from riscvarchive/riscv-v-spec@856fe5b say: The normative text says that vmv<nr>r.v "operates as though EEW=SEW", meaning that it _does_ depend on vtype.

Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.

@kito-cheng
Copy link
Member Author

Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.

I kinda agree with you but I guess SiFive is not the only one implement that semantics...also that's kinda spec conformance implementation

@wangpc-pp
Copy link
Contributor

wangpc-pp commented Nov 1, 2024

Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.

I kinda agree with you but I guess SiFive is not the only one implement that semantics...also that's kinda spec conformance implementation

The change riscvarchive/riscv-v-spec@856fe5b was committed after the ratification of RVV 1.0, so it is not a mandatory request but a supplementary explanation (and I think it is not following the intuition).

I confirmed that Spacemit-X60 on K1 doesn't follow this:

bytedance@k1:~$ cat vmv.c
int main() {
        asm("li a0, 0x8000000000000000");
        asm("vsetvl a0, a0, a0");
        asm("vmv1r.v v0, v1"); // No SIGILL.
        asm("vadd.vv v0, v1, v2"); // SIGILL here.
        return 0;
}
bytedance@k1:~$ gcc -march=rv64gcv vmv.c -o vmv -g
bytedance@k1:~$ gdb vmv
GNU gdb (Ubuntu 14.0.50.20230907-0ubuntu1-bb1) 14.0.50.20230907-git
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmv...
(gdb) r
Starting program: /home/bytedance/vmv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
main () at vmv.c:5
5               asm("vadd.vv v0, v1, v2");
(gdb) p /x $vtype
$1 = 0x8000000000000000

I haven't checked XuanTie C908 on K230 yet, but I 99.99% believe that it is the same as it is an "old" core.

@preames
Copy link
Collaborator

preames commented Nov 1, 2024

Summary of the change above matches my understanding, I'd previously written this up here: https://github.com/preames/public-notes/blob/master/riscv-spec-minutia.rst#id8

And yes, this change is extremely problematic. We can work around in SW, but ouch.

@preames
Copy link
Collaborator

preames commented Nov 1, 2024

Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst

@wangpc-pp
Copy link
Contributor

Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst

Thanks for the summary! I think there should be option that we revert the change and reword that paragraph so that it clearly doesn't depend on vtype.

@topperc
Copy link
Collaborator

topperc commented Nov 1, 2024

The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.

@wangpc-pp
Copy link
Contributor

The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.

I digged the commit history.

For vill part:

  1. The sentence of exceptions was added in riscvarchive/riscv-v-spec@4e72221.
  2. And then removed whole register move in riscvarchive/riscv-v-spec@856fe5b.

For vmv<N>r.v part (there are too many related commits, I just paste important changes here):

  1. They was introduced by riscvarchive/riscv-v-spec@7b02297. Note that It was ignoring the current settings of the vl and vtype register from the beginning.
  2. A clarification that changed the description to operate as if SEW=8 and vl=VLMAX, regardless of current settings in vtype and vl: riscvarchive/riscv-v-spec@548232c
  3. Some more clarifications in Clarify Whole register move characteristics - as specified 4.2. Mapping with LMUL > 1 riscvarchive/riscv-v-spec#378 and its related commits. It didn't change a lot.
  4. Rearrange things were added in riscvarchive/riscv-v-spec@2144559 and then riscvarchive/riscv-v-spec@f1d349d made it EEW=SEW. From what I can tell, these sentences were just saying some implementations can treat vd==vs2 cases as HINTs.

My understanding is, it was clearly that vtype should be ignored since the beginning; then somehow the SEW=8 clarification started to make it vague (but I still think, people should interpret it as ignoring); the rearrange HINTs change made vagueness worse, it should be just a note for potential implementations, but somehow it led some people to think it may be a strong requirement.

IMO, there does exist some vagueness/contradiction in the spec, and please clarify it clearly/formally based on the semantics of whole register move.

@wangpc-pp
Copy link
Contributor

Confirmed that C908 on K230 will also not trap on vmv<N>r.v:

[root@canaan ~ ]#./vmv
[  157.105988] vmv[171]: unhandled signal 4 code 0x1 at 0x000000000001054a in vmv[10000+69000]
[  157.114396] CPU: 0 PID: 171 Comm: vmv Not tainted 5.10.4+ #1
[  157.120060] epc: 000000000001054a ra : 00000000000105c6 sp : 0000003fffb28ba0
[  157.127199]  gp : 000000000007f518 tp : 0000000000085760 t0 : 0000000000000001
[  157.134423]  t1 : 000000000007ed28 t2 : 000000000000006a s0 : 0000003fffb28bb0
[  157.141647]  s1 : 0000000000000001 a0 : 0000000000000000 a1 : 0000003fffb28d58
[  157.148869]  a2 : 0000003fffb28d68 a3 : 0000000000000000 a4 : 0000003fffb28bd8
[  157.156092]  a5 : 0000000000010536 a6 : 000000000006afe0 a7 : 0000000000000001
[  157.163315]  s2 : 0000003fffb28d58 s3 : 0000000000000001 s4 : 0000003fffb28d68
[  157.170538]  s5 : 0000000000000001 s6 : 0000000000010536 s7 : 0000000000010258
[  157.177762]  s8 : 0000000000000000 s9 : 0000000000159fc0 s10: 0000000000156760
[  157.184984]  s11: 0000000000000001 t3 : 2f2f2f2f2f2f2f2f t4 : 000000000007d9e8
[  157.192207]  t5 : 0000000000084260 t6 : 0000000000079a30
[  157.197519] status: 8000000200004620 badaddr: 0000000002110057 cause: 0000000000000002
Illegal instruction

# objdump -d vmv >vmv.s
0000000000010536 <main>:
   10536:       1141                    add     sp,sp,-16
   10538:       e422                    sd      s0,8(sp)
   1053a:       0800                    add     s0,sp,16
   1053c:       fff0051b                addw    a0,zero,-1
   10540:       157e                    sll     a0,a0,0x3f
   10542:       80a57557                vsetvl  a0,a0,a0
   10546:       9e103057                vmv1r.v v0,v1
   1054a:       02110057                vadd.vv v0,v1,v2
   1054e:       4781                    li      a5,0
   10550:       853e                    mv      a0,a5
   10552:       6422                    ld      s0,8(sp)
   10554:       0141                    add     sp,sp,16
   10556:       8082                    ret

@aswaterman
Copy link

aswaterman commented Nov 4, 2024

Much discussion about the ISA spec in this thread surrounds a particular non-normative note. Ultimately, non-normative text isn't relevant to this discussion, because, well, it isn't normative. The reason that text was changed post-ratification is that (a) it is valid to change non-normative text at any time, since by definition it doesn't affect the normative content, and (b) in this case, it contradicted the normative text, and the normative text always wins.

As Craig points out, the ratified normative text in the spec says that the instructions do depend on SEW (for e.g. vstart), hence they depend on vtype. This definition was in place long before to ratification.

Given the opportunity, I'm sure we'd revisit this ISA choice, but it is what it is. The fact that multiple implementations don't trap these instructions doesn't change the story. The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.

@wangpc-pp
Copy link
Contributor

The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.

Please be specific here which part of the spec says this?

@aswaterman
Copy link

The Whole-Register Moves section says that the instructions' behavior depends on SEW. The Vector Type Illegal section says that an attempt to execute an instruction that depends on vtype when vill=1 raises an illegal-instruction exception.

(I had written earlier that the behavior is reserved, but my recollection was wrong; the spec makes the stronger statement that an exception must be raised.)

@wangpc-pp
Copy link
Contributor

The Whole-Register Moves section says that the instructions' behavior depends on SEW.

Even for raified spec, it says:

The instructions operate as if EEW=SEW, EMUL = NREG, effective length evl= EMUL * VLEN/SEW.

I'm not a native English speaker, is as if a strong requirement?

@aswaterman
Copy link

”As if” isn’t a phrase used to suggest optionality; it’s a phrase used to establish an equivalence.

lukel97 added a commit to lukel97/riscv-isa-manual that referenced this issue Nov 4, 2024
In a similar vein to 856fe5b, this note states that compilers don't
need to "know or change vtype" for whole vector register moves.

There's some truth to this in that compilers can largely ignore SEW
and LMUL, but ultimately they do depend on vtype and the current
wording might be misleading. A compiler may in fact need to change
vtype to clear vill, see:
llvm/llvm-project#114518
aswaterman pushed a commit to riscv/riscv-isa-manual that referenced this issue Nov 4, 2024
…1710)

In a similar vein to 856fe5b, this note states that compilers don't
need to "know or change vtype" for whole vector register moves.

There's some truth to this in that compilers can largely ignore SEW
and LMUL, but ultimately they do depend on vtype and the current
wording might be misleading. A compiler may in fact need to change
vtype to clear vill, see:
llvm/llvm-project#114518
@topperc
Copy link
Collaborator

topperc commented Nov 4, 2024

qemu believes whole register move depends on vsew and will generate SIGILL for if vtype.vill is set.

@wangpc-pp
Copy link
Contributor

qemu believes whole register move depends on vsew and will generate SIGILL for if vtype.vill is set.

This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.

AFAIK, several cores have to change their implementations because of this change recently. At least, XiangShan has already been misled because of this apparent mistake: OpenXiangShan/NEMU#511. The impact on software would be much bigger.

Or, let me ask this in another way: What benefit would we have if whole register move depends on vtype? Would it make RISC-V much more competitive than X86/AArch64? Is this a state-of-the-art architecture innovation? How will you handle these products which are already on the market?

@topperc
Copy link
Collaborator

topperc commented Nov 5, 2024

Or, let me ask this in another way: What benefit would we have if whole register move depends on vtype?

The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.

How will you handle these products which are already on the market?

Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available for retail doesn't mean they can be ignored.

@topperc
Copy link
Collaborator

topperc commented Nov 5, 2024

This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.

How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Nov 5, 2024
…on vill

This is a compromise of llvm#114518.

We may also add a new extension `Zvnotrapvmvnr` or whatever that
doesn't add new instructions but these instructions won't trap on
vill to fix this mistake.

Not all of us want to pay for the mistake.
@wangpc-pp
Copy link
Contributor

wangpc-pp commented Nov 5, 2024

Or, let me ask this in another way: What benefit would we have if whole register move depends on vtype?

The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.

How will you handle these products which are already on the market?

Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available for retail doesn't mean they can be ignored.

Sorry I don't mean to make this discussion argumentative, my apologies.
So you do agree there does exist a way to define a new extension to fix this mistake, right? Can we start it right now? WDYT? cc @aswaterman

@wangpc-pp
Copy link
Contributor

wangpc-pp commented Nov 5, 2024

This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.

How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?

Well, I think it is because we can make sure vstart won't be non-zero. Of cource, this may not match the spec.
The formatted generated execution for current gem5 implementation:

Fault Vmv1r_vMicro::execute(ExecContext *xc,
                            trace::InstRecord *traceData) const {
  // TODO: Check register alignment.
  // TODO: If vd is equal to vs2 the instruction is an architectural NOP.
  MISA misa = xc->readMiscReg(MISCREG_ISA);
  STATUS status = xc->readMiscReg(MISCREG_STATUS);

  if (!misa.rvv || status.vs == VPUStatus::OFF) {
    return std::make_shared<IllegalInstFault>("RVV is disabled or VPU is off",
                                              machInst);
  }

  status.vs = VPUStatus::DIRTY;
  xc->setMiscReg(MISCREG_STATUS, status);

  /* Vars for Vs2*/ /* End vars for Vs2 */
  uint64_t VlenbBits = 0;
  RiscvISAInst::PCState __parserAutoPCState;
  set(__parserAutoPCState, xc->pcState());
  auto &tmp_d0 =
      *(RiscvISAInst::VecRegContainer *)xc->getWritableRegOperand(this, 0);
  auto Vd = tmp_d0.as<uint64_t>();
  RiscvISAInst::VecRegContainer tmp_s0;
  xc->getRegOperand(this, 0, &tmp_s0);
  auto Vs2 = tmp_s0.as<uint64_t>();
  VlenbBits = __parserAutoPCState.vlenb();
  uint32_t vlen = VlenbBits * 8;
  for (size_t i = 0; i < (vlen / 64); i++) {
    Vd[i] = Vs2[i];
  }

  if (traceData) {
    traceData->setData(vecRegClass, &tmp_d0);
  };
  return NoFault;
}

@luismarques
Copy link
Contributor

Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst

Regarding "Option 1 - Change the ABI", have you/we considered the possibility of requiring the ABI to preserve VILL=0 except for syscalls? I.e., effectively mandate that after a syscall you need to do a vsetivli to restore vill to zero (to not change the kernel or, even if it's changed, be compatible with the already released Linux kernel versions 6.5+).

@dzaima
Copy link

dzaima commented Nov 7, 2024

requiring the ABI to preserve VILL=0 except for syscalls

That would mean that existing software/libraries doing direct syscalls (maybe incl. glibc?) would cease to follow the ABI; not much worse than existing kernels doing so I'd think.

And handling that in userspace would be very messy, requiring conditionally running the vsetivli after a syscall if RVV is present (which leads to a mild mess, considering that it forces the first syscall to be a test for RVV support, which'd then have to immediately initialize the branch condition for running the vsetivli for itself and future syscalls).

palmer-dabbelt added a commit to palmer-dabbelt/riscv-elf-psabi-doc that referenced this issue Nov 21, 2024
Implementations are all over the place on whole register moves trapping
under VILL, so let's just forbid that case in the psABI.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

We've started seeing a bunch of fallout from the "whole register moves
depend on TYPE" ISA change, and there's discussion all over the place:

* There's a GCC bug
  <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544>
* Also an LLVM bug <llvm/llvm-project#114518>
* QEMU changed behavior in 4eff52cd46 ("target/riscv: Add vill check for
  whole vector register move instructions")
* Philip has a writeup on some of the options in his notes
  <https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst>.
* This has also come up in most of the meetings I'v been in this week.

It seems like there's no general consensus on what we de here -- some
discussions say we're going to change the psABI (and presumably then the
uABI), some say we're not.  I don't personally care a ton if we make the
ABI change or not, we just need to decide so we can figure out where the
bugs are -- there's going to be fallout either way, but we can't really
get things fixed until we decide one way or the other.  As far as I can
tell both paths are valid:

* If we make these ABI changes then most code that predates the ISA
  change continues to function correctly after the ISA change.  We just
  need to track down anything that sets VILL and fix it, but we should
  be able to do that incrementeally (maybe even just with a trap
  handler).  Right now I think that's just the kernel, but I'm not 100%
  sure there.  Looks like the first round of HW doesn't trap, though, so
  we should be safe for a bit.
* If we don't make these ABI changes then we'll have to fix the
  compilers and go rebuild everything to match the ISA change.  I think
  the GCC change should be pretty straight-forward, I don't know about
  the LLVM side of things.  I'm not sure what we'd do with the kernel
  here: we could say the VILL traps are just latent userspace bugs, or
  we could say we're breaking userspace -- kind of a grey area, so
  probably more of an LKML question.

I don't think one option is clearly simpler than the other, it's just a
question of where we push the bugs.
palmer-dabbelt added a commit to palmer-dabbelt/riscv-elf-psabi-doc that referenced this issue Nov 21, 2024
Implementations are all over the place on whole register moves trapping
under VILL, so let's just forbid that case in the psABI.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

We've started seeing a bunch of fallout from the "whole register moves
depend on TYPE" ISA change, and there's discussion all over the place:

* There's a GCC bug
  <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544>
* Also an LLVM bug <llvm/llvm-project#114518>
* QEMU changed behavior in 4eff52cd46 ("target/riscv: Add vill check for
  whole vector register move instructions")
* Philip has a writeup on some of the options in his notes
  <https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst>.
* This has also come up in most of the meetings I'v been in this week.

It seems like there's no general consensus on what we do here -- some
discussions say we're going to change the psABI (and presumably then the
uABI), some say we're not.  I don't personally care a ton if we make the
ABI change or not, we just need to decide so we can figure out where the
bugs are -- there's going to be fallout either way, but we can't really
get things fixed until we decide one way or the other.  As far as I can
tell both paths are valid:

* If we make these ABI changes then most code that predates the ISA
  change continues to function correctly after the ISA change.  We just
  need to track down anything that sets VILL and fix it, but we should
  be able to do that incrementally (maybe even just with a trap
  handler).  Right now I think that's just the kernel, but I'm not 100%
  sure there.  Looks like the first round of HW doesn't trap, though, so
  we should be safe for a bit.
* If we don't make these ABI changes then we'll have to fix the
  compilers and go rebuild everything to match the ISA change.  I think
  the GCC change should be pretty straight-forward, I don't know about
  the LLVM side of things.  I'm not sure what we'd do with the kernel
  here: we could say the VILL traps are just latent userspace bugs, or
  we could say we're breaking userspace -- kind of a grey area, so
  probably more of an LKML question.

I don't think one option is clearly simpler than the other, it's just a
question of where we push the bugs.
palmer-dabbelt added a commit to palmer-dabbelt/riscv-elf-psabi-doc that referenced this issue Nov 21, 2024
Implementations are all over the place on whole register moves trapping
under VILL, so let's just forbid that case in the psABI.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

We've started seeing a bunch of fallout from the "whole register moves
depend on TYPE" ISA change, and there's discussion all over the place:

* There's a GCC bug
  <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544>
* Also an LLVM bug <llvm/llvm-project#114518>
* QEMU changed behavior in 4eff52cd46 ("target/riscv: Add vill check for
  whole vector register move instructions")
* Philip has a writeup on some of the options in his notes
  <https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst>.
* This has also come up in most of the meetings I'v been in this week.

It seems like there's no general consensus on what we do here -- some
discussions say we're going to change the psABI (and presumably then the
uABI), some say we're not.  I don't personally care a ton if we make the
ABI change or not, we just need to decide so we can figure out where the
bugs are -- there's going to be fallout either way, but we can't really
get things fixed until we decide one way or the other.  As far as I can
tell both paths are valid:

* If we make these ABI changes then most code that predates the ISA
  change continues to function correctly after the ISA change.  We just
  need to track down anything that sets VILL and fix it, but we should
  be able to do that incrementally (maybe even just with a trap
  handler).  Right now I think that's just the kernel, but I'm not 100%
  sure there.  Looks like the first round of HW doesn't trap, though, so
  we should be safe for a bit.
* If we don't make these ABI changes then we'll have to fix the
  compilers and go rebuild everything to match the ISA change.  I think
  the GCC change should be pretty straight-forward, I don't know about
  the LLVM side of things.  I'm not sure what we'd do with the kernel
  here: we could say the VILL traps are just latent userspace bugs, or
  we could say we're breaking userspace -- kind of a grey area, so
  probably more of an LKML question.

I don't think one option is clearly simpler than the other, it's just a
question of where we push the bugs.
lukel97 added a commit to lukel97/llvm-project that referenced this issue Dec 2, 2024
This is an alternative to llvm#117866 that works by demanding a valid vtype instead using a separate pass.

The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else.

This is one part of a fix for llvm#114518. We still need to check if there's other cases where vector copies/whole register moves that are inserted after vsetvli insertion.
lukel97 added a commit to lukel97/llvm-project that referenced this issue Dec 2, 2024
This is an alternative to llvm#117866 that works by demanding a valid vtype instead using a separate pass.

The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else.

This is one part of a fix for llvm#114518. We still need to check if there's other cases where vector copies/whole register moves that are inserted after vsetvli insertion.
lukel97 added a commit to lukel97/llvm-project that referenced this issue Dec 3, 2024
This is an alternative to llvm#117866 that works by demanding a valid vtype instead using a separate pass.

The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else.

This is one part of a fix for llvm#114518. We still need to check if there's other cases where vector copies/whole register moves that are inserted after vsetvli insertion.
lukel97 added a commit to lukel97/llvm-project that referenced this issue Dec 4, 2024
This is an alternative to llvm#117866 that works by demanding a valid vtype instead using a separate pass.

The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else.

This is one part of a fix for llvm#114518. We still need to check if there's other cases where vector copies/whole register moves that are inserted after vsetvli insertion.
lukel97 added a commit to lukel97/llvm-project that referenced this issue Dec 5, 2024
This is an alternative to llvm#117866 that works by demanding a valid vtype instead using a separate pass.

The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else.

This is one part of a fix for llvm#114518. We still need to check if there's other cases where vector copies/whole register moves that are inserted after vsetvli insertion.
lukel97 added a commit that referenced this issue Dec 5, 2024
…on (#118283)

This is an alternative to #117866 that works by demanding a valid vtype
instead of using a separate pass.

The main advantage of this is that it allows coalesceVSETVLIs to just
reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary
valid state in transferBefore when we encounter a vector copy. Then we
add a new vill demanded field that will happily accept any other known
vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges,
otherwise the pass will completely skip over functions that only have
vector copies and nothing else.

This is one part of a fix for #114518. We still need to check if there's
other cases where vector copies/whole register moves that are inserted
after vsetvli insertion.
TIFitis pushed a commit to TIFitis/llvm-project that referenced this issue Dec 18, 2024
…on (llvm#118283)

This is an alternative to llvm#117866 that works by demanding a valid vtype
instead of using a separate pass.

The main advantage of this is that it allows coalesceVSETVLIs to just
reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary
valid state in transferBefore when we encounter a vector copy. Then we
add a new vill demanded field that will happily accept any other known
vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges,
otherwise the pass will completely skip over functions that only have
vector copies and nothing else.

This is one part of a fix for llvm#114518. We still need to check if there's
other cases where vector copies/whole register moves that are inserted
after vsetvli insertion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants