Accelerating Vector<T> with SVE on ARM64 #120599

We want to accelerate `Vector<T>` using the Scalable Vector Extension (SVE) on ARM64, which provides instructions that operate on vectors of arbitrary size. Different hardware implementations can support different vector register lengths (in powers of 2 from 128 bits, up to 2048 bits), but the same instruction set is used to operate on them. Software can determine the vector register size by executing the `rdvl` instruction.
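
As a rough illustration of the vector length query, here is a minimal C++ sketch. It assumes a toolchain that provides the ACLE header `<arm_sve.h>`, where `svcntb()` returns the vector length in bytes (the value `rdvl` produces with a multiplier of 1). The helper names `IsPlausibleSveLengthBits` and `QueryVectorLengthBits` are hypothetical and not part of the runtime.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper: true if `bits` is a vector length as described above
// (a power of two between 128 and 2048 bits, inclusive).
constexpr bool IsPlausibleSveLengthBits(uint32_t bits)
{
    return bits >= 128 && bits <= 2048 && (bits & (bits - 1)) == 0;
}

// Hypothetical helper: vector length in bits, or 0 when this translation
// unit was built without SVE support.
uint32_t QueryVectorLengthBits()
{
#if defined(__ARM_FEATURE_SVE)
    // svcntb() yields the number of bytes in an SVE vector register.
    uint32_t bits = static_cast<uint32_t>(svcntb()) * 8;
    assert(IsPlausibleSveLengthBits(bits));
    return bits;
#else
    return 0;
#endif
}
```

On a build without SVE the query returns 0, which a caller could treat as "use the 128-bit NEON path".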

The current implementation of `Vector<T>` uses NEON and is fixed to 128 bits across all hardware. However, intrinsics that operate on `Vector<T>` are already available and allow you to execute SVE instructions directly.

The key consequence of this feature is that the size of `Vector<T>` becomes a runtime constant, and therefore cannot be statically evaluated by the compiler. Most of the existing codebase assumes that all types have a known, statically determinable size (typically obtained via `genTypeSize`). We need to introduce a new type classification for `Vector<T>` for which this assumption may not hold, requiring the compiler to handle it differently.

There are three scenarios we need to consider:

1. JIT compilation
2. Ahead-of-time (AOT) compilation with a specified target vector length
3. Ahead-of-time (AOT) compilation without a known target vector length

Scenarios 1 and 2 are roughly equivalent in complexity. Either the compiler learns the size of `Vector<T>` by executing `rdvl` (Scenario 1: JIT), or the user provides a desired size via a compiler option (Scenario 2: AOT). Both cases can be managed by the EE and communicated to RyuJIT through the JIT–EE contract.

I’ve been exploring a solution for these two scenarios by introducing a new type, `TYP_SIMDSV`, and refactoring areas that process SIMD nodes to query the type size dynamically from the compiler, rather than using `genTypeSize`.
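
To make the idea concrete, here is a minimal sketch of what "query the type size from the compiler" could look like. Everything in it is a simplified stand-in: the `var_types` enum is cut down to a few entries, and `CompilerSketch`, `kStaticTypeSize`, and `getTypeSize` are hypothetical names, not the actual RyuJIT code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the JIT's type enum; only a few entries are shown.
enum var_types : uint8_t { TYP_INT, TYP_LONG, TYP_SIMD16, TYP_SIMDSV, TYP_COUNT };

// Static size table in the spirit of genTypeSize; 0 means "not statically known".
constexpr uint32_t kStaticTypeSize[TYP_COUNT] = {4, 8, 16, 0};

// Hypothetical compiler instance that carries the vector length for this
// compilation, as supplied by the EE (from rdvl or from an AOT option).
struct CompilerSketch
{
    uint32_t vectorLengthBytes;

    uint32_t getTypeSize(var_types type) const
    {
        assert(type < TYP_COUNT);
        if (type == TYP_SIMDSV)
        {
            // The scalable type has no table entry; its size comes from
            // per-compilation state instead.
            assert(vectorLengthBytes >= 16);
            return vectorLengthBytes;
        }
        return kStaticTypeSize[type];
    }
};
```

The point of the design is that call sites stop indexing a global table and instead go through a compiler instance, so the same code path serves both the JIT and AOT-with-known-length cases.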

It is not possible to AOT-compile with a specified vector length and then execute the resulting binary on hardware with a different vector length. Assumptions about the size of `Vector<T>` would no longer hold, and memory corruption is likely. This has implications for how users interact with NativeAOT and R2R. The user must ensure that the process’s vector length matches the value used at compile time.
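
A check along these lines could enforce that requirement at startup. This is only a sketch under assumptions: `ImageVectorInfo` and `IsImageUsable` are hypothetical names, the issue does not say where the compiled vector length would be recorded, and what should happen on a mismatch is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the vector length an AOT image was compiled for.
struct ImageVectorInfo
{
    uint32_t compiledVectorLengthBytes;
};

// Returns true only when the process's vector length matches the value the
// image was compiled with; any other combination is unsafe to run.
bool IsImageUsable(const ImageVectorInfo& image, uint32_t processVectorLengthBytes)
{
    // Vector lengths are whole multiples of 128 bits (16 bytes).
    assert(processVectorLengthBytes >= 16 && processVectorLengthBytes % 16 == 0);
    return image.compiledVectorLengthBytes == processVectorLengthBytes;
}
```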

Scenario 3 is more complex, as it requires adding compilation paths that cannot assume any specific vector size. The compiler must generate code sequences that query the vector register size at runtime. Any code generation that depends on vector size, such as stack frame layout or context serialization, must be implemented in a way that is completely independent of the actual value.

Scenarios 1 and 2 can be viewed as optimizations of Scenario 3. When the vector length is known at compile time, the compiler can omit the `rdvl` instructions and substitute a compile-time constant instead. However, because the JIT currently relies heavily on knowing the size of all internal types, it may be pragmatic to tackle Scenarios 1 and 2 first, and then move toward fully vector-agnostic compilation later.

### Problems to Solve

- [ ] Introduce `TYP_SIMDSV` to the JIT type system in a manner compatible with NEON
  - https://github.com/dotnet/runtime/pull/121114 (initial research)
  - https://github.com/dotnet/runtime/pull/121489
  - https://github.com/dotnet/runtime/pull/121548

- [ ] Implement stack frame allocation for `TYP_SIMDSV` and `TYP_MASK`
  - https://github.com/dotnet/runtime/pull/122638

- [ ] Implement value classes containing `Vector<T>`
- [ ] Update ABI and LSRA for parameter passing and non-volatile registers ([AAPCS link](https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#613scalable-vector-registers))
- [ ] Port `Vector<T>` implementations to SVE with a NEON fallback when the feature is not available
- [ ] Update suspension `CONTEXT` records
- [ ] Update exception handling and unwinding
- [ ] Update testing to scale with vector length and test across all available configurations (NEON only, 128-bit SVE, 256-bit SVE).

- [ ] Support addressing modes for compiler inserted loads/stores
- [ ] Update the debug adapter
- [ ] Update GCInfo to allow reporting offsets in terms of `N * VL + Imm`
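
For the last item, an offset of the form `N * VL + Imm` stays symbolic until the vector length is known. The sketch below shows one possible representation; `ScalableOffset` and `Resolve` are hypothetical names and this is not the GCInfo encoding itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding of an offset of the form N * VL + Imm, where VL is
// the vector length in bytes and is only known once the code is running.
struct ScalableOffset
{
    int32_t vlMultiple; // N: number of whole vector registers
    int32_t imm;        // Imm: fixed byte displacement

    // Turn the symbolic offset into a concrete byte offset for a given VL.
    int64_t Resolve(uint32_t vectorLengthBytes) const
    {
        assert(vectorLengthBytes >= 16 && vectorLengthBytes % 16 == 0);
        return static_cast<int64_t>(vlMultiple) * vectorLengthBytes + imm;
    }
};
```

For example, `ScalableOffset{2, 8}` resolves to 40 bytes when the vector length is 16 bytes and to 72 bytes when it is 32 bytes, so the same encoded record is valid on either kind of hardware.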

