
Accelerating Vector<T> with SVE on ARM64 #120599

@snickolls-arm


We want to accelerate Vector<T> using the Scalable Vector Extension (SVE) on ARM64, which provides instructions that operate on vectors of arbitrary size. Different hardware implementations can support different vector register lengths (in powers of 2 from 128 bits, up to 2048 bits), but the same instruction set is used to operate on them. Software can determine the vector register size by executing the rdvl instruction.
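To make the length rules above concrete, here is a minimal sketch in portable C++. Both function names are hypothetical, and the vector length is passed in as a parameter because reading it from hardware (via rdvl) only works on an SVE-capable machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `bits` is a vector register length as described
// above (a power of two between 128 and 2048 bits, inclusive).
constexpr bool isSupportedVectorBits(std::uint32_t bits)
{
    return bits >= 128 && bits <= 2048 && (bits & (bits - 1)) == 0;
}

// Hypothetical helper: number of elements of type T in a vector of
// `vectorBytes` bytes. On real hardware `vectorBytes` would come from rdvl;
// here it is an ordinary argument so the sketch runs anywhere.
template <typename T>
std::size_t elementCount(std::size_t vectorBytes)
{
    assert(vectorBytes % sizeof(T) == 0);
    return vectorBytes / sizeof(T);
}
```

For example, a 256-bit implementation (32 bytes) would hold 8 float elements, while the same code on a 128-bit implementation would hold 4.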

The current implementation of Vector<T> on ARM64 uses NEON and is fixed at 128 bits on all hardware. However, intrinsics are already available for the type that let you execute SVE instructions directly.

The key consequence of this feature is that the size of Vector<T> becomes a constant known only at run time, so the compiler cannot evaluate it statically. Most of the existing codebase assumes that every type has a known, statically determinable size (typically obtained via genTypeSize). We need to introduce a new type classification for Vector<T> for which this assumption does not hold, and the compiler must handle that classification differently.

There are three scenarios we need to consider:

  1. JIT compilation
  2. Ahead-of-time (AOT) compilation with a specified target vector length
  3. Ahead-of-time (AOT) compilation without a known target vector length

Scenarios 1 and 2 are roughly equivalent in complexity. Either the compiler learns the size of Vector<T> by executing rdvl (Scenario 1: JIT), or the user provides a desired size via a compiler option (Scenario 2: AOT). Both cases can be managed by the EE and communicated to RyuJIT through the JIT–EE contract.

I’ve been exploring a solution for these two scenarios by introducing a new type, TYP_SIMDSV, and refactoring areas that process SIMD nodes to query the type size dynamically from the compiler, rather than using genTypeSize.
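A rough sketch of the idea, not the actual RyuJIT code: everything here except the name TYP_SIMDSV is hypothetical, including the reduced type enum, the static table standing in for genTypeSize, and the CompilerSketch class that holds the vector length supplied by the EE.

```cpp
#include <cassert>

// Hypothetical, heavily reduced type enum. Only TYP_SIMDSV is named in this issue.
enum VarType { TYP_INT, TYP_LONG, TYP_SIMD16, TYP_SIMDSV, TYP_COUNT };

// Static size table in the spirit of genTypeSize. 0 marks "not statically known".
static const unsigned char kStaticTypeSize[TYP_COUNT] = {4, 8, 16, 0};

// Hypothetical stand-in for the compiler instance. It stores the vector length
// reported by the EE (from rdvl for JIT, or from a compiler option for AOT).
class CompilerSketch
{
public:
    explicit CompilerSketch(unsigned vectorLengthBytes) : m_vectorLengthBytes(vectorLengthBytes)
    {
        assert(vectorLengthBytes >= 16 && vectorLengthBytes <= 256);
    }

    // Size query that SIMD-processing code would call instead of reading the
    // static table directly.
    unsigned typeSize(VarType type) const
    {
        return (type == TYP_SIMDSV) ? m_vectorLengthBytes : kStaticTypeSize[type];
    }

private:
    unsigned m_vectorLengthBytes;
};
```

The point of routing the query through the compiler instance is that call sites stop depending on a global table, so the same code path serves both the JIT case and the AOT case with a specified length.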

It is not possible to AOT-compile with a specified vector length and then execute the resulting binary on hardware with a different vector length. Assumptions about the size of Vector<T> would no longer hold, and memory corruption is likely. This has implications for how users interact with NativeAOT and R2R. The user must ensure that the process’s vector length matches the value used at compile time.
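One way to picture the constraint is a startup guard that compares the vector length baked into the image against the one the process actually has. This is only an illustration under assumptions: the issue does not say such a check exists or where it would live, and the names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class VectorLengthCheck { Ok, Mismatch };

// Hypothetical startup check for an AOT image compiled for a fixed vector length.
// `compiledBytes` is assumed to be recorded in the image at compile time;
// `runtimeBytes` is assumed to be read from the hardware (for example via rdvl).
inline VectorLengthCheck checkVectorLength(std::uint32_t compiledBytes, std::uint32_t runtimeBytes)
{
    assert(compiledBytes != 0);
    return (compiledBytes == runtimeBytes) ? VectorLengthCheck::Ok : VectorLengthCheck::Mismatch;
}
```

A caller would refuse to run the precompiled code on Mismatch rather than continue with wrong size assumptions.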

Scenario 3 is more complex, as it requires adding compilation paths that cannot assume any specific vector size. The compiler must generate code sequences that query the vector register size at runtime. Any code generation that depends on vector size, such as stack frame layout or context serialization, must be implemented in a way that is completely independent of the actual value.

Scenarios 1 and 2 can be viewed as optimizations of Scenario 3. When the vector length is known at compile time, the compiler can omit the rdvl instructions and substitute a compile-time constant instead. However, because the JIT currently relies heavily on knowing the size of all internal types, it may be pragmatic to tackle Scenarios 1 and 2 first, and then move toward fully vector-agnostic compilation later.
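The relationship between the scenarios can be sketched with a size that is kept symbolic in the vector length. The ScalableSize type and resolve function are hypothetical and only illustrate the idea of a frame size with a fixed part plus a multiple of the vector length (VL).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation of a size that is only partly known at compile
// time: total bytes = fixedBytes + vectorCount * VL (VL in bytes).
struct ScalableSize
{
    std::uint32_t fixedBytes;
    std::uint32_t vectorCount;
};

// Sizes compose without knowing VL, e.g. when laying out a stack frame.
inline ScalableSize operator+(ScalableSize a, ScalableSize b)
{
    return {a.fixedBytes + b.fixedBytes, a.vectorCount + b.vectorCount};
}

// Scenario 3: the generated code evaluates this at run time with VL read from
// the hardware. Scenarios 1 and 2: the compiler evaluates it with the known VL
// and emits the result as a constant.
inline std::uint32_t resolve(ScalableSize size, std::uint32_t vectorLengthBytes)
{
    assert(vectorLengthBytes >= 16 && vectorLengthBytes % 16 == 0);
    return size.fixedBytes + size.vectorCount * vectorLengthBytes;
}
```

A frame with 32 bytes of fixed-size locals and two Vector<T> locals would be ScalableSize{32, 2}: 64 bytes when VL is 16 bytes and 160 bytes when VL is 64 bytes, with the layout logic unchanged.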

Problems to Solve
