Description
We want to accelerate Vector<T> using the Scalable Vector Extension (SVE) on ARM64. SVE provides instructions that operate on vectors whose length is implementation-defined: different hardware implementations can support different vector register lengths (powers of 2 from 128 bits up to 2048 bits), but the same instruction set operates on all of them. Software can determine the vector register size at runtime by executing the rdvl instruction.
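As a rough illustration of the constraint above, the following C++ sketch checks whether a length is a legal SVE vector length and models what `rdvl` returns (its immediate multiplied by the vector length in bytes). The function names are hypothetical and not part of the runtime or the JIT. They only show why the same instruction yields different values on different hardware.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the runtime): checks the constraint stated
// above, i.e. the SVE vector length is a power of two in [128, 2048] bits.
constexpr bool IsValidSveVectorLength(uint32_t bits) {
    return bits >= 128 && bits <= 2048 && (bits & (bits - 1)) == 0;
}

// Hypothetical model of `rdvl xd, #imm`: the instruction writes imm multiplied
// by the current vector length in bytes (imm is a signed 6-bit value, -32..31).
constexpr int64_t ModelRdvl(uint32_t vlBits, int32_t imm) {
    return static_cast<int64_t>(imm) * static_cast<int64_t>(vlBits / 8);
}

// Usage: the same `rdvl x0, #1` yields different values on different hardware.
static_assert(IsValidSveVectorLength(256), "256-bit SVE is legal");
static_assert(!IsValidSveVectorLength(384), "not a power of two");
static_assert(ModelRdvl(128, 1) == 16, "128-bit implementation");
static_assert(ModelRdvl(512, 1) == 64, "512-bit implementation");
```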
The current implementation of Vector<T> uses NEON and is fixed at 128 bits on all hardware. However, SVE intrinsics that operate on Vector<T> are already available, allowing SVE instructions to be executed directly.
The key consequence of this feature is that the size of Vector<T> becomes a runtime constant and therefore cannot be statically evaluated by the compiler. Most of the existing codebase assumes that all types have a known, statically determinable size (typically obtained via genTypeSize). We need to introduce a new type classification for Vector<T> whose size is not statically known, and the compiler must handle it differently.
There are three scenarios we need to consider:
- Scenario 1: JIT compilation
- Scenario 2: Ahead-of-time (AOT) compilation with a specified target vector length
- Scenario 3: AOT compilation without a known target vector length
Scenarios 1 and 2 are roughly equivalent in complexity: in both, the size of Vector<T> is known at compile time. Either the compiler learns the size by executing rdvl (Scenario 1: JIT), or the user provides it via a compiler option (Scenario 2: AOT). Both cases can be handled by the EE and communicated to RyuJIT through the JIT–EE contract.
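A minimal sketch, assuming the EE exposes the vector length to the JIT as an optional byte count: the three scenarios then reduce to whether a value is present. `CompilationMode` and `ResolveVectorLengthBytes` are hypothetical names, not existing JIT–EE interface members.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical enumeration of the three scenarios listed above.
enum class CompilationMode { Jit, AotWithVectorLength, AotVectorAgnostic };

// Hypothetical sketch of how the EE might hand the vector length to the JIT.
// hardwareVlBytes: what rdvl reports in the current process (JIT case).
// optionVlBytes: a value supplied through a compiler option (AOT case).
// An empty result means the JIT must generate vector-length-agnostic code.
constexpr std::optional<uint32_t> ResolveVectorLengthBytes(
    CompilationMode mode, uint32_t hardwareVlBytes,
    std::optional<uint32_t> optionVlBytes) {
    switch (mode) {
        case CompilationMode::Jit:
            return hardwareVlBytes;  // Scenario 1: measured in-process
        case CompilationMode::AotWithVectorLength:
            return optionVlBytes;    // Scenario 2: user-specified
        case CompilationMode::AotVectorAgnostic:
        default:
            return std::nullopt;     // Scenario 3: unknown until execution
    }
}

// Usage: only Scenario 3 leaves the size of Vector<T> unknown.
static_assert(*ResolveVectorLengthBytes(CompilationMode::Jit, 32, std::nullopt) == 32, "JIT");
static_assert(!ResolveVectorLengthBytes(CompilationMode::AotVectorAgnostic, 32, 64).has_value(), "agnostic");
```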
I’ve been exploring a solution for these two scenarios by introducing a new type, TYP_SIMDSV, and refactoring areas that process SIMD nodes to query the type size dynamically from the compiler, rather than using genTypeSize.
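To make the refactoring concrete, here is a hypothetical sketch of routing size queries through the compiler instance. The enum, `StaticTypeSize` (standing in for genTypeSize) and `CompilerSketch::getTypeSize` are illustrative only and deliberately much simpler than RyuJIT's actual types; only the name TYP_SIMDSV comes from the text above.

```cpp
#include <cstdint>
#include <optional>

// Simplified, hypothetical stand-ins for JIT types.
enum VarType : uint8_t { TYP_INT, TYP_LONG, TYP_SIMD16, TYP_SIMDSV };

// Static sizes in the spirit of genTypeSize; 0 marks "not statically known".
constexpr uint32_t StaticTypeSize(VarType type) {
    switch (type) {
        case TYP_INT:    return 4;
        case TYP_LONG:   return 8;
        case TYP_SIMD16: return 16;
        default:         return 0; // TYP_SIMDSV has no compile-time size
    }
}

// Hypothetical per-method compiler state holding the vector length obtained
// from the EE (empty when compiling vector-length-agnostic code).
class CompilerSketch {
public:
    constexpr explicit CompilerSketch(std::optional<uint32_t> vlBytes) : vlBytes_(vlBytes) {}

    // Size queries go through the compiler instance instead of a global table,
    // so TYP_SIMDSV can resolve to the vector length of the target.
    constexpr std::optional<uint32_t> getTypeSize(VarType type) const {
        if (type == TYP_SIMDSV) {
            return vlBytes_;
        }
        return StaticTypeSize(type);
    }

private:
    std::optional<uint32_t> vlBytes_;
};

// Usage: the same query gives a size for 256-bit SVE and none when agnostic.
static_assert(*CompilerSketch(32).getTypeSize(TYP_SIMDSV) == 32, "256-bit SVE");
static_assert(!CompilerSketch(std::nullopt).getTypeSize(TYP_SIMDSV).has_value(), "agnostic");
```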
Code that is AOT-compiled for a specified vector length cannot safely run on hardware with a different vector length: its assumptions about the size of Vector<T> would no longer hold, and memory corruption is likely. This has implications for how users interact with NativeAOT and R2R. The user must ensure that the process’s vector length matches the value used at compile time.
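One way to surface the mismatch described above is a startup check that compares the vector length recorded in the image with the process's vector length. This is a hypothetical sketch; the names, and the convention that 0 marks vector-length-agnostic (Scenario 3) code, are assumptions rather than an existing R2R or NativeAOT mechanism.

```cpp
#include <cstdint>

// Hypothetical outcome of validating a precompiled image at startup.
enum class ImageCheck { Ok, VectorLengthMismatch };

// compiledVlBytes: vector length recorded when the image was compiled;
//   0 is a hypothetical marker for vector-length-agnostic code (Scenario 3).
// processVlBytes: vector length of the running process (e.g. read with rdvl).
constexpr ImageCheck CheckImageVectorLength(uint32_t compiledVlBytes, uint32_t processVlBytes) {
    if (compiledVlBytes == 0) {
        return ImageCheck::Ok; // no size assumptions were baked in
    }
    // Any difference invalidates Vector<T> sizes and frame offsets in the image.
    return compiledVlBytes == processVlBytes ? ImageCheck::Ok
                                             : ImageCheck::VectorLengthMismatch;
}

static_assert(CheckImageVectorLength(32, 32) == ImageCheck::Ok, "256-bit image, 256-bit process");
static_assert(CheckImageVectorLength(32, 16) == ImageCheck::VectorLengthMismatch, "256-bit image, 128-bit process");
```

How a runtime should react to a mismatch (for example failing fast, or declining to use the precompiled code) is a separate policy decision that this sketch does not model.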
Scenario 3 is more complex, as it requires adding compilation paths that cannot assume any specific vector size. The compiler must generate code sequences that query the vector register size at runtime. Any code generation that depends on vector size, such as stack frame layout or context serialization, must be expressed independently of the actual value, for example as offsets computed from the vector length at runtime.
Scenarios 1 and 2 can be viewed as optimizations of Scenario 3. When the vector length is known at compile time, the compiler can omit rdvl instructions and substitute a compile-time constant instead. However, because the JIT currently relies heavily on knowing the size of all internal types, it may be pragmatic to tackle Scenarios 1 and 2 first, and then move toward fully vector-length-agnostic compilation later.
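The relationship between the scenarios can be sketched for a single frame offset of the form N * VL + Imm: with a known vector length the offset folds to a constant, otherwise it must be computed at runtime from rdvl. `ScalableOffset`, `OffsetCode` and `MaterializeOffset` are hypothetical and only summarize the code that would be emitted rather than generating real instructions.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical representation of an offset N * VL + Imm (VL in bytes).
struct ScalableOffset {
    int32_t vlMultiple; // N
    int32_t bytes;      // Imm
};

// Hypothetical summary of the code needed to put the offset in a register.
struct OffsetCode {
    bool usesRdvl;         // true: rdvl + add/sub at runtime (Scenario 3)
    int64_t constantValue; // meaningful only when usesRdvl is false (Scenarios 1 and 2)
};

constexpr OffsetCode MaterializeOffset(ScalableOffset off, std::optional<uint32_t> vlBytes) {
    if (vlBytes.has_value()) {
        // Vector length known at compile time: fold to a plain constant.
        return {false, static_cast<int64_t>(off.vlMultiple) * *vlBytes + off.bytes};
    }
    // Unknown: `rdvl xT, #N` (N must fit the -32..31 immediate; larger
    // multiples would need extra instructions, not modeled here), then add/sub Imm.
    return {true, 0};
}

// Usage: 2 * VL + 16 is the constant 80 on 256-bit SVE, but needs rdvl when VL is unknown.
static_assert(MaterializeOffset({2, 16}, 32u).constantValue == 80, "folded");
static_assert(MaterializeOffset({2, 16}, std::nullopt).usesRdvl, "runtime computation");
```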
Problems to Solve
- Introduce TYP_SIMD to the JIT type system in a manner compatible with NEON
- Add scalable vector type to JIT and HFA type for Vector<T> #121114 (initial research)
- Implement stack frame allocation for TYP_SIMD and TYP_MASK
- Implement value classes containing Vector<T>
- Update ABI and LSRA for parameter passing and non-volatile registers (AAPCS link)
- Port Vector<T> implementations to SVE with a NEON fallback when the feature is not available
- Update suspension CONTEXT records
- Update exception handling and unwinding
- Update testing to scale with vector length and test across all available configurations (NEON only, 128-bit SVE, 256-bit SVE)
- Support addressing modes for compiler-inserted loads/stores
- Update the debug adapter
- Update GCInfo to allow reporting offsets in terms of N * VL + Imm
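For the testing item, here is a hypothetical sketch of deriving expected values from the configuration instead of hard-coding them. It assumes, as proposed above, that Vector<T> spans the full SVE register when SVE is used and stays 128 bits on NEON; `TestConfig` and `ExpectedVectorCount` are illustrative names.

```cpp
#include <cstdint>

// Hypothetical description of one test configuration from the list above.
struct TestConfig {
    const char* name;
    bool sve;        // false: NEON only, Vector<T> stays 128 bits
    uint32_t vlBits; // SVE vector length; ignored when sve is false
};

constexpr TestConfig kConfigs[] = {
    {"NEON only", false, 0},
    {"SVE 128-bit", true, 128},
    {"SVE 256-bit", true, 256},
};

// Expected Vector<T>.Count for elements of elemBytes bytes under one configuration.
constexpr uint32_t ExpectedVectorCount(const TestConfig& cfg, uint32_t elemBytes) {
    const uint32_t vectorBytes = cfg.sve ? cfg.vlBits / 8 : 16;
    return vectorBytes / elemBytes;
}

// A test that hard-codes 4 elements for Vector<int> would pass on the first
// two configurations and fail on the third.
static_assert(ExpectedVectorCount(kConfigs[0], 4) == 4, "NEON: 128 bits of int");
static_assert(ExpectedVectorCount(kConfigs[1], 4) == 4, "SVE 128-bit");
static_assert(ExpectedVectorCount(kConfigs[2], 4) == 8, "SVE 256-bit");
```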