
Pass vectors via SIMD registers on Unix x64 #124267

Open
reedz wants to merge 4 commits into dotnet:main from reedz:feature/simd-vector-registers


reedz (Contributor) commented Feb 11, 2026

Fixes #5040.

The JIT and VM on Unix AMD64 (System V ABI) are updated to pass Vector64, Vector128, Vector256, and Vector512 arguments and return values in a single SIMD register each (XMM/YMM/ZMM), instead of splitting them across multiple eightbyte slots or spilling them to the stack.
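As a rough mental model of this change, the sketch below maps a vector's size to the register it would occupy. It is a hypothetical C# helper (`VectorRegisterSketch.RegisterFor` is not a runtime API), not the JIT's classifier in targetamd64.cpp. The `avx`/`avx512` flags stand in for the runtime's feature detection, and the "fallback" result (the pre-existing eightbyte or stack handling when the ISA is unavailable) is an assumption of this sketch.

```csharp
using System;

// Hypothetical illustration of the size -> register mapping described in the PR.
// This is NOT the JIT's SysVX64Classifier. "fallback" stands for the older
// eightbyte/stack handling, which is an assumption of this sketch.
static class VectorRegisterSketch
{
    public static string RegisterFor(int sizeInBytes, bool avx, bool avx512)
    {
        switch (sizeInBytes)
        {
            case 8:  // Vector64<T>: occupies the low part of an XMM register
            case 16: // Vector128<T>
                return "XMM";
            case 32: // Vector256<T> needs AVX
                return avx ? "YMM" : "fallback";
            case 64: // Vector512<T> needs AVX-512
                return avx512 ? "ZMM" : "fallback";
            default:
                return "fallback";
        }
    }

    static void Main()
    {
        foreach (int size in new[] { 8, 16, 32, 64 })
        {
            Console.WriteLine($"{size,2} bytes, AVX only -> {RegisterFor(size, avx: true, avx512: false)}");
        }
    }
}
```

For example, on a machine with AVX but without AVX-512, 8- and 16-byte vectors map to XMM, 32-byte vectors to YMM, and 64-byte vectors to the fallback path.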

Benchmark results seem favorable:

| Benchmark | Main (ns) | Branch (ns) | Speedup |
|---|---:|---:|---:|
| `Vector128<float>` Add (2 args) | 5.75 | 3.28 | 1.75× |
| `Vector128<float>` Chain (4 args) | 14.29 | 6.76 | 2.11× |
| `Vector256<int>` Add (2 args) | 6.47 | 3.57 | 1.81× |
| `Vector512<int>` Add (2 args) | 10.77 | 7.42 | 1.45× |
Copilot AI review requested due to automatic review settings February 11, 2026 12:36
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 11, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 11, 2026
Copilot AI left a comment

Pull request overview

Updates CoreCLR JIT + VM on Unix AMD64 (SysV ABI) to treat opaque SIMD vector structs as single-register values (XMM/YMM/ZMM) for argument passing and returns, and adds targeted JIT tests for the new ABI behavior.

Changes:

  • Teach the SysV x64 ABI classifier/return logic to pass and return Vector64/128/256/512 as single SIMD-register values when supported (and adjust codegen/LSRA accordingly).
  • Update VM and tooling-side SysV struct classification for SIMD intrinsics (to avoid misclassification via internal field layouts) and add runtime plumbing/macros to preserve upper halves of YMM/ZMM regs across relevant stubs.
  • Add new directed JIT tests covering register passing/return scenarios for intrinsic vectors and `System.Numerics.Vector<T>`.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| src/tests/JIT/Stress/ABI/ABIs.cs | Expands ABI stress tailcall candidate types to include Vector512. |
| src/tests/JIT/Directed/VectorABI/VectorRegPassSysV.csproj | New directed test project for SysV SIMD register passing. |
| src/tests/JIT/Directed/VectorABI/VectorRegPassSysV.cs | New SysV x64 test for Vector64/128/256/512 arg/return paths. |
| src/tests/JIT/Directed/VectorABI/VectorNumericsRegPass.csproj | New directed test project for `System.Numerics.Vector<T>` passing. |
| src/tests/JIT/Directed/VectorABI/VectorNumericsRegPass.cs | New test validating `Vector<T>` parameter/return behavior. |
| src/tests/JIT/Directed/VectorABI/VectorMgdMgd256_ro.csproj | New project file for VectorMgdMgd256 test variant. |
| src/tests/JIT/Directed/VectorABI/VectorMgdMgd256.cs | New/updated HVA + Vector256/512 coverage. |
| src/coreclr/vm/vars.hpp | Declares Unix AMD64 runtime flags for AVX/AVX512 support. |
| src/coreclr/vm/vars.cpp | Defines Unix AMD64 runtime flags for AVX/AVX512 support. |
| src/coreclr/vm/codeman.cpp | Sets AVX/AVX512 runtime flags based on feature detection/config. |
| src/coreclr/vm/methodtable.cpp | Forces SIMD intrinsic types to classify as SSE (avoid field-walk misclassification). |
| src/coreclr/vm/amd64/virtualcallstubamd64.S | Extends transition stubs to preserve upper vector register state before helper calls. |
| src/coreclr/vm/amd64/unixasmhelpers.S | Updates tiered compilation stub to preserve upper vector register state. |
| src/coreclr/vm/amd64/theprestubamd64.S | Updates prestub to preserve upper vector register state (incl. Swift retbuf offset). |
| src/coreclr/vm/amd64/externalmethodfixupthunk.S | Updates delay-load thunk to preserve upper vector register state. |
| src/coreclr/vm/amd64/CachedInterfaceDispatchCoreCLR.S | Updates interface dispatch slow paths to preserve upper vector register state. |
| src/coreclr/pal/inc/unixasmmacrosamd64.inc | Adds SAVE/RESTORE_UPPER_VECTOR_REGISTERS macros driven by runtime AVX flags. |
| src/coreclr/tools/Common/JitInterface/SystemVStructClassificator.cs | Mirrors VM SIMD classification changes for tool-side classifier. |
| src/coreclr/jit/targetamd64.cpp | Adds SysV x64 single-register SIMD passing (XMM/YMM/ZMM) in classifier. |
| src/coreclr/jit/compiler.h | Adds helper to identify "single register" opaque SIMD types for ABI decisions. |
| src/coreclr/jit/compiler.cpp | Returns opaque HW SIMD structs as a single SIMD register on Unix AMD64 when supported. |
| src/coreclr/jit/lsrabuild.cpp | Treats SIMD returns as single-float-reg uses on AMD64 in return building. |
| src/coreclr/jit/codegenxarch.cpp | Ensures SIMD returns use float return regs; avoids vzeroupper when returning ≥256-bit vectors. |
| src/coreclr/jit/abi.cpp | Extends float-reg segment typing to include 32/64-byte SIMD segments on xarch. |

Comment on lines 257 to 269
```csharp
Console.WriteLine("=== Vector512 tests ===");
Console.WriteLine($" Avx512F.IsSupported = {Avx512F.IsSupported}");

var v512a = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
var v512b = Vector512.Create(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160);
pass &= Check(AddVec512(v512a, v512b),
    Vector512.Create(11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176),
    "AddVec512");

pass &= Check(ReturnVec512(99), Vector512.Create(99), "ReturnVec512");

pass &= Check(MixedArgsVec512(3, Vector512.Create(10.0f), 7L),
    Vector512.Create(20.0f), "MixedArgsVec512");
```

Copilot AI Feb 11, 2026

The Vector512 test section executes Vector512 operations unconditionally. On most test machines Avx512F.IsSupported will be false, and Vector512 operators/intrinsics will throw PlatformNotSupportedException. Gate these checks behind Avx512F.IsSupported (and ideally also ensure Avx is supported) so the test doesn't fail on non-AVX512 hardware.

Suggested change
```csharp
if (Avx.IsSupported && Avx512F.IsSupported)
{
    Console.WriteLine("=== Vector512 tests ===");
    Console.WriteLine($" Avx512F.IsSupported = {Avx512F.IsSupported}");
    var v512a = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    var v512b = Vector512.Create(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160);
    pass &= Check(AddVec512(v512a, v512b),
        Vector512.Create(11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176),
        "AddVec512");
    pass &= Check(ReturnVec512(99), Vector512.Create(99), "ReturnVec512");
    pass &= Check(MixedArgsVec512(3, Vector512.Create(10.0f), 7L),
        Vector512.Create(20.0f), "MixedArgsVec512");
}
else
{
    Console.WriteLine("=== Vector512 tests skipped: AVX/AVX-512 not supported ===");
}
```

Copilot uses AI. Check for mistakes.
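The same gating idea can be factored into a small helper so that each ISA-dependent section reports itself as skipped rather than failed. This is a sketch under assumptions: `GatedSections.RunIf` is a hypothetical helper, not part of the test in the PR, and the code needs .NET 8 or later for `Vector512`.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class GatedSections
{
    // Hypothetical helper: runs `section` only when `supported` is true and
    // treats a skipped section as passing, so unsupported hardware does not
    // fail the test for the wrong reason.
    public static bool RunIf(bool supported, string name, Func<bool> section)
    {
        if (!supported)
        {
            Console.WriteLine($"=== {name} tests skipped: not supported ===");
            return true;
        }

        Console.WriteLine($"=== {name} tests ===");
        return section();
    }

    static void Main()
    {
        bool pass = true;
        pass &= RunIf(Avx.IsSupported && Avx512F.IsSupported, "Vector512",
            () => Vector512.Create(1) + Vector512.Create(2) == Vector512.Create(3));
        Console.WriteLine(pass ? "PASS" : "FAIL");
    }
}
```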
Comment on lines +10 to +16
```csharp
// Test that Vector128, Vector256, and Vector512 are correctly passed as arguments
// and returned from methods on System V x64 (Linux), verifying the single-register
// SIMD passing path in the JIT's SysVX64Classifier.
//
// Vector128 (16B) -> XMM register
// Vector256 (32B) -> YMM register (requires AVX)
// Vector512 (64B) -> ZMM register (requires AVX-512)
```

Copilot AI Feb 11, 2026

This test is described as validating the System V x64 ABI classifier, but it currently runs (and will trivially pass) on non-Unix-x64 platforms as well, which reduces coverage and can hide regressions. Consider adding a platform guard (e.g., x64 && !Windows) and skipping/returning PASS when not on SysV x64.
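A minimal sketch of the platform guard suggested here, assuming .NET 5 or later for `OperatingSystem.IsWindows()`. `SysVGuard` and `TestEntryPointSketch` are hypothetical names; returning 100 follows the CoreCLR test convention that 100 means success.

```csharp
using System;
using System.Runtime.InteropServices;

static class SysVGuard
{
    // True for x64 processes on non-Windows operating systems, which use the
    // System V AMD64 calling convention (Linux, macOS, FreeBSD).
    public static bool IsSysVX64 =>
        RuntimeInformation.ProcessArchitecture == Architecture.X64 && !OperatingSystem.IsWindows();

    // Hypothetical entry point: reports success (100) without running the
    // SysV-specific checks when the current platform is not SysV x64.
    public static int TestEntryPointSketch()
    {
        if (!IsSysVX64)
        {
            Console.WriteLine("Not running on System V x64; skipping.");
            return 100;
        }

        // The SysV-specific register-passing checks would run here.
        return 100;
    }

    static void Main() =>
        Console.WriteLine($"IsSysVX64 = {IsSysVX64}, result = {TestEntryPointSketch()}");
}
```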

Comment on lines 151 to 179
```csharp
[Fact]
public static int TestEntryPoint()
{
    Console.WriteLine($"Vector256<int>.Count = {Vector256<int>.Count}");
    Console.WriteLine($"Vector512<int>.Count = {Vector512<int>.Count}");

    // ---- Single Vector256 tests ----
    Console.WriteLine("=== Single Vector256 tests ===");

    var v256 = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
    Check("PassSingle256", PassSingle256(v256) == v256);

    Check("Add256", Add256(Vector256.Create(1), Vector256.Create(2)) == Vector256.Create(3));

    Check("PassMany256", PassMany256(
        Vector256.Create(1), Vector256.Create(2), Vector256.Create(3), Vector256.Create(4)) == Vector256.Create(10));

    Check("Mixed256", Mixed256(3, Vector256.Create(10), 7L) == Vector256.Create(20));

    // ---- Single Vector512 tests ----
    Console.WriteLine("=== Single Vector512 tests ===");

    var v512 = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    Check("PassSingle512", PassSingle512(v512) == v512);

    Check("Add512", Add512(Vector512.Create(1), Vector512.Create(2)) == Vector512.Create(3));

    Check("Mixed512", Mixed512(3, Vector512.Create(10), 7L) == Vector512.Create(20));
```


Copilot AI Feb 11, 2026

The test uses Vector256 and Vector512 operations without checking Avx/Avx512 support. On hardware that doesn't support these ISAs, these intrinsics will throw and the test will fail spuriously. Add guards (e.g., ConditionalFact or early-return PASS) around the Vector256 and Vector512 portions based on Avx.IsSupported / Avx512F.IsSupported.
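A sketch of the early-return style guard for the `TestEntryPoint` shown above, with each section gated on its own ISA instead of using `ConditionalFact`. `GuardedEntryPoint` and its `Check` helper are hypothetical stand-ins for the test's own helpers (the real `Check` may differ); 100 means success per the CoreCLR test convention, and 101 is an arbitrary failure code. Requires .NET 8 or later for `Vector512`.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class GuardedEntryPoint
{
    // Hypothetical stand-in for the test's Check helper.
    static bool Check(string name, bool ok)
    {
        Console.WriteLine($"{name}: {(ok ? "PASS" : "FAIL")}");
        return ok;
    }

    // Returns 100 on success, 101 on failure; a section whose ISA is missing is skipped.
    public static int TestEntryPoint()
    {
        bool pass = true;

        if (Avx.IsSupported)
        {
            pass &= Check("Add256", Vector256.Create(1) + Vector256.Create(2) == Vector256.Create(3));
        }
        else
        {
            Console.WriteLine("Skipping Vector256 tests: AVX not supported.");
        }

        if (Avx512F.IsSupported)
        {
            pass &= Check("Add512", Vector512.Create(1) + Vector512.Create(2) == Vector512.Create(3));
        }
        else
        {
            Console.WriteLine("Skipping Vector512 tests: AVX-512F not supported.");
        }

        return pass ? 100 : 101;
    }

    static void Main() => Console.WriteLine($"Result: {TestEntryPoint()}");
}
```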

Comment on lines 10 to 11
```csharp
// when they fit in the System V ABI limit (16 bytes).
```


Copilot AI Feb 11, 2026

The comment says `Vector<T>` is tested only when it fits the SysV 16-byte limit, but `Vector<T>` is 32 bytes on AVX-capable machines. Consider updating the comment (and/or adding a runtime print/assert) to reflect that the vector size varies by hardware and that the intent is to validate register passing for the active size.

Suggested change
```csharp
// for the active hardware-dependent vector size (for example, 16 bytes with SSE or
// 32 bytes on AVX-capable machines) according to the System V ABI.
```
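To make the hardware-dependent size visible at run time, as the comment suggests, a test could print `Vector<byte>.Count`, which equals the size of `Vector<T>` in bytes. `VectorSizeReport` is a hypothetical name used only for this sketch.

```csharp
using System;
using System.Numerics;

static class VectorSizeReport
{
    // Size of Vector<T> in bytes for this process: one byte per element of Vector<byte>.
    public static int VectorSizeInBytes => Vector<byte>.Count;

    static void Main()
    {
        Console.WriteLine($"Vector.IsHardwareAccelerated = {Vector.IsHardwareAccelerated}");
        Console.WriteLine($"Vector<T> size = {VectorSizeInBytes} bytes");
    }
}
```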

```diff
 typeof(byte), typeof(short), typeof(int), typeof(long),
 typeof(float), typeof(double), typeof(Int128),
-typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>),
+typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>), typeof(Vector512<int>),
```

Copilot AI Feb 11, 2026

Arm64Abi's comment says structs larger than 16 bytes are passed by-ref and inhibit tailcalls, but this list now includes Vector512 (64 bytes). That likely defeats the purpose of TailCalleeCandidateArgTypes on arm64; consider removing Vector512 (and any other >16B types) or updating the ABI logic/comment accordingly.

Suggested change
```csharp
typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>),
```
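A quick way to see which vector candidate types exceed the 16-byte threshold mentioned in this comment is to print their sizes with `Unsafe.SizeOf<T>()`. The `CandidateSizes` class is hypothetical, covers only the vector entries of the list, and requires .NET 8 or later for `Vector512<T>`.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

static class CandidateSizes
{
    // Byte sizes of the vector entries in the tailcall candidate list.
    public static (string Name, int Size)[] VectorCandidates() => new (string Name, int Size)[]
    {
        ("Vector<int>", Unsafe.SizeOf<Vector<int>>()),
        ("Vector128<int>", Unsafe.SizeOf<Vector128<int>>()),
        ("Vector256<int>", Unsafe.SizeOf<Vector256<int>>()),
        ("Vector512<int>", Unsafe.SizeOf<Vector512<int>>()),
    };

    static void Main()
    {
        foreach (var (name, size) in VectorCandidates())
        {
            string note = size > 16 ? "  (> 16 bytes)" : "";
            Console.WriteLine($"{name,-15} {size,2} bytes{note}");
        }
    }
}
```

The sizes of `Vector128<int>`, `Vector256<int>`, and `Vector512<int>` are fixed at 16, 32, and 64 bytes, while the size of `Vector<int>` depends on the hardware.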

Comment on lines 229 to 254
```csharp
var v256a = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
var v256b = Vector256.Create(10, 20, 30, 40, 50, 60, 70, 80);
pass &= Check(AddVec256(v256a, v256b),
    Vector256.Create(11, 22, 33, 44, 55, 66, 77, 88), "AddVec256");

pass &= Check(
    PassManyVec256(
        Vector256.Create(1), Vector256.Create(2), Vector256.Create(3),
        Vector256.Create(4), 100),
    Vector256.Create(110),
    "PassManyVec256");

pass &= Check(MixedArgsVec256(3, Vector256.Create(10.0f), 7L),
    Vector256.Create(20.0f), "MixedArgsVec256");

pass &= Check(ReturnVec256(42), Vector256.Create(42), "ReturnVec256");

// --- Vector256<double> tests ---
pass &= Check(AddVec256D(Vector256.Create(1.0), Vector256.Create(2.0)),
    Vector256.Create(3.0), "AddVec256D");

// --- Vector256 chained return tests ---
pass &= Check(
    ChainVec256(Vector256.Create(1.0f), Vector256.Create(2.0f),
        Vector256.Create(0.5f), Vector256.Create(0.1f)),
    Vector256.Create(3.6f), "ChainVec256");
```

Copilot AI Feb 11, 2026

The Vector256 test section executes Vector256 operations unconditionally. On machines where AVX isn't supported (Avx.IsSupported == false), most Vector256 operators/intrinsics will throw PlatformNotSupportedException, causing the test to fail for the wrong reason. Gate the Vector256-specific checks behind Avx.IsSupported (or return PASS/skip when not supported).

Suggested change
```csharp
if (Avx.IsSupported)
{
    var v256a = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
    var v256b = Vector256.Create(10, 20, 30, 40, 50, 60, 70, 80);
    pass &= Check(AddVec256(v256a, v256b),
        Vector256.Create(11, 22, 33, 44, 55, 66, 77, 88), "AddVec256");
    pass &= Check(
        PassManyVec256(
            Vector256.Create(1), Vector256.Create(2), Vector256.Create(3),
            Vector256.Create(4), 100),
        Vector256.Create(110),
        "PassManyVec256");
    pass &= Check(MixedArgsVec256(3, Vector256.Create(10.0f), 7L),
        Vector256.Create(20.0f), "MixedArgsVec256");
    pass &= Check(ReturnVec256(42), Vector256.Create(42), "ReturnVec256");
    // --- Vector256<double> tests ---
    pass &= Check(AddVec256D(Vector256.Create(1.0), Vector256.Create(2.0)),
        Vector256.Create(3.0), "AddVec256D");
    // --- Vector256 chained return tests ---
    pass &= Check(
        ChainVec256(Vector256.Create(1.0f), Vector256.Create(2.0f),
            Vector256.Create(0.5f), Vector256.Create(0.1f)),
        Vector256.Create(3.6f), "ChainVec256");
}
else
{
    Console.WriteLine(" Skipping Vector256 tests because AVX is not supported.");
}
```


Development

Successfully merging this pull request may close these issues.

Consider passing and returning Vector<T> in registers
