Pass vectors via simd registers on Unix x64 #124267
reedz wants to merge 4 commits into dotnet:main
Conversation
Pull request overview
Updates CoreCLR JIT + VM on Unix AMD64 (SysV ABI) to treat opaque SIMD vector structs as single-register values (XMM/YMM/ZMM) for argument passing and returns, and adds targeted JIT tests for the new ABI behavior.
Changes:
- Teach the SysV x64 ABI classifier/return logic to pass and return Vector64/128/256/512 as single SIMD-register values when supported (and adjust codegen/LSRA accordingly).
- Update VM and tooling-side SysV struct classification for SIMD intrinsics (to avoid misclassification via internal field layouts) and add runtime plumbing/macros to preserve upper halves of YMM/ZMM regs across relevant stubs.
- Add new directed JIT tests covering register passing/return scenarios for intrinsic vectors and System.Numerics.Vector.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/tests/JIT/Stress/ABI/ABIs.cs | Expands ABI stress tailcall candidate types to include Vector512. |
| src/tests/JIT/Directed/VectorABI/VectorRegPassSysV.csproj | New directed test project for SysV SIMD register passing. |
| src/tests/JIT/Directed/VectorABI/VectorRegPassSysV.cs | New SysV x64 test for Vector64/128/256/512 arg/return paths. |
| src/tests/JIT/Directed/VectorABI/VectorNumericsRegPass.csproj | New directed test project for System.Numerics.Vector passing. |
| src/tests/JIT/Directed/VectorABI/VectorNumericsRegPass.cs | New test validating Vector parameter/return behavior. |
| src/tests/JIT/Directed/VectorABI/VectorMgdMgd256_ro.csproj | New project file for VectorMgdMgd256 test variant. |
| src/tests/JIT/Directed/VectorABI/VectorMgdMgd256.cs | New/updated HVA + Vector256/512 coverage. |
| src/coreclr/vm/vars.hpp | Declares Unix AMD64 runtime flags for AVX/AVX512 support. |
| src/coreclr/vm/vars.cpp | Defines Unix AMD64 runtime flags for AVX/AVX512 support. |
| src/coreclr/vm/codeman.cpp | Sets AVX/AVX512 runtime flags based on feature detection/config. |
| src/coreclr/vm/methodtable.cpp | Forces SIMD intrinsic types to classify as SSE (avoid field-walk misclassification). |
| src/coreclr/vm/amd64/virtualcallstubamd64.S | Extends transition stubs to preserve upper vector register state before helper calls. |
| src/coreclr/vm/amd64/unixasmhelpers.S | Updates tiered compilation stub to preserve upper vector register state. |
| src/coreclr/vm/amd64/theprestubamd64.S | Updates prestub to preserve upper vector register state (incl. Swift retbuf offset). |
| src/coreclr/vm/amd64/externalmethodfixupthunk.S | Updates delay-load thunk to preserve upper vector register state. |
| src/coreclr/vm/amd64/CachedInterfaceDispatchCoreCLR.S | Updates interface dispatch slow paths to preserve upper vector register state. |
| src/coreclr/pal/inc/unixasmmacrosamd64.inc | Adds SAVE/RESTORE_UPPER_VECTOR_REGISTERS macros driven by runtime AVX flags. |
| src/coreclr/tools/Common/JitInterface/SystemVStructClassificator.cs | Mirrors VM SIMD classification changes for tool-side classifier. |
| src/coreclr/jit/targetamd64.cpp | Adds SysV x64 single-register SIMD passing (XMM/YMM/ZMM) in classifier. |
| src/coreclr/jit/compiler.h | Adds helper to identify “single register” opaque SIMD types for ABI decisions. |
| src/coreclr/jit/compiler.cpp | Returns opaque HW SIMD structs as a single SIMD register on Unix AMD64 when supported. |
| src/coreclr/jit/lsrabuild.cpp | Treats SIMD returns as single-float-reg uses on AMD64 in return building. |
| src/coreclr/jit/codegenxarch.cpp | Ensures SIMD returns use float return regs; avoids vzeroupper when returning ≥256-bit vectors. |
| src/coreclr/jit/abi.cpp | Extends float-reg segment typing to include 32/64-byte SIMD segments on xarch. |
```csharp
Console.WriteLine("=== Vector512 tests ===");
Console.WriteLine($"  Avx512F.IsSupported = {Avx512F.IsSupported}");

var v512a = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
var v512b = Vector512.Create(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160);
pass &= Check(AddVec512(v512a, v512b),
    Vector512.Create(11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176),
    "AddVec512");

pass &= Check(ReturnVec512(99), Vector512.Create(99), "ReturnVec512");

pass &= Check(MixedArgsVec512(3, Vector512.Create(10.0f), 7L),
    Vector512.Create(20.0f), "MixedArgsVec512");
```
The Vector512 test section executes Vector512 operations unconditionally. On most test machines Avx512F.IsSupported will be false, and Vector512 operators/intrinsics will throw PlatformNotSupportedException. Gate these checks behind Avx512F.IsSupported (and ideally also ensure Avx is supported) so the test doesn't fail on non-AVX512 hardware.
Suggested change:

```csharp
if (Avx.IsSupported && Avx512F.IsSupported)
{
    Console.WriteLine("=== Vector512 tests ===");
    Console.WriteLine($"  Avx512F.IsSupported = {Avx512F.IsSupported}");
    var v512a = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    var v512b = Vector512.Create(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160);
    pass &= Check(AddVec512(v512a, v512b),
        Vector512.Create(11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176),
        "AddVec512");
    pass &= Check(ReturnVec512(99), Vector512.Create(99), "ReturnVec512");
    pass &= Check(MixedArgsVec512(3, Vector512.Create(10.0f), 7L),
        Vector512.Create(20.0f), "MixedArgsVec512");
}
else
{
    Console.WriteLine("=== Vector512 tests skipped: AVX/AVX-512 not supported ===");
}
```
```csharp
// Test that Vector128, Vector256, and Vector512 are correctly passed as arguments
// and returned from methods on System V x64 (Linux), verifying the single-register
// SIMD passing path in the JIT's SysVX64Classifier.
//
// Vector128 (16B) -> XMM register
// Vector256 (32B) -> YMM register (requires AVX)
// Vector512 (64B) -> ZMM register (requires AVX-512)
```
This test is described as validating the System V x64 ABI classifier, but it currently runs (and will trivially pass) on non-Unix-x64 platforms as well, which reduces coverage and can hide regressions. Consider adding a platform guard (e.g., x64 && !Windows) and skipping/returning PASS when not on SysV x64.
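A minimal sketch of such a guard (illustrative only, not code from this PR; it assumes the usual pass value of 100 for these tests and that `System.Runtime.InteropServices` is imported):

```csharp
// Illustrative guard: run the SysV-specific checks only on non-Windows x64;
// elsewhere return the conventional pass value so the test is effectively skipped.
if (OperatingSystem.IsWindows() ||
    RuntimeInformation.ProcessArchitecture != Architecture.X64)
{
    Console.WriteLine("Skipping: not a System V x64 platform.");
    return 100;
}
```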
```csharp
[Fact]
public static int TestEntryPoint()
{
    Console.WriteLine($"Vector256<int>.Count = {Vector256<int>.Count}");
    Console.WriteLine($"Vector512<int>.Count = {Vector512<int>.Count}");

    // ---- Single Vector256 tests ----
    Console.WriteLine("=== Single Vector256 tests ===");

    var v256 = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
    Check("PassSingle256", PassSingle256(v256) == v256);

    Check("Add256", Add256(Vector256.Create(1), Vector256.Create(2)) == Vector256.Create(3));

    Check("PassMany256", PassMany256(
        Vector256.Create(1), Vector256.Create(2), Vector256.Create(3), Vector256.Create(4)) == Vector256.Create(10));

    Check("Mixed256", Mixed256(3, Vector256.Create(10), 7L) == Vector256.Create(20));

    // ---- Single Vector512 tests ----
    Console.WriteLine("=== Single Vector512 tests ===");

    var v512 = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    Check("PassSingle512", PassSingle512(v512) == v512);

    Check("Add512", Add512(Vector512.Create(1), Vector512.Create(2)) == Vector512.Create(3));

    Check("Mixed512", Mixed512(3, Vector512.Create(10), 7L) == Vector512.Create(20));
```
The test uses Vector256 and Vector512 operations without checking Avx/Avx512 support. On hardware that doesn't support these ISAs, these intrinsics will throw and the test will fail spuriously. Add guards (e.g., ConditionalFact or early-return PASS) around the Vector256 and Vector512 portions based on Avx.IsSupported / Avx512F.IsSupported.
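One possible shape for the early-return variant (a sketch under the assumption that `System.Runtime.Intrinsics.X86` is imported and that 100 is the pass value):

```csharp
// Illustrative guard, not code from this PR: bail out before any Vector256/Vector512
// work if the required ISAs are missing, reporting the run as a pass rather than a failure.
if (!Avx.IsSupported || !Avx512F.IsSupported)
{
    Console.WriteLine("Skipping: AVX/AVX-512 not supported on this machine.");
    return 100;
}
```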
```csharp
// when they fit in the System V ABI limit (16 bytes).
```
The comment says Vector is tested only when it fits the SysV 16-byte limit, but Vector is 32 bytes on AVX-capable machines. Consider updating the comment (and/or adding a runtime print/assert) to reflect that the vector size varies by hardware and the intent is to validate register passing for the active size.
Suggested change:

```csharp
// for the active hardware-dependent vector size (for example, 16 bytes with SSE or
// 32 bytes on AVX-capable machines) according to the System V ABI.
```
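A sketch of the runtime print the comment suggests (hypothetical placement in the test's entry point; `Vector<byte>.Count` equals the vector size in bytes on the running hardware):

```csharp
// Illustrative diagnostic, not code from this PR: record which Vector<T> width
// (and therefore which register class) this run actually exercised.
Console.WriteLine($"Vector<T> size on this machine: {Vector<byte>.Count} bytes");
```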
src/tests/JIT/Stress/ABI/ABIs.cs (outdated)
```diff
  typeof(byte), typeof(short), typeof(int), typeof(long),
  typeof(float), typeof(double), typeof(Int128),
- typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>),
+ typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>), typeof(Vector512<int>),
```
Arm64Abi's comment says structs larger than 16 bytes are passed by-ref and inhibit tailcalls, but this list now includes Vector512 (64 bytes). That likely defeats the purpose of TailCalleeCandidateArgTypes on arm64; consider removing Vector512 (and any other >16B types) or updating the ABI logic/comment accordingly.
Suggested change:

```diff
- typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>), typeof(Vector512<int>),
+ typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>),
```
```csharp
var v256a = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
var v256b = Vector256.Create(10, 20, 30, 40, 50, 60, 70, 80);
pass &= Check(AddVec256(v256a, v256b),
    Vector256.Create(11, 22, 33, 44, 55, 66, 77, 88), "AddVec256");

pass &= Check(
    PassManyVec256(
        Vector256.Create(1), Vector256.Create(2), Vector256.Create(3),
        Vector256.Create(4), 100),
    Vector256.Create(110),
    "PassManyVec256");

pass &= Check(MixedArgsVec256(3, Vector256.Create(10.0f), 7L),
    Vector256.Create(20.0f), "MixedArgsVec256");

pass &= Check(ReturnVec256(42), Vector256.Create(42), "ReturnVec256");

// --- Vector256<double> tests ---
pass &= Check(AddVec256D(Vector256.Create(1.0), Vector256.Create(2.0)),
    Vector256.Create(3.0), "AddVec256D");

// --- Vector256 chained return tests ---
pass &= Check(
    ChainVec256(Vector256.Create(1.0f), Vector256.Create(2.0f),
        Vector256.Create(0.5f), Vector256.Create(0.1f)),
    Vector256.Create(3.6f), "ChainVec256");
```
The Vector256 test section executes Vector256 operations unconditionally. On machines where AVX isn't supported (Avx.IsSupported == false), most Vector256 operators/intrinsics will throw PlatformNotSupportedException, causing the test to fail for the wrong reason. Gate the Vector256-specific checks behind Avx.IsSupported (or return PASS/skip when not supported).
Suggested change:

```csharp
if (Avx.IsSupported)
{
    var v256a = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
    var v256b = Vector256.Create(10, 20, 30, 40, 50, 60, 70, 80);
    pass &= Check(AddVec256(v256a, v256b),
        Vector256.Create(11, 22, 33, 44, 55, 66, 77, 88), "AddVec256");
    pass &= Check(
        PassManyVec256(
            Vector256.Create(1), Vector256.Create(2), Vector256.Create(3),
            Vector256.Create(4), 100),
        Vector256.Create(110),
        "PassManyVec256");
    pass &= Check(MixedArgsVec256(3, Vector256.Create(10.0f), 7L),
        Vector256.Create(20.0f), "MixedArgsVec256");
    pass &= Check(ReturnVec256(42), Vector256.Create(42), "ReturnVec256");
    // --- Vector256<double> tests ---
    pass &= Check(AddVec256D(Vector256.Create(1.0), Vector256.Create(2.0)),
        Vector256.Create(3.0), "AddVec256D");
    // --- Vector256 chained return tests ---
    pass &= Check(
        ChainVec256(Vector256.Create(1.0f), Vector256.Create(2.0f),
            Vector256.Create(0.5f), Vector256.Create(0.1f)),
        Vector256.Create(3.6f), "ChainVec256");
}
else
{
    Console.WriteLine("  Skipping Vector256 tests because AVX is not supported.");
}
```
Fixes #5040.
The JIT and VM on Unix AMD64 (System V ABI) are updated to pass Vector64, Vector128, Vector256, and Vector512 arguments and return values in single SIMD registers (XMM/YMM/ZMM) instead of splitting them across multiple eightbyte slots or spilling them to the stack.
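As an illustration of the affected shape (this sample is not from the PR, and the names are made up), a method like the following now has its `Vector256<float>` parameter and return value assigned to a YMM register on Unix x64 when AVX is available, rather than going through eightbyte splitting or memory:

```csharp
using System.Runtime.Intrinsics;

static class Sample
{
    // Illustrative only: with this change, 'v' is expected to arrive in a YMM register
    // and the product to be returned in a YMM register on SysV x64 with AVX enabled.
    static Vector256<float> Scale(Vector256<float> v, float factor)
        => v * Vector256.Create(factor);
}
```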
Benchmark results seem favorable: