Pass vectors via simd registers on Unix x64 #124267
reedz wants to merge 4 commits into dotnet:main
Conversation
Pull request overview
Updates CoreCLR JIT + VM on Unix AMD64 (SysV ABI) to treat opaque SIMD vector structs as single-register values (XMM/YMM/ZMM) for argument passing and returns, and adds targeted JIT tests for the new ABI behavior.
Changes:
- Teach the SysV x64 ABI classifier/return logic to pass and return Vector64/128/256/512 as single SIMD-register values when supported (and adjust codegen/LSRA accordingly).
- Update VM and tooling-side SysV struct classification for SIMD intrinsics (to avoid misclassification via internal field layouts) and add runtime plumbing/macros to preserve upper halves of YMM/ZMM regs across relevant stubs.
- Add new directed JIT tests covering register passing/return scenarios for intrinsic vectors and System.Numerics.Vector.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/tests/JIT/Stress/ABI/ABIs.cs | Expands ABI stress tailcall candidate types to include Vector512. |
| src/tests/JIT/Directed/VectorABI/VectorRegPassSysV.csproj | New directed test project for SysV SIMD register passing. |
| src/tests/JIT/Directed/VectorABI/VectorRegPassSysV.cs | New SysV x64 test for Vector64/128/256/512 arg/return paths. |
| src/tests/JIT/Directed/VectorABI/VectorNumericsRegPass.csproj | New directed test project for System.Numerics.Vector passing. |
| src/tests/JIT/Directed/VectorABI/VectorNumericsRegPass.cs | New test validating Vector parameter/return behavior. |
| src/tests/JIT/Directed/VectorABI/VectorMgdMgd256_ro.csproj | New project file for VectorMgdMgd256 test variant. |
| src/tests/JIT/Directed/VectorABI/VectorMgdMgd256.cs | New/updated HVA + Vector256/512 coverage. |
| src/coreclr/vm/vars.hpp | Declares Unix AMD64 runtime flags for AVX/AVX512 support. |
| src/coreclr/vm/vars.cpp | Defines Unix AMD64 runtime flags for AVX/AVX512 support. |
| src/coreclr/vm/codeman.cpp | Sets AVX/AVX512 runtime flags based on feature detection/config. |
| src/coreclr/vm/methodtable.cpp | Forces SIMD intrinsic types to classify as SSE (avoid field-walk misclassification). |
| src/coreclr/vm/amd64/virtualcallstubamd64.S | Extends transition stubs to preserve upper vector register state before helper calls. |
| src/coreclr/vm/amd64/unixasmhelpers.S | Updates tiered compilation stub to preserve upper vector register state. |
| src/coreclr/vm/amd64/theprestubamd64.S | Updates prestub to preserve upper vector register state (incl. Swift retbuf offset). |
| src/coreclr/vm/amd64/externalmethodfixupthunk.S | Updates delay-load thunk to preserve upper vector register state. |
| src/coreclr/vm/amd64/CachedInterfaceDispatchCoreCLR.S | Updates interface dispatch slow paths to preserve upper vector register state. |
| src/coreclr/pal/inc/unixasmmacrosamd64.inc | Adds SAVE/RESTORE_UPPER_VECTOR_REGISTERS macros driven by runtime AVX flags. |
| src/coreclr/tools/Common/JitInterface/SystemVStructClassificator.cs | Mirrors VM SIMD classification changes for tool-side classifier. |
| src/coreclr/jit/targetamd64.cpp | Adds SysV x64 single-register SIMD passing (XMM/YMM/ZMM) in classifier. |
| src/coreclr/jit/compiler.h | Adds helper to identify “single register” opaque SIMD types for ABI decisions. |
| src/coreclr/jit/compiler.cpp | Returns opaque HW SIMD structs as a single SIMD register on Unix AMD64 when supported. |
| src/coreclr/jit/lsrabuild.cpp | Treats SIMD returns as single-float-reg uses on AMD64 in return building. |
| src/coreclr/jit/codegenxarch.cpp | Ensures SIMD returns use float return regs; avoids vzeroupper when returning ≥256-bit vectors. |
| src/coreclr/jit/abi.cpp | Extends float-reg segment typing to include 32/64-byte SIMD segments on xarch. |
```csharp
Console.WriteLine("=== Vector512 tests ===");
Console.WriteLine($"  Avx512F.IsSupported = {Avx512F.IsSupported}");

var v512a = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
var v512b = Vector512.Create(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160);
pass &= Check(AddVec512(v512a, v512b),
    Vector512.Create(11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176),
    "AddVec512");

pass &= Check(ReturnVec512(99), Vector512.Create(99), "ReturnVec512");

pass &= Check(MixedArgsVec512(3, Vector512.Create(10.0f), 7L),
    Vector512.Create(20.0f), "MixedArgsVec512");
```
The Vector512 test section executes Vector512 operations unconditionally. On most test machines Avx512F.IsSupported will be false, and Vector512 operators/intrinsics will throw PlatformNotSupportedException. Gate these checks behind Avx512F.IsSupported (and ideally also ensure Avx is supported) so the test doesn't fail on non-AVX512 hardware.
Suggested change:

```csharp
if (Avx.IsSupported && Avx512F.IsSupported)
{
    Console.WriteLine("=== Vector512 tests ===");
    Console.WriteLine($"  Avx512F.IsSupported = {Avx512F.IsSupported}");
    var v512a = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    var v512b = Vector512.Create(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160);
    pass &= Check(AddVec512(v512a, v512b),
        Vector512.Create(11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176),
        "AddVec512");
    pass &= Check(ReturnVec512(99), Vector512.Create(99), "ReturnVec512");
    pass &= Check(MixedArgsVec512(3, Vector512.Create(10.0f), 7L),
        Vector512.Create(20.0f), "MixedArgsVec512");
}
else
{
    Console.WriteLine("=== Vector512 tests skipped: AVX/AVX-512 not supported ===");
}
```
```csharp
// Test that Vector128, Vector256, and Vector512 are correctly passed as arguments
// and returned from methods on System V x64 (Linux), verifying the single-register
// SIMD passing path in the JIT's SysVX64Classifier.
//
// Vector128 (16B) -> XMM register
// Vector256 (32B) -> YMM register (requires AVX)
// Vector512 (64B) -> ZMM register (requires AVX-512)
```
This test is described as validating the System V x64 ABI classifier, but it currently runs (and will trivially pass) on non-Unix-x64 platforms as well, which reduces coverage and can hide regressions. Consider adding a platform guard (e.g., x64 && !Windows) and skipping/returning PASS when not on SysV x64.
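A minimal sketch of such a guard (illustrative only, not code from this PR; it assumes the usual pass value of 100 for these tests and that `System.Runtime.InteropServices` is imported):

```csharp
// Illustrative guard: run the SysV-specific checks only on non-Windows x64;
// elsewhere return the conventional pass value so the test is effectively skipped.
if (OperatingSystem.IsWindows() ||
    RuntimeInformation.ProcessArchitecture != Architecture.X64)
{
    Console.WriteLine("Skipping: not a System V x64 platform.");
    return 100;
}
```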
```csharp
[Fact]
public static int TestEntryPoint()
{
    Console.WriteLine($"Vector256<int>.Count = {Vector256<int>.Count}");
    Console.WriteLine($"Vector512<int>.Count = {Vector512<int>.Count}");

    // ---- Single Vector256 tests ----
    Console.WriteLine("=== Single Vector256 tests ===");

    var v256 = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
    Check("PassSingle256", PassSingle256(v256) == v256);

    Check("Add256", Add256(Vector256.Create(1), Vector256.Create(2)) == Vector256.Create(3));

    Check("PassMany256", PassMany256(
        Vector256.Create(1), Vector256.Create(2), Vector256.Create(3), Vector256.Create(4)) == Vector256.Create(10));

    Check("Mixed256", Mixed256(3, Vector256.Create(10), 7L) == Vector256.Create(20));

    // ---- Single Vector512 tests ----
    Console.WriteLine("=== Single Vector512 tests ===");

    var v512 = Vector512.Create(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    Check("PassSingle512", PassSingle512(v512) == v512);

    Check("Add512", Add512(Vector512.Create(1), Vector512.Create(2)) == Vector512.Create(3));

    Check("Mixed512", Mixed512(3, Vector512.Create(10), 7L) == Vector512.Create(20));
```
The test uses Vector256 and Vector512 operations without checking Avx/Avx512 support. On hardware that doesn't support these ISAs, these intrinsics will throw and the test will fail spuriously. Add guards (e.g., ConditionalFact or early-return PASS) around the Vector256 and Vector512 portions based on Avx.IsSupported / Avx512F.IsSupported.
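One possible shape for the early-return variant (a sketch under the assumption that `System.Runtime.Intrinsics.X86` is imported and that 100 is the pass value):

```csharp
// Illustrative guard, not code from this PR: bail out before any Vector256/Vector512
// work if the required ISAs are missing, reporting the run as a pass rather than a failure.
if (!Avx.IsSupported || !Avx512F.IsSupported)
{
    Console.WriteLine("Skipping: AVX/AVX-512 not supported on this machine.");
    return 100;
}
```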
```csharp
// when they fit in the System V ABI limit (16 bytes).
```
The comment says Vector is tested only when it fits the SysV 16-byte limit, but Vector is 32 bytes on AVX-capable machines. Consider updating the comment (and/or adding a runtime print/assert) to reflect that the vector size varies by hardware and the intent is to validate register passing for the active size.
Suggested change:

```csharp
// for the active hardware-dependent vector size (for example, 16 bytes with SSE or
// 32 bytes on AVX-capable machines) according to the System V ABI.
```
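A sketch of the runtime print the comment suggests (hypothetical placement in the test's entry point; `Vector<byte>.Count` equals the vector size in bytes on the running hardware):

```csharp
// Illustrative diagnostic, not code from this PR: record which Vector<T> width
// (and therefore which register class) this run actually exercised.
Console.WriteLine($"Vector<T> size on this machine: {Vector<byte>.Count} bytes");
```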
src/tests/JIT/Stress/ABI/ABIs.cs (outdated)
```diff
  typeof(byte), typeof(short), typeof(int), typeof(long),
  typeof(float), typeof(double), typeof(Int128),
- typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>),
+ typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>), typeof(Vector512<int>),
```
Arm64Abi's comment says structs larger than 16 bytes are passed by-ref and inhibit tailcalls, but this list now includes Vector512 (64 bytes). That likely defeats the purpose of TailCalleeCandidateArgTypes on arm64; consider removing Vector512 (and any other >16B types) or updating the ABI logic/comment accordingly.
Suggested change:

```diff
- typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>), typeof(Vector512<int>),
+ typeof(Vector<int>), typeof(Vector128<int>), typeof(Vector256<int>),
```
```csharp
var v256a = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
var v256b = Vector256.Create(10, 20, 30, 40, 50, 60, 70, 80);
pass &= Check(AddVec256(v256a, v256b),
    Vector256.Create(11, 22, 33, 44, 55, 66, 77, 88), "AddVec256");

pass &= Check(
    PassManyVec256(
        Vector256.Create(1), Vector256.Create(2), Vector256.Create(3),
        Vector256.Create(4), 100),
    Vector256.Create(110),
    "PassManyVec256");

pass &= Check(MixedArgsVec256(3, Vector256.Create(10.0f), 7L),
    Vector256.Create(20.0f), "MixedArgsVec256");

pass &= Check(ReturnVec256(42), Vector256.Create(42), "ReturnVec256");

// --- Vector256<double> tests ---
pass &= Check(AddVec256D(Vector256.Create(1.0), Vector256.Create(2.0)),
    Vector256.Create(3.0), "AddVec256D");

// --- Vector256 chained return tests ---
pass &= Check(
    ChainVec256(Vector256.Create(1.0f), Vector256.Create(2.0f),
        Vector256.Create(0.5f), Vector256.Create(0.1f)),
    Vector256.Create(3.6f), "ChainVec256");
```
The Vector256 test section executes Vector256 operations unconditionally. On machines where AVX isn't supported (Avx.IsSupported == false), most Vector256 operators/intrinsics will throw PlatformNotSupportedException, causing the test to fail for the wrong reason. Gate the Vector256-specific checks behind Avx.IsSupported (or return PASS/skip when not supported).
Suggested change:

```csharp
if (Avx.IsSupported)
{
    var v256a = Vector256.Create(1, 2, 3, 4, 5, 6, 7, 8);
    var v256b = Vector256.Create(10, 20, 30, 40, 50, 60, 70, 80);
    pass &= Check(AddVec256(v256a, v256b),
        Vector256.Create(11, 22, 33, 44, 55, 66, 77, 88), "AddVec256");
    pass &= Check(
        PassManyVec256(
            Vector256.Create(1), Vector256.Create(2), Vector256.Create(3),
            Vector256.Create(4), 100),
        Vector256.Create(110),
        "PassManyVec256");
    pass &= Check(MixedArgsVec256(3, Vector256.Create(10.0f), 7L),
        Vector256.Create(20.0f), "MixedArgsVec256");
    pass &= Check(ReturnVec256(42), Vector256.Create(42), "ReturnVec256");
    // --- Vector256<double> tests ---
    pass &= Check(AddVec256D(Vector256.Create(1.0), Vector256.Create(2.0)),
        Vector256.Create(3.0), "AddVec256D");
    // --- Vector256 chained return tests ---
    pass &= Check(
        ChainVec256(Vector256.Create(1.0f), Vector256.Create(2.0f),
            Vector256.Create(0.5f), Vector256.Create(0.1f)),
        Vector256.Create(3.6f), "ChainVec256");
}
else
{
    Console.WriteLine("  Skipping Vector256 tests because AVX is not supported.");
}
```
Fixes #5040.
The JIT and VM on Unix AMD64 (System V ABI) are updated to pass Vector64, Vector128, Vector256, and Vector512 arguments and return values in single SIMD registers (XMM/YMM/ZMM) instead of splitting them across multiple eightbyte slots or spilling them to the stack.
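As an illustration of the affected shape (this sample is not from the PR, and the names are made up), a method like the following now has its `Vector256<float>` parameter and return value assigned to a YMM register on Unix x64 when AVX is available, rather than going through eightbyte splitting or memory:

```csharp
using System.Runtime.Intrinsics;

static class Sample
{
    // Illustrative only: with this change, 'v' is expected to arrive in a YMM register
    // and the product to be returned in a YMM register on SysV x64 with AVX enabled.
    static Vector256<float> Scale(Vector256<float> v, float factor)
        => v * Vector256.Create(factor);
}
```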
Benchmark results seem favorable: