-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Update BitArray.CopyTo to use cross-platform intrinsics #120627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update BitArray.CopyTo to use cross-platform intrinsics #120627
Conversation
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Tagging subscribers to this area: @dotnet/area-system-collections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates BitArray.CopyTo
to use cross-platform intrinsic APIs instead of platform-specific ones, improving code portability across different hardware architectures. The changes replace platform-specific AVX512 and AVX2 intrinsics with their cross-platform equivalents.
Key changes:
- Replace AVX512BW/AVX512F intrinsics with Vector512 cross-platform APIs
- Replace AVX2/AVX intrinsics with Vector256 cross-platform APIs
- Use hardware acceleration detection and operator overloads for better portability
…te SIMD implementations Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Vector128<byte> extractedLower = Sse2.And(shuffledLower, bitMask128); | ||
Vector128<byte> normalizedLower = Sse2.Min(extractedLower, ones); | ||
Sse2.Store((byte*)destination + i, normalizedLower); | ||
Vector128<byte> shuffledLower = Vector128.Shuffle(scalar.AsByte(), lowerShuffleMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely worth noting that since this is byte
, there shouldn't be any considerations. However, Vector.Shuffle
has to have deterministic behavior across platforms for "out of range" indices and so, for other T, can slightly pessimize to handle that scenario.
We're using a "vector constant" here, but its indirect and through a couple local copies. The JIT should still see it in such other scenarios and the cost is minimal, so shouldn't matter. But it's something to note anyways.
Fixes #114818 (comment)
This PR updates
BitArray.CopyTo
to use cross-platform intrinsic APIs instead of platform-specific ones, making the code more portable across different hardware architectures.Changes
AVX512 path (Vector512)
Avx512BW.IsSupported
→Vector512.IsHardwareAccelerated
Avx512BW.Shuffle(x, y)
→Vector512.Shuffle(x, y)
Avx512F.And(x, y)
→x & y
Avx512BW.Min(x, y)
→Vector512.Min(x, y)
Avx512F.Store(x, y)
→y.Store(x)
AVX2 path (Vector256)
Avx2.IsSupported
→Vector256.IsHardwareAccelerated
Avx2.Shuffle(x, y)
→Vector256.Shuffle(x, y)
Avx2.And(x, y)
→x & y
Avx2.Min(x, y)
→Vector256.Min(x, y)
Avx.Store(x, y)
→y.Store(x)
Unified Vector128 path
Ssse3.IsSupported
→Vector128.IsHardwareAccelerated
Ssse3.Shuffle(x, y)
→Vector128.Shuffle(x, y)
(internally usesPackedSimd.Swizzle
on WASM andAdvSimd.Arm64.VectorTableLookup
on ARM)Sse2.And(x, y)
/PackedSimd.And(x, y)
/AdvSimd.And(x, y)
→x & y
Sse2.Min(x, y)
/PackedSimd.Min(x, y)
/AdvSimd.Min(x, y)
→Vector128.Min(x, y)
Sse2.Store(x, y)
/PackedSimd.Store(x, y)
/AdvSimd.Arm64.StorePair(x, y, z)
→y.Store(x)
Benefits
VectorTableLookup
instead of manually chainedZipLow
/ZipHigh
instructionsTesting
All existing System.Collections tests pass (33,438 tests, 0 failures).
Fixes #116079
Original prompt
Fixes #116079
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.