Emit AVX-512 vector instructions #8264
Comments
cc @CarolEidt |
@dotnet/jit-contrib |
News on this? |
This is likely to be a large work item, and I don't know how high this will land on the priority list for future work.
|
Can you please prioritize it? Our project depends heavily on System.Numerics.Vector, and we run on an Intel(R) Xeon(R) Platinum 8168 CPU (Skylake). |
@zhongkaifu, do you have any numbers on how much of a performance increase AVX-512 provides (both for a specific workload, and for applications as a whole)? |
@tannergooding Our existing code got a 2x performance increase when moving from 128-bit to 256-bit AVX vectors. For AVX-512, since the existing System.Numerics.Vector doesn't support it yet, we cannot test it. |
@zhongkaifu, it may be worth getting some experimental numbers using native code. Having some information showing that this improves your overall scenario would help to prioritize the work appropriately. As with many new ISAs or alternative algorithms, using AVX-512 isn't always a clear-cut perf gain, and benchmarking/profiling is important. Depending on the processor, workload, etc., AVX-512 instructions can actually reduce the frequency of your processor (temporarily) and impact the overall performance of the process (or other processes). A simple search will show some blog posts from various consumers and some technical sheets from Intel which describe both the benefits and drawbacks that AVX-512 can bring. You might want to see the following from Cloudflare: https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/ and this spec update from Intel (see Erratum 24, and others): https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html |
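As a rough illustration of the kind of measurement suggested above, here is a minimal sketch that times a scalar sum against a `System.Numerics.Vector<float>` sum. The class name `SumBenchmark`, the array size, and the iteration count are hypothetical choices for illustration, not something from this thread, and a simple `Stopwatch` loop like this is only a first approximation of a real benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

public static class SumBenchmark
{
    // Scalar baseline: one float per iteration.
    public static float SumScalar(float[] data)
    {
        float sum = 0f;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    // Vector<T> version: handles Vector<float>.Count floats per iteration,
    // whatever width the JIT picks for the current hardware.
    public static float SumVector(float[] data)
    {
        int width = Vector<float>.Count;
        var acc = Vector<float>.Zero;
        int i = 0;
        for (; i <= data.Length - width; i += width)
            acc += new Vector<float>(data, i);

        // Dot with a vector of ones adds the lanes together.
        float sum = Vector.Dot(acc, Vector<float>.One);
        for (; i < data.Length; i++) // scalar tail for leftover elements
            sum += data[i];
        return sum;
    }

    // Hypothetical timing helper: one warm-up call, then a timed loop.
    public static TimeSpan Time(Func<float[], float> f, float[] data, int iterations)
    {
        f(data); // warm-up so the method is JIT-compiled before timing
        float sink = 0f;
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
            sink += f(data);
        sw.Stop();
        GC.KeepAlive(sink); // keep the result observable
        return sw.Elapsed;
    }

    public static void Main()
    {
        var data = new float[1 << 20];
        for (int i = 0; i < data.Length; i++) data[i] = 1f;

        Console.WriteLine($"scalar: {Time(SumScalar, data, 200).TotalMilliseconds:F1} ms");
        Console.WriteLine($"vector: {Time(SumVector, data, 200).TotalMilliseconds:F1} ms");
    }
}
```

Numbers from a loop like this vary with tiered JIT compilation, memory bandwidth, and the frequency effects described above, so a dedicated harness such as BenchmarkDotNet and a workload close to the real application would give more trustworthy results.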
Thanks @tannergooding. I've read these articles, and they are really helpful. I didn't know about this problem before. I may use MKL to run some tests and figure out how much gain we can get. |
https://godbolt.org/z/bX3h2h
public static int Combine(Vector128<uint> values)
{
    Vector128<uint> hash = seedVec;
    hash = Sse2.Add(hash, Sse41.MultiplyLow(values, Vector128.Create(Prime2)));
    // these three instructions could be a single `vprold` with AVX-512
    hash = Sse2.Or(
        Sse2.ShiftLeftLogical(hash, 13),
        Sse2.ShiftRightLogical(hash, 19));
    hash = Sse41.MultiplyLow(hash, Vector128.Create(Prime1));
    // same here, but with per-lane counts: a single `vprolvd`
    hash = Sse2.Or(
        Avx2.ShiftLeftLogicalVariable(hash, Vector128.Create(1u, 7u, 12u, 18u)),
        Avx2.ShiftRightLogicalVariable(hash, Vector128.Create(31u, 25u, 20u, 14u)));
    // horizontal sum and add 16 to the result
    var hashAsInt32 = hash.AsInt32();
    hashAsInt32 = Ssse3.HorizontalAdd(hashAsInt32, hashAsInt32);
    hashAsInt32 = Ssse3.HorizontalAdd(hashAsInt32, hashAsInt32);
    var sum16 = Sse41.Extract(hashAsInt32.AsUInt32(), 0) + 16;
    return (int)MixFinal(sum16);
} |
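For readers unfamiliar with the pattern in the snippet above, this scalar sketch shows why a shift-left, shift-right, OR sequence whose counts add up to 32 is the same as a single 32-bit rotate, which is what a dedicated rotate instruction would replace. The class `RotateSketch`, the helper `RotateViaShifts`, and the test value are hypothetical names and data chosen for illustration; `BitOperations.RotateLeft` is the existing .NET helper used as the reference.

```csharp
using System;
using System.Numerics;

public static class RotateSketch
{
    // The three-operation pattern: shift left, shift right, OR.
    // Intended for counts in the range 1..31.
    public static uint RotateViaShifts(uint x, int count)
        => (x << count) | (x >> (32 - count));

    public static void Main()
    {
        uint x = 0x9E3779B1u; // arbitrary test value, not taken from the thread

        // Left-shift counts used in the snippet; each right-shift count there
        // is 32 minus the matching left-shift count (13/19, 1/31, 7/25, ...).
        int[] counts = { 13, 1, 7, 12, 18 };
        foreach (int c in counts)
        {
            bool same = RotateViaShifts(x, c) == BitOperations.RotateLeft(x, c);
            Console.WriteLine($"count {c}: shifts == rotate? {same}");
        }
    }
}
```

The first rotation in the snippet uses one count for all lanes, while the second uses a different count per lane, so the two would map to different forms of a vector rotate.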
I'm going to close this in favor of #77034 |
With the announcement of Skylake-X, AVX-512 is going mainstream.
The CLR should emit AVX-512 vector instructions that System.Numerics.Vector can use.
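To see what width `System.Numerics.Vector<T>` currently gets on a given machine, a small probe like the following can help. This is only a sketch: the class `VectorWidthInfo` and the helper `VectorBits` are hypothetical names, and the reported values depend on the runtime version and the CPU.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

public static class VectorWidthInfo
{
    // Vector<byte>.Count is the vector size in bytes, so multiply by 8 for bits.
    public static int VectorBits() => Vector<byte>.Count * 8;

    public static void Main()
    {
        Console.WriteLine($"Vector.IsHardwareAccelerated: {Vector.IsHardwareAccelerated}");
        Console.WriteLine($"Vector<T> width: {VectorBits()} bits");
        Console.WriteLine($"Vector<float>.Count: {Vector<float>.Count}");
        Console.WriteLine($"Sse2.IsSupported: {Sse2.IsSupported}");
        Console.WriteLine($"Avx2.IsSupported: {Avx2.IsSupported}");
    }
}
```

Code written against `Vector<T>.Count` rather than a fixed lane count would pick up a wider vector without source changes if the runtime chose one, which is the scenario this issue asks for.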
category:cq
theme:vector-codegen
skill-level:expert
cost:extra-large