Status of AVX 512 ? #28
Comments
Hello. We will support 512-bit vectors. However, you'll need to enable additional target features at compile time, because by default Rust binaries are not compiled with AVX-512 enabled. |
Thank you for the reply! Which features do I have to enable? |
You'd usually pass a target-feature list in the RUSTFLAGS value during the build. The allowed features are the same as for the target_feature attribute. It appears that you can't enable AVX-512 on stable yet. Perhaps @Amanieu knows more? I've seen them merging work in stdarch lately. |
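To see what a given RUSTFLAGS value actually enabled, one option is to query the compile-time configuration from inside the program. The sketch below is only an illustration: `compiled_in_features` is a hypothetical helper, and the feature names it probes are an arbitrary sample. It relies on the standard `cfg!(target_feature = "...")` check, which is true only for features the compiler was told to assume.

```rust
/// Hypothetical helper: lists which of a few sample x86 features were
/// enabled at compile time (for example through a target-feature list
/// or a target-cpu setting in RUSTFLAGS).
pub fn compiled_in_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // cfg! is resolved by the compiler, so no CPU probing happens here.
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        enabled.push("avx512f");
    }
    enabled
}
```

With something like `RUSTFLAGS="-C target-feature=+avx2"` set for the build, the returned list would be expected to include `avx2`. Without any flags, it reflects only the baseline of the target.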
I made an N-body algorithm implementation, and on my server, compiling with target-cpu=knl works much better than with target-cpu=native. It seems to vectorize better, without adding any target-feature. Although the server is not a KNL, it has similar instructions, and for some reason the algorithm takes less time to finish. |
Currently we tie the stabilization of |
Knight's Landing chips lack the narrower-width SSE instructions so it is likely that some things that are lowering to SSE instructions while using |
I just pestered everyone by mentioning this in the Zulip, so I should mention it here too: "AVX-512" is by no means a single unitary feature; there is |
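Since AVX-512 is a family of features rather than one switch, it can help to probe them individually at runtime. The following is a rough sketch, not part of this crate: `detected_avx512_features` is a made-up name, and the list of subfeatures is a partial sample. The `is_x86_feature_detected!` macro itself comes from the standard library.

```rust
/// Hypothetical helper: reports which of a few AVX-512 subfeatures the
/// running CPU advertises. Returns an empty list on non-x86 targets.
pub fn detected_avx512_features() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Each subfeature is probed separately; a CPU may support
        // only some of them.
        if is_x86_feature_detected!("avx512f") {
            found.push("avx512f");
        }
        if is_x86_feature_detected!("avx512cd") {
            found.push("avx512cd");
        }
        if is_x86_feature_detected!("avx512bw") {
            found.push("avx512bw");
        }
        if is_x86_feature_detected!("avx512dq") {
            found.push("avx512dq");
        }
        if is_x86_feature_detected!("avx512vl") {
            found.push("avx512vl");
        }
        if is_x86_feature_detected!("avx512ifma") {
            found.push("avx512ifma");
        }
        if is_x86_feature_detected!("avx512vbmi") {
            found.push("avx512vbmi");
        }
    }
    found
}
```

Note that this only says what the CPU can run. It does not change which instructions the compiler emits.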
@ManuelCostanzo Note that gcc/clang/icc generally avoid 512-bit registers even when compiled for |
Unfortunately I believe that will also be entirely out of our hands unless LLVM provides a mechanism for encouraging it. Using |
Aha! |
At first I was thinking to myself, "shouldn't this issue be closed?" since it's not something the Portable SIMD API can help with per se. However, past and future Jubilees, please consider: These specific instructions on targeting AVX512-enabled architectures should probably go somewhere, and from that "guide-level" perspective, this is within the scope of our mandate. |
AVX512 would certainly be nice for cryptography. For example, curve25519-dalek has a backend leveraging AVX512-IFMA. GHASH (used by AES-GCM) also benefits from VPCLMULQDQ, but it's already possible to leverage from Rust just by using The Keccak sponge function (used by the SHA3 family and the KangarooTwelve XOF) is another example of an algorithm that could benefit: https://github.com/XKCP/K12/blob/master/lib/Optimized64/KeccakP-1600-AVX512-plainC.c |
@tarcieri When I said "specific instructions" I meant for human usage. Conversely, guaranteeing that specific machine instructions, including those for specific SIMD architectures, are compiled into the binary is not actually in scope for the SIMD API project, paradoxical as that may seem, so usages like those will likely continue to depend on core::arch::x86_64, etc. |
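For readers unfamiliar with that route, here is a minimal sketch of what depending on core::arch::x86_64 tends to look like: one function compiled with a specific target feature, guarded by a runtime check, plus a scalar fallback. The function names are hypothetical. AVX2 is used instead of AVX-512 only so the sketch does not depend on which AVX-512 pieces are stable in a given toolchain.

```rust
/// Adds two 8-lane i32 arrays (wrapping on overflow), using AVX2 when
/// the running CPU supports it and a scalar loop otherwise.
pub fn add_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { add_i32x8_avx2(a, b) };
        }
    }
    add_i32x8_scalar(a, b)
}

/// Portable fallback with the same wrapping semantics as the SIMD path.
fn add_i32x8_scalar(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// AVX2 path: the attribute lets the compiler use AVX2 in this function
/// even if the rest of the binary targets a lower baseline.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_i32x8_avx2(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    use core::arch::x86_64::*;
    let mut out = [0i32; 8];
    // SAFETY: both arrays hold exactly 256 bits, and the unaligned
    // load/store intrinsics place no alignment requirement on them.
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let sum = _mm256_add_epi32(va, vb);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    }
    out
}
```

The trade-off is that this code names exact instructions and so is tied to one architecture, which is precisely what a portable API tries to avoid.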
If I understand what you're saying, there are specific logical operations the above AVX512 use cases map to, but there may not be corresponding Rust traits for those operations. The The GHASH use case is carryless multiplication. I'm not sure what a good API is for distinguishing that from a more traditional multiply-with-carry. Keccak is simple bitwise ops like XOR and shuffles. |
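To make the distinction concrete, here is a plain scalar sketch of carry-less multiplication. It is an illustration of the operation only, with a hypothetical name (`clmul64`), and it is not how GHASH is implemented in practice. It also branches on the bits of one operand, so it should not be treated as constant-time.

```rust
/// Carry-less (polynomial over GF(2)) multiplication of two 64-bit
/// values. Partial products are combined with XOR instead of addition,
/// so no carries propagate between bit positions.
pub fn clmul64(a: u64, b: u64) -> u128 {
    let mut acc: u128 = 0;
    for i in 0..64 {
        // For every set bit in `b`, XOR in a shifted copy of `a`.
        if (b >> i) & 1 == 1 {
            acc ^= (a as u128) << i;
        }
    }
    acc
}
```

For example, `clmul64(3, 3)` is 5 rather than 9: as polynomials, (x + 1)(x + 1) = x^2 + 1 because the two middle terms cancel under XOR.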
FMA will likely be supported at some point (regardless of AVX-512). Unfortunately LLVM doesn't expose carry-less multiply (https://groups.google.com/g/llvm-dev/c/5cpOboKOBg4/m/kJ9z_xkVAQAJ) so you'd probably need to use |
This was wrong, actually! It is Knight's Corner and Knight's Ferry that don't support SSE! KNL does support SSE, but it has the really wide vectors plus some other performance quirks that cause LLVM to favor using big fat full vectors. |
It'll be 256-bit AVX/FMA versus AVX-512. KNL didn't suffer the license-based downclocking so compilers issue 512-bit (zmm) instructions by default. They need coaxing to issue those when targeting |
Hi, |
@mhnatiuk It depends what you're striving for. |
portable-simd does seem to hit the AVX512 instruction set when compiled with This is inferred from the fact that the performance of a masked sum equals that of an unmasked sum when the mask is represented as a bitmap. See https://github.com/DataEngineeringLabs/simd-benches#bench-results-on-native for details. The particular comparison is "Sum of nullable values (Bitmap)" vs "Sum of values". |
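For context, a "sum of nullable values (bitmap)" can be sketched in scalar Rust roughly as follows. This is not the code from the linked benchmark. The name `masked_sum` and the LSB-first bit order are assumptions made for the illustration, and whether the compiler turns the loop into masked vector instructions depends on the target features and the optimizer.

```rust
/// Sums the values whose validity bit is set. Bit `i % 8` of byte
/// `i / 8` (least significant bit first) marks `values[i]` as valid.
pub fn masked_sum(values: &[f32], validity: &[u8]) -> f32 {
    assert!(validity.len() * 8 >= values.len(), "bitmap too short");
    let mut sum = 0.0f32;
    // Walk the values in groups of 8, one bitmap byte per group.
    for (chunk, &byte) in values.chunks(8).zip(validity) {
        for (i, &v) in chunk.iter().enumerate() {
            // Select 0.0 for null slots instead of skipping them, which
            // keeps the loop body uniform across lanes.
            let keep = (byte >> i) & 1 == 1;
            sum += if keep { v } else { 0.0 };
        }
    }
    sum
}
```

With AVX-512, one byte of such a bitmap lines up with a mask over eight lanes, which is one plausible reason the masked and unmasked sums could perform alike.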
Hi, any news on this issue?
How could I help to stabilize it? |
I guess the right place to ask would be rust-lang/stdarch#310? |
I think this is the relevant comment--someone will need to spend some time splitting the feature and stabilizing the target features and leave the intrinsics for another time. I'm not sure if there's any good reason for holding back stabilization at this point. |
I agree that it's fine not to block the target feature on the intrinsics. |
Notably it would be nice to have the |
Using ZMM registers as clobbered registers (i.e. |
@mert-kurttutan while we could potentially go out of our way to avoid using ZMM registers as inputs/outputs, what we'd really like to eventually use are intrinsics like Really these operations benefit the most from always being able to keep data in ZMM registers, and unless we have a stable way to get data in and out of them it involves hoisting more and more into the inline assembly to fill those ZMM registers. We also have algorithms factored into different crates where it would be nice to be able to keep data in ZMM registers even when calling functions between crates. |
Hello!
I want to ask whether this crate supports AVX-512 instructions. If not, is there a plan to support them? Will this be the definitive crate for SIMD in Rust? I ask because I understand that the one in the official documentation is no longer supported.
Thanks