Status of AVX 512 ? #28

ManuelCostanzo · 2020-10-06T21:22:00Z

Hello !

I want to ask if this crate supports AVX 512 instructions. If not, Is it in the plans to be able to support it ? This would be the definitive rate for simd in Rust ? Because I understand that the one that is in the official documentation does not have more support.

Thanks

Lokathor · 2020-10-06T21:43:08Z

Hello.

We will support 512-bit vectors. However, you'll need to turn up the enabled features during compilation because by default Rust binaries are not compiled with avx-512 enabled.

ManuelCostanzo · 2020-10-06T21:55:47Z

Thank you for reply ! And what features I have to enable ?

Lokathor · 2020-10-06T22:12:25Z

You'd usually use a target-feature list in the RUSTFLAGS value during build.

The allowed features are the same as for the target_feature attribute

It appears that you can't enable avx-512 on stable yet.

Perhaps @Amanieu knows more? I've seen them merging work in stdarch lately.

ManuelCostanzo · 2020-10-06T22:21:52Z

I made a N-Body algorithm implementation, and on my server, compiling with target-cpu=knl works much better than with target-cpu=native.

It's like it vectorizes better, but without adding any target-feature.

Although I am not in a KNL, it is true that the server has similar instructions and for some reason it works better (i mean, the algorithm takes less time to finish)

Amanieu · 2020-10-06T23:44:36Z

Currrently we tie the stabilization of target_feature features with the implementation of the relevant intrinsics in stdarch (The AVX512 intrinsics are still incomplete). However I think we should separate these now that we have stdsimd.

workingjubilee · 2020-10-07T01:48:30Z

Knight's Landing chips lack the narrower-width SSE instructions so it is likely that some things that are lowering to SSE instructions while using -Ctarget-cpu=native are lowering to AVX instructions with -Ctarget-cpu=knl.

workingjubilee · 2020-10-07T04:34:06Z

I just pestered everyone by mentioning this in the Zulip so I should mention it here: I should note that "AVX-512" is by no means a singular unitary feature, there is avx512f ("F" for "Foundation", perhaps?) and also extension features for AVX-512 that build on top of avx512f, so that's something to be aware of. Our main attention will be on supporting the concepts of 512-bit vectors abstractly in our API and in a manner that is vendor-neutral so that the compiler can do the best job it can with a desired intention without the programmer having to get into the nitty-gritty specifics of Intel's API.

jedbrown · 2020-11-18T16:20:58Z

@ManuelCostanzo Note that gcc/clang/icc generally avoid 512-bit registers even when compiled for skylake-avx512 due to license-based downclocking, which includes stalls at frequency transitions. You have to specifically request them by something like -mprefer-vector-width=512. Ice Lake has a big improvement in downclocking so we might see compilers using 512-bit registers by default. rustc --print target-features suggests that there is no way to encourage the compiler to actually use 512-bit registers (which is what you want if you spend lots of time in sustained avx512 code).

calebzulawski · 2020-11-18T17:59:38Z

Unfortunately I believe that will also be entirely out of our hands unless LLVM provides a mechanism for encouraging it. Using target-cpu=native may help in some cases?

jedbrown · 2020-11-18T18:26:04Z

Aha! rustc -Ctarget-cpu=skylake-avx512 -Ctarget-feature=-prefer-256-bit. It's confusing because +prefer-256-bit is the default and one specifies that they want 512 by disabling it -- I'd been looking for +prefer-512-bit, which doesn't exist.

https://godbolt.org/z/nEMWz9

workingjubilee · 2021-02-03T22:02:43Z

At first I was considering to myself, "shouldn't this issue be closed?" since it's not something the Portable SIMD API can help with per se. However, past and future Jubilees, please consider: These specific instructions on targeting AVX512-enabled architectures should probably go somewhere, and from that "guide-level" perspective, this is within the scope of our mandate.

tarcieri · 2021-02-04T14:34:04Z

AVX512 would certainly be nice for cryptography. For example, curve25519-dalek has a backend leveraging AVX512-IFMA.

GHASH (used by AES-GCM) also benefits from VPCLMULQDQ, but it's already possible to leverage from Rust just by using target-cpu=skylake

The Keccak sponge function (used by the SHA3 family and the KangarooTwelve XOF) is another example of an algorithm that could benefit: https://github.com/XKCP/K12/blob/master/lib/Optimized64/KeccakP-1600-AVX512-plainC.c

workingjubilee · 2021-02-05T00:43:58Z

@tarcieri When I said "specific instructions" I meant for human usage.

Conversely, guaranteeing specific machine instructions, including for specific SIMD architectures, are compiled into the binary is not actually in-scope for the SIMD API project, as much of a paradox as that may seem, so usages like those will likely continue to depend on core::arch::x86_64, etc.

tarcieri · 2021-02-05T01:13:45Z

If I understand what you're saying, there are specific logical operations the above AVX512 use cases map to, but there may not be corresponding Rust traits for those operations.

The curve25519-dalek use case requires a multiply-accumulate operation, namely fused multiply–add.

The GHASH use case is carryless multiplication. I'm not sure what a good API is for distinguishing that from a more traditional multiply-with-carry.

Keccak is simple bitwise ops like XOR and shuffles.

calebzulawski · 2021-02-05T01:58:57Z

FMA will likely be supported at some point (regardless of AVX-512). Unfortunately llvm doesn't expose carry-less multiply (https://groups.google.com/g/llvm-dev/c/5cpOboKOBg4/m/kJ9z_xkVAQAJ) so you'd probably need to use std::arch for that.

workingjubilee · 2021-02-19T04:32:33Z

Knight's Landing chips lack the narrower-width SSE instructions so it is likely that some things that are lowering to SSE instructions while using -Ctarget-cpu=native are lowering to AVX instructions with -Ctarget-cpu=knl.

This was wrong, actually! It is Knight's Corner and Knight's Ferry that don't support SSE! KNL does support SSE, but it has the really wide vectors plus some other performance quirks that cause LLVM to favor using big fat full vectors.

jedbrown · 2021-02-19T06:24:04Z

It'll be 256-bit AVX/FMA versus AVX-512. KNL didn't suffer the license-based downclocking so compilers issue 512-bit (zmm) instructions by default. They need coaxing to issue those when targeting skylake-avx512 (#28 (comment)). Note that the skylake target does not support AVX-512 at all.

mhnatiuk · 2021-07-13T11:58:57Z

Hi,
I'm starting to learn Rust for scientific computing. Is this issue already resolved elsewhere by rust developers?

jedbrown · 2021-07-14T03:44:11Z

@mhnatiuk It depends what you're striving for. rustc makes portable binaries by default, but you can either change the target globally (see examples ☝️; this is the most common approach in scientific computing) or compile multiple variants of hot vectorizable parts of your code and specialize at run-time (nicer for packaging and distribution).

jorgecarleitao · 2021-11-21T22:39:54Z

portable-simd does seem to hit the AVX512 instruction set when compiled with target-cpu=native".

This is implicitly deduced by the fact that the performance of a masked sum equals the sum of an un-masked sum when the mask is represented as a bitmap. See https://github.com/DataEngineeringLabs/simd-benches#bench-results-on-native for details. The particular comparison is "Sum of nullable values (Bitmap)" vs "Sum of values".

JeWaVe · 2024-02-08T16:48:05Z

Hi,

any news for this issue ?
We are now in 2024 and with rustc 1.76 I get

the target feature avx512f is currently unstable

How could I help to stabilize ?

HadrienG2 · 2024-02-12T12:54:24Z

I guess the right place to ask would be rust-lang/stdarch#310 ?

calebzulawski · 2024-02-12T13:15:25Z

Currrently we tie the stabilization of target_feature features with the implementation of the relevant intrinsics in stdarch (The AVX512 intrinsics are still incomplete). However I think we should separate these now that we have stdsimd.

I think this is the relevant comment--someone will need to spend some time splitting the feature and stabilizing the target features and leave the intrinsics for another time. I'm not sure if there's any good reason for holding back stabilization at this point.

Amanieu · 2024-02-12T13:28:02Z

I agree that it's fine not to block the target feature on the intrinsics.

tarcieri · 2024-02-12T13:47:25Z

Notably it would be nice to have the target_feature stable so ZMM registers can be used with inline assembly, even if the relevant intrinsics aren't stable

mert-kurttutan · 2024-07-05T08:28:37Z

Notably it would be nice to have the target_feature stable so ZMM registers can be used with inline assembly, even if the relevant intrinsics aren't stable

Using ZMM registers as clobbered registers (i.e. out("zmm0) _,) (and using runtime detection with rawcpu_id or cpufeatures crate) seems to work. If you dont need to have ZMM registers as input or output, this may work.
@tarcieri I would love to see your insight on this method?

tarcieri · 2024-07-08T17:41:16Z

@mert-kurttutan while we could potentially go out of our way to avoid using ZMM registers as inputs/outputs, what we'd really like to eventually use are intrinsics like _mm512_aesenc_epi128, which take ZMM registers as inputs and outputs. But in the meantime, we can use an asm! polyfill instead, or at least we could if ZMM registers are stable.

Really these operations benefit the most from always being able to keep data in ZMM registers, and unless we have a stable way to get data in and out of them it involves hoisting more and more into the inline assembly to fill those ZMM registers. We also have algorithms factored into different crates where it would be nice to be able to keep data in ZMM registers even when calling functions between crates.

workingjubilee added the E-needs-docs Needs documentation added. label May 3, 2021

mert-kurttutan mentioned this issue Jul 16, 2024

Implement AVX-512 intrinsics rust-lang/stdarch#310

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Status of AVX 512 ? #28

Status of AVX 512 ? #28

ManuelCostanzo commented Oct 6, 2020

Lokathor commented Oct 6, 2020

ManuelCostanzo commented Oct 6, 2020 •

edited

Loading

Lokathor commented Oct 6, 2020

ManuelCostanzo commented Oct 6, 2020 •

edited

Loading

Amanieu commented Oct 6, 2020 •

edited

Loading

workingjubilee commented Oct 7, 2020

workingjubilee commented Oct 7, 2020

jedbrown commented Nov 18, 2020

calebzulawski commented Nov 18, 2020

jedbrown commented Nov 18, 2020

workingjubilee commented Feb 3, 2021

tarcieri commented Feb 4, 2021 •

edited

Loading

workingjubilee commented Feb 5, 2021

tarcieri commented Feb 5, 2021

calebzulawski commented Feb 5, 2021

workingjubilee commented Feb 19, 2021

jedbrown commented Feb 19, 2021

mhnatiuk commented Jul 13, 2021

jedbrown commented Jul 14, 2021

jorgecarleitao commented Nov 21, 2021

JeWaVe commented Feb 8, 2024

HadrienG2 commented Feb 12, 2024

calebzulawski commented Feb 12, 2024

Amanieu commented Feb 12, 2024

tarcieri commented Feb 12, 2024

mert-kurttutan commented Jul 5, 2024

tarcieri commented Jul 8, 2024

Status of AVX 512 ? #28

Status of AVX 512 ? #28

Comments

ManuelCostanzo commented Oct 6, 2020

Lokathor commented Oct 6, 2020

ManuelCostanzo commented Oct 6, 2020 • edited Loading

Lokathor commented Oct 6, 2020

ManuelCostanzo commented Oct 6, 2020 • edited Loading

Amanieu commented Oct 6, 2020 • edited Loading

workingjubilee commented Oct 7, 2020

workingjubilee commented Oct 7, 2020

jedbrown commented Nov 18, 2020

calebzulawski commented Nov 18, 2020

jedbrown commented Nov 18, 2020

workingjubilee commented Feb 3, 2021

tarcieri commented Feb 4, 2021 • edited Loading

workingjubilee commented Feb 5, 2021

tarcieri commented Feb 5, 2021

calebzulawski commented Feb 5, 2021

workingjubilee commented Feb 19, 2021

jedbrown commented Feb 19, 2021

mhnatiuk commented Jul 13, 2021

jedbrown commented Jul 14, 2021

jorgecarleitao commented Nov 21, 2021

JeWaVe commented Feb 8, 2024

HadrienG2 commented Feb 12, 2024

calebzulawski commented Feb 12, 2024

Amanieu commented Feb 12, 2024

tarcieri commented Feb 12, 2024

mert-kurttutan commented Jul 5, 2024

tarcieri commented Jul 8, 2024

ManuelCostanzo commented Oct 6, 2020 •

edited

Loading

ManuelCostanzo commented Oct 6, 2020 •

edited

Loading

Amanieu commented Oct 6, 2020 •

edited

Loading

tarcieri commented Feb 4, 2021 •

edited

Loading