AVX-512 intrinsics #146
Comments
@hdevalence is there a rust bug for this?
I've opened up a PR for some rustc features at f764eaf453162cd19ef484ece07cc21e14dfb2c1 rust-lang/rust#45528
Not sure if this is related to the above or unrelated, but building
This is with
@hdevalence hm... fascinating! In general you shouldn't need to build stdsimd with
That being said, this still shouldn't cause a problem! Mind opening a separate issue for that?
Ok, rust-lang/rust#45528 is now merged, so I think these bugs should be fixed and we should be ready to go!
I tried getting an AVX-512 intrinsic to work and ran into a bunch of difficulties. Some points:
It looks like the combination of AVX512's masks and AVX512VL (which lets AVX512 instructions operate on 128-bit and 256-bit vectors) means that for most instructions there's one C intrinsic for each of {no mask, write mask, zero mask} x {xmm, ymm, zmm}.
These would probably be good to generate with a macro?
Because AVX512 uses mask registers, the constify! macro hacks are probably not needed for mask instructions.
The list of intrinsics linked in the readme doesn't seem to have non-masked versions; I don't know if this is just an accident of how it was made.
Trying to use the int_x86_avx512_mask_pmul_dq_512 intrinsic from that list using
didn't work, failing with
which I guess means I was linking to the intrinsic incorrectly?
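To make the macro idea from the first point concrete, here is a minimal sketch of how one macro_rules! invocation per vector width could stamp out the {no mask, write mask, zero mask} variants. It is a scalar emulation only: plain arrays stand in for xmm/ymm/zmm registers, a u8 stands in for a mask register, and nothing here links to LLVM. The macro name masked_variants! and all generated function names are hypothetical and are not part of stdsimd.

```rust
// Scalar emulation only: arrays stand in for vector registers and a `u8`
// stands in for an AVX-512 mask register. All names here are hypothetical.
macro_rules! masked_variants {
    ($plain:ident, $mask:ident, $maskz:ident, $lanes:expr, $op:expr) => {
        /// No mask: every lane is computed.
        pub fn $plain(a: [i64; $lanes], b: [i64; $lanes]) -> [i64; $lanes] {
            let mut out = [0i64; $lanes];
            for i in 0..$lanes {
                out[i] = $op(a[i], b[i]);
            }
            out
        }

        /// Write mask: lanes whose mask bit is clear keep the value from `src`.
        pub fn $mask(
            src: [i64; $lanes],
            k: u8,
            a: [i64; $lanes],
            b: [i64; $lanes],
        ) -> [i64; $lanes] {
            let full = $plain(a, b);
            let mut out = src;
            for i in 0..$lanes {
                if (k >> i) & 1 == 1 {
                    out[i] = full[i];
                }
            }
            out
        }

        /// Zero mask: lanes whose mask bit is clear are set to zero.
        pub fn $maskz(k: u8, a: [i64; $lanes], b: [i64; $lanes]) -> [i64; $lanes] {
            $mask([0i64; $lanes], k, a, b)
        }
    };
}

// One invocation per width: 2, 4 and 8 lanes of 64 bits model xmm, ymm and zmm.
masked_variants!(add_epi64_x2, mask_add_epi64_x2, maskz_add_epi64_x2, 2, i64::wrapping_add);
masked_variants!(add_epi64_x4, mask_add_epi64_x4, maskz_add_epi64_x4, 4, i64::wrapping_add);
masked_variants!(add_epi64_x8, mask_add_epi64_x8, maskz_add_epi64_x8, 8, i64::wrapping_add);
```

With this shape, mask_add_epi64_x4([9, 9, 9, 9], 0b0101, [1, 2, 3, 4], [10, 20, 30, 40]) computes lanes 0 and 2 and leaves lanes 1 and 3 at 9. A real version would presumably forward to the LLVM intrinsics instead of looping, but the nine-way fan-out would look similar.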
@alexcrichton reduced to this minimal example for the vpmuldq instruction: https://godbolt.org/g/VMCtYy and found https://github.com/rust-lang/rust/blob/4c053db233d69519b548e5b8ed7192d0783e582a/src/librustc_trans/cabi_x86_64.rs#L30-L31 which hardcodes the biggest vector as 256 bits (the size of a ymm register).
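For readers unfamiliar with vpmuldq, the following is a rough scalar model of what the write-masked 512-bit form computes, which may help when checking a real binding once the ABI limit is lifted. It is only a sketch: the function names are hypothetical, an [i64; 8] array stands in for a zmm register (8 x 64 bits = 512 bits, twice the 256-bit ymm size mentioned above), and it does not show how the LLVM intrinsic is actually linked.

```rust
/// Scalar model of one vpmuldq lane: sign-extend the low 32 bits of each
/// 64-bit input, then multiply to get a full 64-bit signed product.
/// The product of two i32 values always fits in an i64, so `*` cannot overflow.
fn pmuldq_lane(a: i64, b: i64) -> i64 {
    (a as i32 as i64) * (b as i32 as i64)
}

/// Hypothetical model of the write-masked 512-bit form: lanes whose bit in
/// `k` is set receive the product, all other lanes are copied from `src`.
fn mask_pmuldq_512(src: [i64; 8], k: u8, a: [i64; 8], b: [i64; 8]) -> [i64; 8] {
    let mut out = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            out[i] = pmuldq_lane(a[i], b[i]);
        }
    }
    out
}
```

Note that only the low half of each 64-bit lane takes part in the multiply, so pmuldq_lane(0x1_0000_0003, 5) ignores the upper bits and yields 15, while an input of 0xFFFF_FFFF is read as -1.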