XSAVE, CPUID, Runtime AVX-512 and ARM support #175
This all looks great to me, thanks so much for digging into all this! One day I'd love to dig into all the cpu detection but it looks quite thorough here and there's plenty of links to follow so I'm more than ok with the current implementation. I'm ok personally landing this whenever CI is green (I don't think we need AVX-512 support in rustc to block this b/c it doesn't add any AVX-512 intrinsics, right?). It looks like CI currently has an error with |
I'll also note that personally I'm a little anxious about these intrinsics in the sense that there's not an Intel manual with the signatures (or maybe there is?) in the same way we have such a guide for all the SIMD intrinsics. That being said though I think this library will go through normal stabilization processes as other libraries in libstd, so it's ok for part of this library to be unstable (such as these intrinsics which I'm not currently sure about) and the other parts can be stable (like the Intel-defined signatures). |
AFAIK such a manual does not exist (maybe there is one in the Intel icc compiler?) but there is something much better: Intel's online database! The database gives you the C signatures, the assembly instruction, and the CPUID flag (or flags) that you need to check... Clang and GCC deviate slightly from the C signatures in the Intel guide though (mainly when it comes to signed vs unsigned integers), so it is useful to also check the:
at least, when you read that Intel says some
I got these in my computer as well but I thought I fixed them, will recheck these once the rust PR is merged. LLVM is picky with the pointer types that the intrinsics take. |
Also technically |
Right. The only thing blocking this is the |
Nice! That's perfect. This all sounds great to me, let's merge when CI is green |
@gnzlbg could the 4000 line macro be left to a separate PR perhaps? I'm not sure I'm entirely convinced we should add it yet... |
@alexcrichton yes, I can split that to a different PR before merging this one. But there was some runtime detection code there that I wanted to have here. Is that ok if I do that when the |
@BurntSushi @alexcrichton this one should be ready to go as well. The ARM ci still doesn't use run-time feature detection because there is a PR for @alexcrichton this PR also contains the bextri fix. Shall I split it into a different PR? |
@gnzlbg yeah want to split out the bextri fix? Otherwise seems fine by me! |
@alexcrichton done :) |
std = []
intel_sde = []
When you get a chance, could you add documentation here for what `intel_sde` is?
Sure, I'll add a comment to the Cargo.toml.
This PR implements:

- `xsave` intrinsics,
- `__readeflags`/`__writeeflags` intrinsics,
- `cpuid` intrinsics (except for GCC's `__get_cpuid_max`, see issue "Implement GCC's `__get_cpuid_max`" #174), and
- `xsave` and `AVX-512`,

and fixes one bug:

- the `CPUID` instruction was used before testing whether the CPU supported it.

Note that the `has_cpuid` intrinsic does not exist in GCC nor Clang (it is a subset of GCC's `__get_cpuid_max`), but I've included it because we can make it a no-op on `x86_64`, which is what most folks are targeting nowadays. It is useful for our own run-time feature detection support and for cpuid crates like @shepmaster's cupid.

This PR does arguably a lot of stuff. We need the `EFLAGS` intrinsics to properly refactor `cpuid` into its own intrinsic. We need `xsave` run-time support for properly refactoring the `xgetbv` intrinsic. And `xsave` run-time support is tightly coupled with AVX run-time feature detection, which is different for AVX/AVX2 and AVX-512, and since we want to support AVX-512 soon anyways I did not want to have to go through the spec twice. I also wanted to test the `xgetbv` intrinsic and ended up implementing all of the `xsave` intrinsics for this. Arguably I just needed to implement `xsetbv` but, again, I did not want to have to go through the spec twice.

The run-time feature detection for x86 grows significantly with AVX-512 support, so I backported one of the refactorings I have in the run-time feature detection module for ARM (which is even bigger). There I refactor it further into its own top-level module to be able to share some code between both.

This is blocked on:

- "[xsave] whitelist xsave target feature" rust#45761 (whitelisting XSAVE in rustc)

Closes #171.

Helps with the run-time part of #146.
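For readers following along, here is a minimal sketch of why `xgetbv` and AVX run-time detection are coupled, and why AVX-512 needs a different check than AVX/AVX2. It is not the code from this PR: every function and constant name below is hypothetical, and the bit positions are assumptions taken from the Intel SDM (CPUID leaf 1 ECX bits 27 and 28; XCR0 bits 1, 2 and 5 to 7). The decision logic is kept as pure functions over register values so it can be exercised on any target, with the actual register reads gated to `x86_64`.

```rust
// Bit positions are assumptions taken from the Intel SDM; all names are hypothetical.
const CPUID1_ECX_OSXSAVE: u32 = 1 << 27; // OS has enabled XSAVE, so XGETBV is usable
const CPUID1_ECX_AVX: u32 = 1 << 28; // CPU implements AVX
const XCR0_SSE_AVX: u64 = 0b0000_0110; // XMM (bit 1) and YMM (bit 2) state
const XCR0_AVX512: u64 = 0b1110_0000; // opmask, ZMM_Hi256 and Hi16_ZMM state

/// AVX is usable only if the CPU reports it, XGETBV is available, and the
/// OS saves both XMM and YMM state on context switches.
pub fn avx_usable(cpuid1_ecx: u32, xcr0: u64) -> bool {
    (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0
        && (cpuid1_ecx & CPUID1_ECX_AVX) != 0
        && (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX
}

/// AVX-512 additionally needs the three AVX-512 state components in XCR0.
/// The per-extension CPUID bits (e.g. AVX512F in leaf 7) are not checked here.
pub fn avx512_state_enabled(cpuid1_ecx: u32, xcr0: u64) -> bool {
    avx_usable(cpuid1_ecx, xcr0) && (xcr0 & XCR0_AVX512) == XCR0_AVX512
}

/// Reads CPUID leaf 1 ECX and XCR0 on x86_64, where CPUID is always present.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn read_cpuid1_ecx_and_xcr0() -> (u32, u64) {
    use std::arch::x86_64::{__cpuid, _xgetbv};
    let ecx = unsafe { __cpuid(1) }.ecx;
    // XGETBV faults unless the OS has set OSXSAVE, so check that bit first.
    let xcr0 = if (ecx & CPUID1_ECX_OSXSAVE) != 0 {
        unsafe { _xgetbv(0) }
    } else {
        0
    };
    (ecx, xcr0)
}
```

On an `x86_64` host, passing the pair returned by `read_cpuid1_ecx_and_xcr0()` to `avx_usable` gives the OS-level answer. In today's Rust, `std::is_x86_feature_detected!("avx")` is the supported way to ask the same question.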