A CPUID library in Base or stdlib #36367
Comments
Yes, I think having Cpuid functionality in base would be good. |
Other than exporting the ccall, the constants should also be exported, and they should only be used through names, not through the values directly. There should be something like CPU.X86.fma and CPU.AArch32.neon, and you can define them by generating a jl file from the features_*.h headers. |
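A minimal sketch of what that generation step could look like. It assumes the feature headers consist of lines of the form JL_FEATURE_DEF(name, bit, llvmver); the sample text, the bit positions, and the helper names parse_feature_defs and generate_module are hypothetical and only illustrate the idea, they are not an existing API.

```julia
# Hypothetical sample in the assumed style of a features_*.h header.
# The names and bit positions are illustrative only.
const SAMPLE_HEADER = """
// comment lines are skipped
JL_FEATURE_DEF(sse3, 0, 0)
JL_FEATURE_DEF(fma, 12, 0)
JL_FEATURE_DEF(avx2, 32 * 2 + 5, 0)
"""

# Parse every `JL_FEATURE_DEF(name, bit, llvmver)` line into name => bit index.
function parse_feature_defs(text::AbstractString)
    defs = Pair{Symbol,Int}[]
    pattern = r"^JL_FEATURE_DEF\(\s*(\w+)\s*,\s*([^,]+?)\s*,"
    for line in eachline(IOBuffer(text))
        m = match(pattern, strip(line))
        m === nothing && continue
        # The bit index may be a small arithmetic expression such as `32 * 2 + 5`.
        # Evaluating it is only reasonable because the header is trusted input.
        bit = Int(eval(Meta.parse(m.captures[2])))
        push!(defs, Symbol(m.captures[1]) => bit)
    end
    return defs
end

# Emit Julia source for a module holding one named constant per feature.
function generate_module(modname::Symbol, defs)
    io = IOBuffer()
    println(io, "module ", modname)
    for (name, bit) in defs
        println(io, "const ", name, " = ", bit)
    end
    println(io, "end")
    return String(take!(io))
end

src = generate_module(:X86, parse_feature_defs(SAMPLE_HEADER))
print(src)
```

With constants generated this way, user code would refer to a feature by name (for example X86.fma) and never hard-code the bit value.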
Actually ccall will also lie to you about the features so a new intrinsic is needed to get a consistent answer with what the LLVM target actually is. The ccall could be used most of the time for loading external libraries but not anything that uses vector calls. |
Also, what’s an example where you want to load libraries optimized for different uarch? |
Ok. Sure. As long as it’s not the intended way to fix the compiler version regression. Though, as I mentioned in the other issue, it should be possible to just patch the library to support dispatch within the library. It should be within the interest of the library to do so since loading multiple libraries won’t benefit any normal library users... |
Why not? There's no reason for each library to ship its own micro-arch dispatching code. |
Because the mpfr issue doesn't even get the full performance back when compiled with gcc 4.8 on core2 target. And it's not even using any new instructions and is purely a compiler issue. This would simply be the wrong fix.
First, this is not related to why this is the wrong fix for the regression. And, if the library benefits from newer instructions it should support dispatch directly. It doesn't mean the library has to implement the dispatch, that's what GCC target_clones is for... As I said, the reason is that
and I don't believe julia is the most important user for most libraries. |
And IIUC icc also does so automatically so there should be little reason the library maintainer has to do this manually... |
This issue is also related to #33011. In this case, the detection must be done at runtime/load time. So, the CPUID feature in Base is useful, although it requires careful usage. |
That is correct and that is exactly why things that seem to work in a package can easily be wrong. For #33011, it is aimed at julia code so it must agree with the LLVM target and not the runtime environment so that's why I said a separate intrinsic is needed. It's also exactly why I mention about But for the purpose of loading external libraries, as long as the library to be loaded doesn't have vector type as argument, the ABI should not be affected so loading a library that doesn't match the LLVM target isn't much of a deal as long as the library loaded can run. That's why I think it's still OK to just use ccall of |
I've just found that in libllvm there is ccall(:LLVMGetHostCPUFeatures, Cstring, ()), which gives the list of CPU features, also on architectures different from x86. For example, on AWS Graviton 2:
julia> features = filter(f -> startswith(f, "+"), split(unsafe_string(ccall(:LLVMGetHostCPUFeatures, Cstring, ())), ","))
4-element Array{SubString{String},1}:
 "+neon"
 "+fp-armv8"
 "+crypto"
 "+crc"
Can you see any problem with using this function? This would be good enough for our purposes, it's more general than cpuid and it doesn't require extra code in Base. |
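The one-liner above could be wrapped roughly as follows. This is a sketch, not something in Base: the helper names enabled_features and host_features are hypothetical, and it assumes the LLVM C API symbols LLVMGetHostCPUFeatures and LLVMDisposeMessage are visible to ccall in the running Julia process, which depends on the build.

```julia
# Split an LLVM feature string such as "+neon,-sve,+crc" into the names of the
# enabled features (the entries prefixed with '+').
function enabled_features(feature_string::AbstractString)
    names = String[]
    for item in split(feature_string, ','; keepempty=false)
        startswith(item, "+") && push!(names, String(item[2:end]))
    end
    return names
end

# Query LLVM for the host features.
# Assumption: the LLVM C API symbols are reachable from `ccall` in this process.
function host_features()
    ptr = ccall(:LLVMGetHostCPUFeatures, Ptr{UInt8}, ())
    str = unsafe_string(ptr)                                # copy into a Julia String
    ccall(:LLVMDisposeMessage, Cvoid, (Ptr{UInt8},), ptr)   # release LLVM's buffer
    return enabled_features(str)
end
```

Keeping the parsing separate from the ccall means the string handling can be exercised on any machine, for example enabled_features("+neon,+crc"), while host_features() is only called where the symbol is actually available.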
For your purpose of loading libraries its answer is correct enough.
It's an unstable API though. |
@yuyichao we've noticed that |
AFAICT there's no newer LLVM functions to use. As I said, exporting |
@Keno it seems that LLVM's HostCPUFeatures on ARM are not specific enough for us; for instance, it would be difficult to tell if an aarch64 processor had enough features to run |
Hmm, really? Does it not have the ISA version (e.g. ARMv8.3 or something)? That should be sufficient for us to tell. Anyway, didn't @yuyichao say in the previous comment that we essentially already have this? |
No, you can see in #36367 (comment) that it does not, sadly. Unless we're just asking the wrong function. Exporting |
Don't we have that list in C already? Presumably we could just add a new function that uses the same information and just returns a list of all the applicable features? |
I didn't know that is supported now. What are the required features?
That's not actually detectable AFAICT. Only features are.
Errr, well, that's exactly what |
I wrote that recently?
lse atomics, but they've been filling out the whole set in subsequent ISA versions, so the vanilla 8.1 atomics are not sufficient. So far, we've only tested Graviton2 where it works well. |
Does it require not using any ll-sc instructions? Does LLVM/gcc/libc support doing that? |
Yes and yes. |
If it works on neoverse-n1 then I assume the additional one is rcpc, so any cortex-a75+ including a55 and a65 should be fine. Is the use of LSE instructions an ABI requirement? I know they couldn't do a 128-bit atomic read without a lock without lse, so it would make sense that without lse 128-bit atomics require a libcall anyway. I imagine it should be legal for <=64-bit atomics to use ll-sc though... So there's a chance AArch64 might get rr support before amd does? |
As @giordano and I are working on microarchitecture-specific tarballs in BinaryBuilder, we are currently importing a modified version of CpuId.jl, however if Base is going to be selecting artifacts based on the current CPU type, it makes sense that something with similar capabilities to this CpuId package would become a stdlib at the least, if not a part of Base.
We already have some CPUID code in src/processor*.cpp; should we export jl_test_cpu_feature() and simply maintain a Julia mapping of the flags? The way I see it, we have three options:
1. [...] ccall() to probe CPUID bits.
2. [...] CpuId.jl), but have it shipped by default with Julia.
3. [...] Pkg.add()-time artifacts allowed)
I am in favor of 1 or 2, and between the two of them I'm in favor of (2) because it is most maintainable.
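To make the artifact-selection use case concrete, here is a rough sketch of choosing a microarchitecture-specific tarball from a list of detected feature names. The table MARCH_REQUIREMENTS, its labels and feature lists, and the function select_march are all hypothetical; a real mapping would have to come from whatever feature source Base ends up exposing.

```julia
# Hypothetical table: microarchitecture label => features it requires,
# ordered from most to least demanding. Labels and lists are illustrative.
const MARCH_REQUIREMENTS = [
    "avx512" => ["avx512f", "avx2", "fma"],
    "avx2"   => ["avx2", "fma"],
    "avx"    => ["avx"],
    "x86_64" => String[],          # baseline: no extra requirements
]

# Return the first (most demanding) label whose requirements are all present
# in `host_features`, or `nothing` if no entry matches.
function select_march(host_features; table = MARCH_REQUIREMENTS)
    for (march, required) in table
        all(in(host_features), required) && return march
    end
    return nothing
end
```

For instance, select_march(["sse3", "avx", "avx2", "fma"]) picks the "avx2" entry, and a host reporting none of the listed features falls back to the baseline. As noted in the discussion, this kind of runtime check is only appropriate for choosing external libraries, not for code that must agree with the LLVM target.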