A CPUID library in Base or stdlib #36367

staticfloat · 2020-06-19T23:28:50Z

As @giordano and I are working on microarchitecture-specific tarballs in BinaryBuilder, we are currently importing a modified version of CpuId.jl, however if Base is going to be selecting artifacts based on the current CPU type, it makes sense that something with similar capabilities to this CpuId package would become a stdlib at the least, if not a part of Base.

We already have some CPUID code in src/processor*.cpp; should we export jl_test_cpu_feature() and simply maintain a Julia mapping of the flags? The way I see it, we have three options:

Export the C function, create a simple wrapper stdlib that has various flag definitions and invokes a ccall() to probe CPUID bits.
Leave the C code alone, duplicate the functionality in pure Julia (similar to CpuId.jl), but have it shipped by default with Julia.
Force all packages that want to use microarchitecture-specific artifacts to do the artifact selection themselves (this will only work for lazy artifacts; no Pkg.add()-time artifacts allowed)

I am in favor of 1 or 2, and between the two of them I'm in favor of (2) because it is most maintainable.

The text was updated successfully, but these errors were encountered:

Keno · 2020-06-19T23:41:39Z

Yes, I think having Cpuid functionality in base would be good. Base seems like the right place for it, since it seems conceivable that the loading code would have to be aware of it. I think importing CpuId.jl as is for now, would be fine, though in the long run, we should make sure to remove as much duplication as possible.

yuyichao · 2020-06-19T23:46:09Z

The detection should not really happen at build time. Pkg.add should not do any architecture detection. It should be done at runtime/load time. Of course right now because it is only done on user's machine you can do it at build time but that'll not work for stdlib and base and for shared home directory.
You shouldn't really use CpuId.jl since it does not agree with the target julia uses. You should use the one we ship. I was previously thinking of exporting a more user friendly string based API but just exporting the ccall should work as well. We can just use ccall optimization to make it work and use aliases to provide the right generic names. (i.e. the user shouldn't need to know that vfp4 provides fma on ARM).

Other than exporting the ccall, the constants should also be exported and it should be only used through names and not the values directly. There should be something like CPU.X86.fma and CPU.AArch32.neon and you can define them by generating a jl file from the features_*.h headers.

yuyichao · 2020-06-19T23:54:42Z

Actually ccall will also lie to you about the features so a new intrinsic is needed to get a consistent answer with what the LLVM target actually is. The ccall could be used most of the time for loading external libraries but not anything that uses vector calls.

yuyichao · 2020-06-20T00:02:49Z

Also what’s an example where you want to load libraries optimized for different uarch ?

giordano · 2020-06-20T00:04:18Z

https://github.com/giordano/Darknet_jll.jl/releases

yuyichao · 2020-06-20T00:08:55Z

Ok. Sure. As long as it’s not the intended way to fix the compiler version regression.

Though, as I mentioned in the other issue, it should be possible to just patch the library to support dispatch within the library. It should be within the interest of the library to do so since loading multiple libraries won’t benefit any normal library users...

Keno · 2020-06-20T00:31:00Z

Ok. Sure. As long as it’s not the intended way to fix the compiler version regression.

Why not? There's no reason for each library to ship its own micro-arch dispatching code.

yuyichao · 2020-06-20T00:39:17Z

As long as it’s not the intended way to fix the compiler version regression.

Because the mpfr issue doesn't even get the full performance back when compiled with gcc 4.8 on core2 target. And it's not even using any new instructions and is purely a compiler issue. This would simply be the wrong fix.

There's no reason for each library to ship its own micro-arch dispatching code.

First, this is not related to why this is the wrong fix for the regression.

And, if the library benefit from newer instruction it should support dispatch directly. It doesn't mean the library has to implement the dispatch, that's what GCC target_clone is for... As I said, the reason is that

loading multiple libraries won’t benefit any normal library users...

and I don't believe julia is the most important user for most libraries.

yuyichao · 2020-06-20T00:50:47Z

And IIUC icc also does so automatically so there should be little reason the library maintainer has to do this manually...

kimikage · 2020-06-20T02:45:44Z

The detection should not really happen at build time. Pkg.add should not do any architecture detection. It should be done at runtime/load time.

This issue is also related to #33011. In this case, the detection must be done at runtime/load time. So, the CPUID feature in Base is useful, although it requires careful usage.

yuyichao · 2020-06-20T02:57:09Z

That is correct and that is exactly why things that seems to work in a package can easily be wrong.

For #33011, it is aimed at julia code so it must agree with the LLVM target and not the runtime environment so that's why I said a separate intrinsic is needed. It's also exactly why I mention about vfp4. People that want to query about existance of fma should write sth like CPU.hasfma() rather than CPU.test_feature(ARM::vfp4).

But for the propose of loading external libraries, as long as the library to be loaded doesn't have vector type as argument, the ABI should not be affected so loading a library that doesn't match the LLVM target isn't much of a deal as long as the library loaded can run. That's why I think it's still OK to just use ccall of jl_test_cpu_feature in this case and export all the constant. Both should be reasonably stable (as long as no one uses the numerical values in the code directly) but I don't want to wrap jl_test_cpu_feature in an easy to use julia function since it is too easy to misuse.

giordano · 2020-06-28T17:38:10Z

I've just found that in libllvm there is LLVMGetHostCPUFeatures:

ccall(:LLVMGetHostCPUFeatures, Cstring, ())

gives the list of CPU features, also on architectures different from x86. For example, on AWS Graviton 2:

julia> features = filter(f -> startswith(f, "+"), split(unsafe_string(ccall(:LLVMGetHostCPUFeatures, Cstring, ())), ","))
4-element Array{SubString{String},1}:
 "+neon"
 "+fp-armv8"
 "+crypto"
 "+crc"

Can you see any problem with using this function? This would be good enough for our purposes, it's more general than cpuid and it doesn't require extra code in Base.

yuyichao · 2020-06-28T17:45:01Z

For your purpose of loading libraries it's answer is correct enough

as long as the library to be loaded doesn't have vector type as argument

It's an unstable API though.

staticfloat · 2020-06-29T19:41:13Z

@yuyichao we've noticed that LLVMGetHostCPUFeatures is pretty sparse on armv8. For instance on the graviton 2 processor that @giordano just posted about, that is an armv8.2 processor with some extra extensions, and we know that it should have many more extensions listed in that LLVMGetHostCPUFeatures output. Is that something that might be fixed by a newer LLVM, or should we be calling a different LLVM function? As it stands, I don't think the output of LLVMGetHostCPUFeatures is going to be sufficient for us, as we need to do things like ensure that lse is enabled on this chip, and although we know it is enabled (because this is an armv8.2-a chip, and lse is on by default for anything armv8.1-a and higher) it's not showing in the LLVM features here.

yuyichao · 2020-06-29T20:57:04Z

AFAICT there's no newer LLVM functions to use. As I said, exporting jl_test_cpu_feature is fine and it already support lse. It's as correct as LLVMGetHostCPUFeatures so just don't use it for anything that affect julia code. Since it shouldn't be use for julia code, julia using ccall should be fine. It'll be easily confused with future functions that should be used with julia code otherwise.

staticfloat · 2020-07-08T21:36:25Z

@Keno it seems that LLVM's HostCPUFeatures on ARM are not specific enough for us; for instance, it would be difficult to tell if an aarch64 processor had enough features to run rr on with just this information. Is this something you would prefer we try to fix inside of LLVM, and rely on a newer LLVM in Julia to deal with, or should we just create our own CPUID library either in C or in Julia?

Keno · 2020-07-08T21:40:27Z

for instance, it would be difficult to tell if an aarch64 processor had enough features to run rr on with just this information

Hmm, really? Does it not have the ISA version (e.g. ARMv8.3 or something). That should be sufficient for us to tell. Anyway, didn't @yuyichao say in the previous comment that we essentially already have this?

staticfloat · 2020-07-08T21:43:57Z

Does it not have the ISA version (e.g. ARMv8.3 or something).

No, you can see in #36367 (comment) that it does not, sadly. Unless we're just asking the wrong function.

Exporting jl_test_cpu_feature() is fine as well, but we will have to maintain a list of CPU features that it knows how to test for, which may be somewhat onerous. If you think that's the way to go, then let's just do that.

Keno · 2020-07-08T21:46:03Z

Don't we have that list in C already? Presumably we could just add a new function that uses the same information and just returns a list of all the applicable features?

yuyichao · 2020-07-08T22:11:21Z

to run rr on

I didin't know that is supported now. What are the required features?

Does it not have the ISA version (e.g. ARMv8.3 or something)

That's not actually detectable AFAICT. Only features are.

Exporting jl_test_cpu_feature() is fine as well, but we will have to maintain a list of CPU features that it knows how to test for, which may be somewhat onerous. If you think that's the way to go, then let's just do that.

Errr, well, that's exactly what jl_test_cpu_feature is doing...

Keno · 2020-07-08T22:19:06Z

I didin't know that is supported now.

I wrote that recently?

What are the required features?

lse atomics, but they've been filling out the whole set in subsequent ISA versions, so the vanilla 8.1 atomics are not sufficient. So far, we've only tested Graviton2 where it works well.

yuyichao · 2020-07-08T22:21:12Z

Does it require not using any ll-sc instructions? Does LLVM/gcc/libc support doing that?

Keno · 2020-07-08T22:22:39Z

Yes and yes.

yuyichao · 2020-07-08T22:29:23Z

If it works on neoverse-n1 then I assume th additional one is rcpc so any cortex-a75+ including a55 and a65 should be fine.

Is the use of LSE instructions a ABI requirement? I know they couldn't do 128bit atomic read without a lock without lse so it would make sense that without lse 128bit atomics requires libcall anyway. I imagine it should be legal for <=64bit atomics to use ll-sc though...

So there's a chance AArch64 might get rr support before amd does?

chriselrod mentioned this issue Jul 15, 2020

Flux.jl precompile takes 20-40 minutes FluxML/Flux.jl#1283

Closed

staticfloat added this to the 1.6 milestone Aug 13, 2020

giordano mentioned this issue Sep 8, 2020

Add CPUID submodule #37479

Merged

staticfloat closed this as completed in #37479 Sep 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A CPUID library in Base or stdlib #36367

A CPUID library in Base or stdlib #36367

staticfloat commented Jun 19, 2020

Keno commented Jun 19, 2020

yuyichao commented Jun 19, 2020 •

edited

Loading

yuyichao commented Jun 19, 2020

yuyichao commented Jun 20, 2020 •

edited

Loading

giordano commented Jun 20, 2020

yuyichao commented Jun 20, 2020

Keno commented Jun 20, 2020

yuyichao commented Jun 20, 2020

yuyichao commented Jun 20, 2020

kimikage commented Jun 20, 2020

yuyichao commented Jun 20, 2020

giordano commented Jun 28, 2020

yuyichao commented Jun 28, 2020

staticfloat commented Jun 29, 2020

yuyichao commented Jun 29, 2020

staticfloat commented Jul 8, 2020

Keno commented Jul 8, 2020

staticfloat commented Jul 8, 2020

Keno commented Jul 8, 2020

yuyichao commented Jul 8, 2020

Keno commented Jul 8, 2020 •

edited

Loading

yuyichao commented Jul 8, 2020

Keno commented Jul 8, 2020

yuyichao commented Jul 8, 2020

A CPUID library in Base or stdlib #36367

A CPUID library in Base or stdlib #36367

Comments

staticfloat commented Jun 19, 2020

Keno commented Jun 19, 2020

yuyichao commented Jun 19, 2020 • edited Loading

yuyichao commented Jun 19, 2020

yuyichao commented Jun 20, 2020 • edited Loading

giordano commented Jun 20, 2020

yuyichao commented Jun 20, 2020

Keno commented Jun 20, 2020

yuyichao commented Jun 20, 2020

yuyichao commented Jun 20, 2020

kimikage commented Jun 20, 2020

yuyichao commented Jun 20, 2020

giordano commented Jun 28, 2020

yuyichao commented Jun 28, 2020

staticfloat commented Jun 29, 2020

yuyichao commented Jun 29, 2020

staticfloat commented Jul 8, 2020

Keno commented Jul 8, 2020

staticfloat commented Jul 8, 2020

Keno commented Jul 8, 2020

yuyichao commented Jul 8, 2020

Keno commented Jul 8, 2020 • edited Loading

yuyichao commented Jul 8, 2020

Keno commented Jul 8, 2020

yuyichao commented Jul 8, 2020

yuyichao commented Jun 19, 2020 •

edited

Loading

yuyichao commented Jun 20, 2020 •

edited

Loading

Keno commented Jul 8, 2020 •

edited

Loading