-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zen4 support #7840
Zen4 support #7840
Conversation
It's a superset of cannon lake, and a subset of sapphire rapids
I have felt all along that the former was the better way to go, but IIRC most other folks felt the other way was better -- given the existence of Zen4 as an in-betweener I'd think we should move to feature bits and redefine the existing microarch flags as combinations of those |
I think you might be referring to target flags being microarch vs features, but I was actually referring to how host detection is implemented. When we call the CPUID instruction on x86, should we be sniffing the bits that tell you what avx512 extensions exist, or should we be sniffing the bits that describe what the vendor and microarchitecture is? Either way should produce the same target string in theory, but they have different failure modes when things go wrong. On the target flag issue, it's just not feasible to do individual feature bits with the way avx512 variants work. The Zen4 CPU would have the target string x86-64-linux-sse41-f16c-fma-avx-avx2-avx512-avx512f-avx512dq-avx512ifma-avx512cd-avx512bw-avx512vl-avx512vbmi-avx512vbmi2-avx512vnni-avx512vpclmulqdq-avx512bitalg-avx512vpopcntdq. Sapphire rapids would be even longer, because it adds avx512fp16, avxvnni, and a few more I think. It's much more usable to identify sets and subsets and have a single flag that means "all the avx512 extensions present on zen4". |
My preferred design is to do things via individual feature bits but provide a way to map short names to a set of feature bits. The canonical form is fully expanded and the shortened names should be used only at a user facing command/UI level. Anything used for equality comparison or tagging in a build system etc. must use the expanded form. Generally anything in a build file or source code wants to use an expanded form for clarity and maintainability. (E.g. the thing here where Zen4 has features in between two other microarchitectures. This is really easy if all the feature bits the compiler knows about are surfaced.) The expansion could be a helper function or a Python script. It invariably leads to wanting a more powerful language like "The foo architecture minus the bar feature." That's fine in an external tool but shouldn't go into the target specification of the main compiler. I want both usability and uncomplicated specification at the compiler interface. Having an explicit conversion that is only used for user facing stuff is probably the best way to do this. |
I will gently point out though that discussion of redesigning what the avx512 feature flags look like in the Target struct is largely independent of this PR, and I don't want it to derail the issues I actually wanted to discuss here. The changes in this PR don't pose any challenge to the status quo. #7839 is more related, but that's really about what we should do when some target flags imply other target flags, and is also not about refactoring how we've done the avx512 target flags. |
Thinking about it more, the PR is likely the right approach for now. How the feature flags are done directly informs the question of how to detect the information. If the processor architecture is in the feature name, it makes sense to derive it from the information identifying the processor. If the feature just names a specific ISA component, it should probably be detected via feature detection. The tradeoff is that newer processors can refuse to run code that they do support running. The area is surprisingly complicated and thinking about it more, I'm leaning toward Target supporting both chip names and features with the compiler and runtime working in terms of an exhaustive list of features. So this PR likely doesn't make things much more difficult as backward compatibility is easy to provide if there is an explicit conversion anyway. (The alternative here would involve calling |
Failures unrelated. |
* Enable emission of float16/32 casts on x86 Fixes halide#7836 Fixes halide#4166 * Add support for zen4 * Add avx512_Zen4 target flag It's a superset of cannon lake, and a subset of sapphire rapids * Fix runtime detection, sapphire rapids CPUID bits * Fix comment * Don't catch bfloat casts * Fix Zen4 model number * Use llvm BFloat type for bfloat intrinsics * Give up on native bfloat16 conversion for now * Don't use llvm's bfloat type at all * Add missing enum * Fix constant in comment * clang-format
This PR adds support for Zen4 chips. This is currently broken, because we are misdetecting them as sapphire rapids (#7834). There were also some avx512_bf16 things broken in llvm that we weren't aware of because we have no buildbots that support it.
It turns out the subset of avx512 supported by Zen4 is a strict superset of cannonlake, and a strict subset of sapphire rapids, so it was easy to slot into the target feature list. However, I think we need to have a discussion about how long these x86 target strings are getting (#7839)
The other thing in this PR we should discuss is feature-bit-based host CPU detection vs detecting specific microarchitectures and setting the right flags. There were comments in the existing source indicating we should do the latter, so that's what I did. I'm not completely sold on that though because it doesn't seem very future-proof.
This PR is necessary to bring online a new Zen4 buildbot.