Zen4 support #7840

abadams · 2023-09-06T21:35:35Z

This PR adds support for Zen4 chips. This is currently broken, because we are misdetecting them as sapphire rapids (#7834). There were also some avx512_bf16 things broken in llvm that we weren't aware of because we have no buildbots that support it.

It turns out the subset of avx512 supported by Zen4 is a strict superset of cannonlake, and a strict subset of sapphire rapids, so it was easy to slot into the target feature list. However, I think we need to have a discussion about how long these x86 target strings are getting (#7839)

The other thing in this PR we should discuss is feature-bit-based host CPU detection vs detecting specific microarchitectures and setting the right flags. There were comments in the existing source indicating we should do the latter, so that's what I did. I'm not completely sold on that though because it doesn't seem very future-proof.

This PR is necessary to bring online a new Zen4 buildbot.

Fixes #7836 Fixes #4166

It's a superset of cannon lake, and a subset of sapphire rapids

steven-johnson · 2023-09-06T22:28:06Z

The other thing in this PR we should discuss is feature-bit-based host CPU detection vs detecting specific microarchitectures and setting the right flags

I have felt all along that the former was the better way to go, but IIRC most other folks felt the other way was better -- given the existence of Zen4 as an in-betweener I'd think we should move to feature bits and redefine the existing microarch flags as combinations of those

abadams · 2023-09-06T22:38:36Z

I think you might be referring to target flags being microarch vs features, but I was actually referring to how host detection is implemented.

When we call the CPUID instruction on x86, should we be sniffing the bits that tell you what avx512 extensions exist, or should we be sniffing the bits that describe what the vendor and microarchitecture is? Either way should produce the same target string in theory, but they have different failure modes when things go wrong.

On the target flag issue, it's just not feasible to do individual feature bits with the way avx512 variants work. The Zen4 CPU would have the target string x86-64-linux-sse41-f16c-fma-avx-avx2-avx512-avx512f-avx512dq-avx512ifma-avx512cd-avx512bw-avx512vl-avx512vbmi-avx512vbmi2-avx512vnni-avx512vpclmulqdq-avx512bitalg-avx512vpopcntdq. Sapphire rapids would be even longer, because it adds avx512fp16, avxvnni, and a few more I think. It's much more usable to identify sets and subsets and have a single flag that means "all the avx512 extensions present on zen4".

zvookin · 2023-09-06T22:38:41Z

My preferred design is to do things via individual feature bits but provide a way to map short names to a set of feature bits. The canonical form is fully expanded and the shortened names should be used only at a user facing command/UI level. Anything used for equality comparison or tagging in a build system etc. must use the expanded form.

Generally anything in a build file or source code wants to use an expanded form for clarity and maintainability. (E.g. the thing here where Zen4 has features in between two other microarchitectures. This is really easy if all the feature bits the compiler knows about are surfaced.)

The expansion could be a helper function or a Python script. It invariably leads to wanting a more powerful language like "The foo architecture minus the bar feature." That's fine in an external tool but shouldn't go into the target specification of the main compiler. I want both usability and uncomplicated specification at the compiler interface. Having an explicit conversion that is only used for user facing stuff is probably the best way to do this.

abadams · 2023-09-06T22:44:20Z

I will gently point out though that discussion of redesigning what the avx512 feature flags look like in the Target struct is largely independent of this PR, and I don't want it to derail the issues I actually wanted to discuss here. The changes in this PR don't pose any challenge to the status quo. #7839 is more related, but that's really about what we should do when some target flags imply other target flags, and is also not about refactoring how we've done the avx512 target flags.

zvookin · 2023-09-07T21:02:23Z

Thinking about it more, the PR is likely the right approach for now.

How the feature flags are done directly informs the question of how to detect the information. If the processor architecture is in the feature name, it makes sense to derive it from the information identifying the processor. If the feature just names a specific ISA component, it should probably be detected via feature detection. The tradeoff is that newer processors can refuse to run code that they do support running.

The area is surprisingly complicated and thinking about it more, I'm leaning toward Target supporting both chip names and features with the compiler and runtime working in terms of an exhaustive list of features. So this PR likely doesn't make things much more difficult as backward compatibility is easy to provide if there is an explicit conversion anyway.

(The alternative here would involve calling Target::AVX512_Cannonlake a baseline and adding feature specific flags for additions since then. Target:::AVX512_SapphireRapids would set all the new feature flags and there either wouldn't be a Zen4 specific feature or it would also set the ones it supports. It isn't strictly necessary to have bits for features that are irrelevant to Halide at the moment. They could be added later. All in all, probably not a better bet.)

abadams · 2023-09-11T17:40:25Z

Failures unrelated.

* Enable emission of float16/32 casts on x86 Fixes halide#7836 Fixes halide#4166 * Add support for zen4 * Add avx512_Zen4 target flag It's a superset of cannon lake, and a subset of sapphire rapids * Fix runtime detection, sapphire rapids CPUID bits * Fix comment * Don't catch bfloat casts * Fix Zen4 model number * Use llvm BFloat type for bfloat intrinsics * Give up on native bfloat16 conversion for now * Don't use llvm's bfloat type at all * Add missing enum * Fix constant in comment * clang-format

abadams added 14 commits September 5, 2023 10:13

Enable emission of float16/32 casts on x86

00d29bd

Fixes #7836 Fixes #4166

Add support for zen4

8b85321

Add avx512_Zen4 target flag

edffe44

It's a superset of cannon lake, and a subset of sapphire rapids

Fix runtime detection, sapphire rapids CPUID bits

5143840

Fix comment

264e062

Don't catch bfloat casts

0b12c24

Fix Zen4 model number

efb69cd

Use llvm BFloat type for bfloat intrinsics

2d48d37

Merge branch 'abadams/enable_f16c' into abadams/zen4

8122dc1

Give up on native bfloat16 conversion for now

58488e3

Don't use llvm's bfloat type at all

2eaa568

Add missing enum

d058212

Fix constant in comment

9d94842

Merge remote-tracking branch 'origin/main' into abadams/zen4

bc29e4a

abadams added the dev_meeting Topic to be discussed at the next dev meeting label Sep 6, 2023

vksnk approved these changes Sep 6, 2023

View reviewed changes

clang-format

45b767e

abadams merged commit 6569a83 into main Sep 11, 2023
3 checks passed

BrewTestBot mentioned this pull request Feb 2, 2024

halide 17.0.0 Homebrew/homebrew-core#161602

Closed

zvookin mentioned this pull request Feb 23, 2024

halide_get_cpu_features will need to be updated for x86 AVX10 and APX. #8120

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Zen4 support #7840

Zen4 support #7840

abadams commented Sep 6, 2023

steven-johnson commented Sep 6, 2023

abadams commented Sep 6, 2023

zvookin commented Sep 6, 2023

abadams commented Sep 6, 2023

zvookin commented Sep 7, 2023

abadams commented Sep 11, 2023

Zen4 support #7840

Zen4 support #7840

Conversation

abadams commented Sep 6, 2023

steven-johnson commented Sep 6, 2023

abadams commented Sep 6, 2023

zvookin commented Sep 6, 2023

abadams commented Sep 6, 2023

zvookin commented Sep 7, 2023

abadams commented Sep 11, 2023