Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zen4 support #7840

Merged
merged 15 commits into from
Sep 11, 2023
Merged

Zen4 support #7840

merged 15 commits into from
Sep 11, 2023

Conversation

abadams
Copy link
Member

@abadams abadams commented Sep 6, 2023

This PR adds support for Zen4 chips. This is currently broken, because we are misdetecting them as sapphire rapids (#7834). There were also some avx512_bf16 things broken in llvm that we weren't aware of because we have no buildbots that support it.

It turns out the subset of avx512 supported by Zen4 is a strict superset of cannonlake, and a strict subset of sapphire rapids, so it was easy to slot into the target feature list. However, I think we need to have a discussion about how long these x86 target strings are getting (#7839)

The other thing in this PR we should discuss is feature-bit-based host CPU detection vs detecting specific microarchitectures and setting the right flags. There were comments in the existing source indicating we should do the latter, so that's what I did. I'm not completely sold on that though because it doesn't seem very future-proof.

This PR is necessary to bring online a new Zen4 buildbot.

@abadams abadams added the dev_meeting Topic to be discussed at the next dev meeting label Sep 6, 2023
@steven-johnson
Copy link
Contributor

The other thing in this PR we should discuss is feature-bit-based host CPU detection vs detecting specific microarchitectures and setting the right flags

I have felt all along that the former was the better way to go, but IIRC most other folks felt the other way was better -- given the existence of Zen4 as an in-betweener I'd think we should move to feature bits and redefine the existing microarch flags as combinations of those

@abadams
Copy link
Member Author

abadams commented Sep 6, 2023

I think you might be referring to target flags being microarch vs features, but I was actually referring to how host detection is implemented.

When we call the CPUID instruction on x86, should we be sniffing the bits that tell you what avx512 extensions exist, or should we be sniffing the bits that describe what the vendor and microarchitecture is? Either way should produce the same target string in theory, but they have different failure modes when things go wrong.

On the target flag issue, it's just not feasible to do individual feature bits with the way avx512 variants work. The Zen4 CPU would have the target string x86-64-linux-sse41-f16c-fma-avx-avx2-avx512-avx512f-avx512dq-avx512ifma-avx512cd-avx512bw-avx512vl-avx512vbmi-avx512vbmi2-avx512vnni-avx512vpclmulqdq-avx512bitalg-avx512vpopcntdq. Sapphire rapids would be even longer, because it adds avx512fp16, avxvnni, and a few more I think. It's much more usable to identify sets and subsets and have a single flag that means "all the avx512 extensions present on zen4".

@zvookin
Copy link
Member

zvookin commented Sep 6, 2023

My preferred design is to do things via individual feature bits but provide a way to map short names to a set of feature bits. The canonical form is fully expanded and the shortened names should be used only at a user facing command/UI level. Anything used for equality comparison or tagging in a build system etc. must use the expanded form.

Generally anything in a build file or source code wants to use an expanded form for clarity and maintainability. (E.g. the thing here where Zen4 has features in between two other microarchitectures. This is really easy if all the feature bits the compiler knows about are surfaced.)

The expansion could be a helper function or a Python script. It invariably leads to wanting a more powerful language like "The foo architecture minus the bar feature." That's fine in an external tool but shouldn't go into the target specification of the main compiler. I want both usability and uncomplicated specification at the compiler interface. Having an explicit conversion that is only used for user facing stuff is probably the best way to do this.

@abadams
Copy link
Member Author

abadams commented Sep 6, 2023

I will gently point out though that discussion of redesigning what the avx512 feature flags look like in the Target struct is largely independent of this PR, and I don't want it to derail the issues I actually wanted to discuss here. The changes in this PR don't pose any challenge to the status quo. #7839 is more related, but that's really about what we should do when some target flags imply other target flags, and is also not about refactoring how we've done the avx512 target flags.

@zvookin
Copy link
Member

zvookin commented Sep 7, 2023

Thinking about it more, the PR is likely the right approach for now.

How the feature flags are done directly informs the question of how to detect the information. If the processor architecture is in the feature name, it makes sense to derive it from the information identifying the processor. If the feature just names a specific ISA component, it should probably be detected via feature detection. The tradeoff is that newer processors can refuse to run code that they do support running.

The area is surprisingly complicated and thinking about it more, I'm leaning toward Target supporting both chip names and features with the compiler and runtime working in terms of an exhaustive list of features. So this PR likely doesn't make things much more difficult as backward compatibility is easy to provide if there is an explicit conversion anyway.

(The alternative here would involve calling Target::AVX512_Cannonlake a baseline and adding feature specific flags for additions since then. Target:::AVX512_SapphireRapids would set all the new feature flags and there either wouldn't be a Zen4 specific feature or it would also set the ones it supports. It isn't strictly necessary to have bits for features that are irrelevant to Halide at the moment. They could be added later. All in all, probably not a better bet.)

@abadams
Copy link
Member Author

abadams commented Sep 11, 2023

Failures unrelated.

@abadams abadams merged commit 6569a83 into main Sep 11, 2023
3 checks passed
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Enable emission of float16/32 casts on x86

Fixes halide#7836
Fixes halide#4166

* Add support for zen4

* Add avx512_Zen4 target flag

It's a superset of cannon lake, and a subset of sapphire rapids

* Fix runtime detection, sapphire rapids CPUID bits

* Fix comment

* Don't catch bfloat casts

* Fix Zen4 model number

* Use llvm BFloat type for bfloat intrinsics

* Give up on native bfloat16 conversion for now

* Don't use llvm's bfloat type at all

* Add missing enum

* Fix constant in comment

* clang-format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev_meeting Topic to be discussed at the next dev meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants