RFC: renumber Arm targets + Apple feature detection #2076
Comments
Here is a function that can detect whether an optional CPU feature is present on macOS/iOS/iPadOS:
Need to include the [...]. A list of optional AArch64 SIMD ISA extensions that can be queried on macOS/iOS/iPadOS can be found at https://developer.apple.com/documentation/kernel/1387446-sysctlbyname/determining_instruction_set_characteristics.
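The original snippet is not preserved in this thread, so the following is only a minimal sketch of how such a check could look, assuming it is built on `sysctlbyname` as the Apple documentation linked above describes. The function name `HasCpuFeatureSketch` is hypothetical, and the key `hw.optional.arm.FEAT_DotProd` in the comment is just one example from Apple's list. On non-Apple platforms the sketch simply returns false.

```cpp
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper (not necessarily the code from the comment above).
// Returns true if the named sysctl key exists and holds a nonzero value,
// e.g. HasCpuFeatureSketch("hw.optional.arm.FEAT_DotProd").
static bool HasCpuFeatureSketch(const char* feature_name) {
#if defined(__APPLE__)
  int result = 0;
  size_t len = sizeof(result);
  // sysctlbyname returns 0 on success; unknown keys fail and count as absent.
  if (sysctlbyname(feature_name, &result, &len, nullptr, 0) != 0) return false;
  return result != 0;
#else
  (void)feature_name;
  return false;  // No Apple sysctl keys to query on this platform.
#endif
}
```

Treating a failed lookup as "feature absent" keeps the helper safe on older OS versions that do not know a given key.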
Thanks @johnplatts - good point, seems like a good occasion to also add support for runtime dispatch on Apple.
AVX3/AVX3_DL target detection should also be updated for x86_64 on macOS as [...]. Here are some functions that can be used to check that Highway is running on macOS 12.2 or later (the code below requires that [...]):
Here is an updated snippet that correctly checks for AVX3 support on macOS:
The macOS AVX3 context saving bug was mentioned at https://community.intel.com/t5/Software-Tuning-Performance/MacOS-Darwin-kernel-bug-clobbers-AVX-512-opmask-register-state/m-p/1327259, golang/go#49233, and simdutf/simdutf#236.
Nice find, thank you @johnplatts! Would you like to send this code as a pull request, with a comment mentioning the intel.com forum discussion link?
I have made the changes to x86 DetectTargets() that fix the AVX3 detection issues on macOS in pull request #2083. That pull request also adds HasCpuFeature in hwy/targets.cc, which is available when Highway is compiled for macOS/iOS/iPadOS. The updated x86 DetectTargets() uses HasCpuFeature on macOS to check that the OS supports AVX3, and HasCpuFeature can also be used to detect some of the AArch64 SIMD extensions on Apple Silicon CPUs.
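Since the version-check functions did not survive in this thread, here is a rough sketch of one way to test for macOS 12.2 or later. It assumes that macOS 12.2 corresponds to Darwin kernel release 21.3, and that the release string comes from `uname()`. Both function names are hypothetical. The parsing step is kept separate so it can be exercised on any platform.

```cpp
#include <cstdlib>

#if defined(__APPLE__)
#include <sys/utsname.h>
#endif

// Hypothetical helper: parses a Darwin release string such as "21.3.0" and
// returns true if it is 21.3 or later (assumed to correspond to macOS 12.2).
static bool DarwinReleaseIsAtLeast21_3(const char* release) {
  char* end = nullptr;
  const long major = std::strtol(release, &end, 10);
  if (end == release || *end != '.') return false;  // need "major.minor"
  const char* minor_begin = end + 1;
  const long minor = std::strtol(minor_begin, &end, 10);
  if (end == minor_begin) return false;  // no digits after the dot
  return major > 21 || (major == 21 && minor >= 3);
}

// Hypothetical wrapper: queries the running kernel on Apple platforms and
// returns false elsewhere.
static bool IsMacOs12_2OrLaterSketch() {
#if defined(__APPLE__)
  struct utsname buf;
  if (uname(&buf) != 0) return false;
  return DarwinReleaseIsAtLeast21_3(buf.release);
#else
  return false;
#endif
}
```

An AVX3 check on macOS could then require both the CPU feature bits and a sufficiently new OS version before enabling the target.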
Windows on AArch64 also has the IsProcessorFeaturePresent function, which can check for the presence of some of the AArch64 instruction set extensions (including the SDOT/UDOT instructions). It is described at https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-isprocessorfeaturepresent.
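As an illustration of the Windows route, a minimal sketch might look like the following. The function name is hypothetical, and the sketch assumes the SDK in use defines `PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE` (the dot product feature constant); if it does not, or when not building for Windows, the sketch conservatively reports the feature as absent.

```cpp
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical helper: reports whether Windows says the SDOT/UDOT
// (dot product) instructions are available.
static bool HasDotProdOnWindowsSketch() {
#if defined(_WIN32) && defined(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE)
  return IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE) != 0;
#else
  return false;  // Not Windows, or the SDK headers predate this constant.
#endif
}
```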
PiperOrigin-RevId: 627388792
PiperOrigin-RevId: 627710484
Unfortunately, that doesn't cover SVE. Any code with SVE intrinsics cannot be used on Windows targets, see:
Microsoft is likely planning to add SVE support in a future Windows release: it recently added SVE detection for Windows on AArch64 to the .NET Runtime in pull request dotnet/runtime#100937. A new constant, PF_ARM_SVE_INSTRUCTIONS_AVAILABLE, was recently added to https://github.com/dotnet/runtime/blob/main/src/native/minipal/cpufeatures.c for the AArch64 SVE feature, but it has not yet made its way into the Windows headers or the IsProcessorFeaturePresent API documentation. The Visual C++ 2022 compiler also does not currently support SVE, so compiling the SVE target for Windows on AArch64 requires Clang.
The renumbering is done, and thanks @johnplatts for adding the Apple detection :)
FYI we are working on supporting dynamic dispatch with Clang on Arm. As part of this, we may insert another NEON target using some of the optional features (fp16, bf16, dot, perhaps fp16fml - please let us know if you'd like to use/target others).
We'd want this target to be used if it's available, but it should not take precedence over any SVE targets. To enable that, we'd have to renumber the Arm targets. This could cause breakage for a project that uses the combination of:
This seems sufficiently unlikely, but please let us know within say a week if you have any concerns.
For concreteness, the plan is to insert 2 targets below HWY_NEON, 3 below HWY_SVE2, and that leaves 4 below HWY_SVE2_128.
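To illustrate why the ordering matters, here is a small sketch that assumes targets are bit flags and that the lowest set bit marks the preferred target. The constants and their bit positions below are hypothetical, chosen only for illustration, and are not Highway's actual values.

```cpp
#include <cstdint>

// Hypothetical bit positions; lower value is assumed to mean "preferred".
constexpr uint64_t kSve2Sketch = uint64_t{1} << 8;
constexpr uint64_t kSveSketch = uint64_t{1} << 9;
constexpr uint64_t kNeonPlusSketch = uint64_t{1} << 11;  // new NEON target
constexpr uint64_t kNeonSketch = uint64_t{1} << 13;

// Returns the preferred target among the supported ones, i.e. the lowest
// set bit, or 0 if no bit is set.
constexpr uint64_t BestTargetSketch(uint64_t supported) {
  return supported & (~supported + 1);
}
```

With this layout, a CPU supporting both NEON variants picks the new one, but any SVE target still wins when present. Keeping unused bits between groups leaves room to insert further targets later without another renumbering.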