Update sse2neon.h to latest for better support for Linux ARM64 #206
The Github Actions based CI will be enabled once this PR is merged to |
f730df2 to 5d28f8a
* Use sse2neon.h from https://github.com/DLTcollab/sse2neon/blob/de2817727c72fc2f4ce9f54e2db6e40ce0548414/sse2neon.h This is current master from 10.10.2023
* _mm_shuffle_pd is provided by newer sse2neon.h https://github.com/DLTcollab/sse2neon/blob/de2817727c72fc2f4ce9f54e2db6e40ce0548414/sse2neon.h#L5118
* aarch64 does not support cpuid

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
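The last bullet of the commit message (aarch64 has no cpuid) can be illustrated with a small guard. This is only a sketch, not the code changed in this PR: the helper name `host_reports_sse2` and the `HAVE_CPUID` macro are hypothetical, and the x86 branch assumes a GCC or Clang toolchain that ships `<cpuid.h>`.

```c
/* Only x86 targets have the CPUID instruction; everything else
 * (including aarch64) takes the fallback branch. */
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (assumed available) */
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* Hypothetical helper: returns 1 if CPUID reports SSE2, 0 otherwise.
 * On targets without CPUID it returns 0 without executing any x86 code. */
static int host_reports_sse2(void) {
#if HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not supported */
    return (int)((edx >> 26) & 1u);    /* leaf 1, EDX bit 26 = SSE2 */
#else
    return 0;
#endif
}
```

With a guard like this, the same source file compiles on x86_64, Mac ARM64, and Linux ARM64; the ARM builds simply never reference cpuid.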
5d28f8a to 458fe43
please test to make sure these changes work on an Apple M1/2 ... i am unsure about your changes to |
I don't have access to Apple M1/M2 but I will ask a colleague of mine to test it! |
I checked in my M1 pro env, it works! Please let me know if I can help more.
git checkout support-linux-aarch64
mkdir build
cd build
cmake ..
make -j
Detail log:
➜ code uname -a
Darwin jiangyikundeMacBook-Pro.local 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:03:51 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T6000 arm64
➜ code java -version
openjdk version "1.8.0_322"
OpenJDK Runtime Environment (Zulu 8.60.0.21-CA-macos-aarch64) (build 1.8.0_322-b06)
OpenJDK 64-Bit Server VM (Zulu 8.60.0.21-CA-macos-aarch64) (build 25.322-b06, mixed mode)
➜ code gcc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin22.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
➜ code cmake --version
cmake version 3.26.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).
➜ code echo $JAVA_HOME
/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home
➜ code cd beagle-lib
➜ beagle-lib git:(master) git checkout support-linux-aarch64
branch 'support-linux-aarch64' set up to track 'origin/support-linux-aarch64'.
Switched to a new branch 'support-linux-aarch64'
➜ beagle-lib git:(support-linux-aarch64) mkdir build
➜ beagle-lib git:(support-linux-aarch64) cd build
➜ build git:(support-linux-aarch64) cmake ..
-- The C compiler identification is AppleClang 14.0.0.14000029
-- The CXX compiler identification is AppleClang 14.0.0.14000029
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- BEAGLE_VERSION = 4.0.0
-- BEAGLE_PLUGIN_VERSION = 40
-- Found JNI: /Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include found components: AWT JVM
-- JAVA_HOME=/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home
-- JNI_INCLUDE_DIRS=/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include;/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include/darwin;/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include
-- JNI_LIBRARIES=/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/jre/lib/libjawt.dylib;/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/jre/lib/server/libjvm.dylib
-- macOS universal (x86_64 / arm64) build
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED - Failed
-- Performing Test COMPILER_OPT_ARCH_AVX_SUPPORTED
-- Performing Test COMPILER_OPT_ARCH_AVX_SUPPORTED - Failed
-- Not using libtools for plugins
-- Looking for CL_VERSION_3_0
-- Looking for CL_VERSION_3_0 - not found
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/OpenCL.framework (found version "1.2")
-- OpenCL Includes: /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/OpenCL.framework
-- OpenCL Libraries: /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/OpenCL.framework
CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
-- Configuring done (1.7s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/yikun/code/beagle-lib/build
➜ build git:(support-linux-aarch64) make -j
[ 8%] Building OpenCL kernels
[ 12%] Building CXX object libhmsbeagle/CPU/CMakeFiles/hmsbeagle-cpu.dir/BeagleCPUPlugin.cpp.o
[ 12%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/benchmark/BeagleBenchmark.cpp.o
[ 20%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/benchmark/linalg.cpp.o
[ 20%] Building CXX object libhmsbeagle/CPU/CMakeFiles/hmsbeagle-cpu-sse.dir/BeagleCPUSSEPlugin.cpp.o
[ 25%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/beagle.cpp.o
[ 29%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/plugin/Plugin.cpp.o
[ 33%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/plugin/UnixSharedLibrary.cpp.o
[ 37%] Linking CXX shared library libhmsbeagle.dylib
[ 37%] Built target hmsbeagle
[ 41%] Building CXX object libhmsbeagle/JNI/CMakeFiles/hmsbeagle-jni.dir/beagle_BeagleJNIWrapper.cpp.o
Making OpenCL SP state count = 16
Making OpenCL SP state count = 32
Making OpenCL SP state count = 48
Making OpenCL SP state count = 64
Making OpenCL SP state count = 80
Making OpenCL SP state count = 128
Making OpenCL SP state count = 192
[ 45%] Linking CXX shared library libhmsbeagle-jni.jnilib
Making OpenCL SP state count = 256
Making OpenCL DP state count = 16 DP
Making OpenCL DP state count = 32 DP
[ 45%] Built target hmsbeagle-jni
Making OpenCL DP state count = 48 DP
Making OpenCL DP state count = 64 DP
Making OpenCL DP state count = 80 DP
Making OpenCL DP state count = 128 DP
Making OpenCL DP state count = 192 DP
Making OpenCL DP state count = 256 DP
[ 45%] Built target OpenKernels
[ 54%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/KernelResource.cpp.o
[ 54%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/GPUInterfaceOpenCL.cpp.o
[ 58%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/OpenCLPlugin.cpp.o
[ 62%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/KernelLauncher.cpp.o
[ 66%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/GPUImplHelper.cpp.o
/Users/yikun/code/beagle-lib/libhmsbeagle/GPU/GPUInterfaceOpenCL.cpp:1089:5: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]
sprintf(deviceDescription,
^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/stdio.h:188:1: note: 'sprintf' has been explicitly marked deprecated here
__deprecated_msg("This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.")
^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/cdefs.h:215:48: note: expanded from macro '__deprecated_msg'
#define __deprecated_msg(_msg) __attribute__((__deprecated__(_msg)))
^
1 warning generated.
/Users/yikun/code/beagle-lib/libhmsbeagle/GPU/GPUInterfaceOpenCL.cpp:1089:5: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]
sprintf(deviceDescription,
^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/stdio.h:188:1: note: 'sprintf' has been explicitly marked deprecated here
__deprecated_msg("This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.")
^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/cdefs.h:215:48: note: expanded from macro '__deprecated_msg'
#define __deprecated_msg(_msg) __attribute__((__deprecated__(_msg)))
^
1 warning generated.
[ 70%] Linking CXX shared library libhmsbeagle-opencl.so
[ 70%] Built target hmsbeagle-opencl
[ 75%] Linking CXX shared library libhmsbeagle-cpu-sse.so
[ 75%] Built target hmsbeagle-cpu-sse
[ 79%] Linking CXX shared library libhmsbeagle-cpu.so
[ 79%] Built target hmsbeagle-cpu
[ 83%] Building CXX object examples/CMakeFiles/synthetictest.dir/synthetictest/linalg.cpp.o
[ 91%] Building CXX object examples/CMakeFiles/hmctest.dir/hmctest/hmctest.cpp.o
[ 91%] Building CXX object examples/CMakeFiles/synthetictest.dir/synthetictest/synthetictest.cpp.o
[ 95%] Linking CXX executable hmctest
[ 95%] Built target hmctest
[100%] Linking CXX executable synthetictest
[100%] Built target synthetictest
➜ build git:(support-linux-aarch64) ll libhmsbeagle/
total 512
drwxr-xr-x 6 yikun staff 192B Oct 11 11:12 CMakeFiles
drwxr-xr-x 9 yikun staff 288B Oct 11 11:12 CPU
drwxr-xr-x 6 yikun staff 192B Oct 11 11:12 GPU
drwxr-xr-x 6 yikun staff 192B Oct 11 11:12 JNI
-rw-r--r-- 1 yikun staff 14K Oct 11 11:12 Makefile
-rw-r--r-- 1 yikun staff 5.5K Oct 11 11:12 cmake_install.cmake
-rw-r--r-- 1 yikun staff 272B Oct 11 11:12 hmsbeagle-1.pc
-rw-r--r-- 1 yikun staff 1.8K Oct 11 11:12 hmsbeagle-1ConfigVersion.cmake
-rwxr-xr-x 1 yikun staff 222K Oct 11 11:12 libhmsbeagle.1.dylib
lrwxr-xr-x 1 yikun staff 20B Oct 11 11:12 libhmsbeagle.dylib -> libhmsbeagle.1.dylib |
Thank you, @Yikun ! |
I see that there are some more usages of
IMO those should be replaced with
@Yikun Could you please do the following test on Mac ARM64:
and paste the output |
➜ bioconda-recipes git:(aarch64) ✗ cat test.c
#if !defined(__aarch64__)
#error __aarch64__ is not defined
#endif
#if !defined(__arm64__)
#error __arm64__ is not defined
#endif
#if !defined(__ARM64_ARCH_8__)
#error __ARM64_ARCH_8__ is not defined
#endif
#if !defined(__APPLE__)
#error __APPLE__ is not defined
#endif
int main() {
return 0;
}
➜ bioconda-recipes git:(aarch64) ✗ gcc test.c
➜ bioconda-recipes git:(aarch64) ✗ echo $?
0 |
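The `#error` test above stops at the first missing macro, so one run can reveal at most one undefined macro. A non-fatal variant that reports every probe in a single run might look like the sketch below. The `HAS_*` macros and the `report_arch_macros` helper are hypothetical names introduced for illustration; the probed macros are the same four as in test.c, and the output depends entirely on the compiler and target.

```c
#include <stdio.h>

/* Each HAS_* probe is 1 if the compiler predefines the macro, else 0. */
#if defined(__aarch64__)
#define HAS_AARCH64 1
#else
#define HAS_AARCH64 0
#endif

#if defined(__arm64__)
#define HAS_ARM64 1
#else
#define HAS_ARM64 0
#endif

#if defined(__ARM64_ARCH_8__)
#define HAS_ARM64_ARCH_8 1
#else
#define HAS_ARM64_ARCH_8 0
#endif

#if defined(__APPLE__)
#define HAS_APPLE 1
#else
#define HAS_APPLE 0
#endif

/* Hypothetical helper: prints one line per probed macro and
 * returns how many of the four are defined. */
static int report_arch_macros(void) {
    printf("__aarch64__:      %d\n", HAS_AARCH64);
    printf("__arm64__:        %d\n", HAS_ARM64);
    printf("__ARM64_ARCH_8__: %d\n", HAS_ARM64_ARCH_8);
    printf("__APPLE__:        %d\n", HAS_APPLE);
    return HAS_AARCH64 + HAS_ARM64 + HAS_ARM64_ARCH_8 + HAS_APPLE;
}
```

Calling `report_arch_macros()` from `main` on each platform gives a side-by-side view of which macros are safe to rely on for both Mac and Linux ARM64.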
Thank you again, @Yikun ! Here is the result on Linux ARM64:
I am going to replace |
It is supported by both Mac M1/M2 and Linux ARM64

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
The CI checks passed on my fork: martin-g#2 |
I can also confirm on my M2 that (1) BEAGLE compiles both with and without NEON and (2) BEAST run-times differ (as expected) between the libraries generated with and without NEON. Thank you for your effort here!!! |
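For readers unfamiliar with how one source tree can build both with and without NEON, here is a generic sketch of a compile-time switch. It is not BEAGLE code (BEAGLE reaches NEON through sse2neon.h); the `USING_NEON` macro and `sum_doubles` helper are hypothetical, and the NEON branch assumes an AArch64 compiler that defines `__ARM_NEON` and provides `<arm_neon.h>`.

```c
#include <stddef.h>

/* Select the NEON path only when the target is AArch64 and the
 * compiler advertises NEON support; otherwise use plain scalar code. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define USING_NEON 1
#else
#define USING_NEON 0
#endif

/* Hypothetical helper: sums n doubles. */
static double sum_doubles(const double *x, size_t n) {
    size_t i = 0;
    double total = 0.0;
#if USING_NEON
    /* Accumulate two doubles per iteration in a 128-bit register. */
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2)
        acc = vaddq_f64(acc, vld1q_f64(x + i));
    total = vgetq_lane_f64(acc, 0) + vgetq_lane_f64(acc, 1);
#endif
    /* Scalar loop: handles the tail, or everything when NEON is off. */
    for (; i < n; ++i)
        total += x[i];
    return total;
}
```

Both builds compute a sum over the same inputs, though for general floating-point data the order of additions differs between the two paths; only the instructions used (and therefore the run-time) are expected to change, which matches the run-time difference reported above.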
Awesome! |
done |