Update sse2neon.h to latest for better support for Linux ARM64 #206

Merged (2 commits) on Oct 13, 2023

@martin-g
Contributor Author

martin-g commented Oct 10, 2023

The GitHub Actions based CI will be enabled once this PR is merged into the master branch.
Until then, you can see the results on my fork: https://github.com/martin-g/beagle-lib/actions/runs/6470457082

* Use sse2neon.h from https://github.com/DLTcollab/sse2neon/blob/de2817727c72fc2f4ce9f54e2db6e40ce0548414/sse2neon.h
This is the current master as of 10.10.2023.

* _mm_shuffle_pd is provided by newer sse2neon.h

https://github.com/DLTcollab/sse2neon/blob/de2817727c72fc2f4ce9f54e2db6e40ce0548414/sse2neon.h#L5118

* aarch64 does not support cpuid

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
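
The last bullet above (aarch64 has no cpuid) can be illustrated with a small, hedged sketch. This is not the code from BeagleCPUSSEPlugin.cpp; the helper name `simd_backend_available` is hypothetical, and the aarch64 branch assumes that Advanced SIMD (NEON) is always present on the Linux ARM64 and Apple Silicon targets discussed here.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header; other compilers differ */

/* Hypothetical helper: returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26). */
static int simd_backend_available(void) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return 0; /* leaf 1 is not supported by this CPU */
    }
    return (int)((edx >> 26) & 1u);
}
#elif defined(__aarch64__)
/* There is no cpuid instruction on aarch64, so nothing is queried here.
   Assumption: NEON is available on every aarch64 target of interest. */
static int simd_backend_available(void) {
    return 1;
}
#else
/* Unknown architecture: report that no SIMD backend is available. */
static int simd_backend_available(void) {
    return 0;
}
#endif
```

The point of the sketch is only that the cpuid query has to sit behind an x86 guard, so that the aarch64 build never references it.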
@martin-g martin-g marked this pull request as draft October 10, 2023 14:05
@martin-g martin-g marked this pull request as ready for review October 10, 2023 14:14
@msuchard
Member

please test to make sure these changes work on an Apple M1/2 ... i am unsure about your changes to libhmsbeagle/CPU/BeagleCPUSSEPlugin.cpp ... then i am happy to merge.

@martin-g
Contributor Author

I don't have access to Apple M1/M2 but I will ask a colleague of mine to test it!

@Yikun

Yikun commented Oct 11, 2023

I checked on my M1 Pro environment and it works! Please let me know if I can help more.

git checkout support-linux-aarch64
mkdir build
cd build
cmake ..
make -j

Detail log:

➜ code uname -a
Darwin jiangyikundeMacBook-Pro.local 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:03:51 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T6000 arm64

➜  code java -version
openjdk version "1.8.0_322"
OpenJDK Runtime Environment (Zulu 8.60.0.21-CA-macos-aarch64) (build 1.8.0_322-b06)
OpenJDK 64-Bit Server VM (Zulu 8.60.0.21-CA-macos-aarch64) (build 25.322-b06, mixed mode)
➜  code gcc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin22.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
➜  code cmake --version
cmake version 3.26.4

CMake suite maintained and supported by Kitware (kitware.com/cmake).
➜  code echo $JAVA_HOME
/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home
➜  code cd beagle-lib
➜  beagle-lib git:(master) git checkout support-linux-aarch64
branch 'support-linux-aarch64' set up to track 'origin/support-linux-aarch64'.
Switched to a new branch 'support-linux-aarch64'
➜  beagle-lib git:(support-linux-aarch64) mkdir build
➜  beagle-lib git:(support-linux-aarch64) cd build
➜  build git:(support-linux-aarch64) cmake ..
-- The C compiler identification is AppleClang 14.0.0.14000029
-- The CXX compiler identification is AppleClang 14.0.0.14000029
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- BEAGLE_VERSION = 4.0.0
-- BEAGLE_PLUGIN_VERSION = 40
-- Found JNI: /Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include  found components: AWT JVM
-- JAVA_HOME=/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home
-- JNI_INCLUDE_DIRS=/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include;/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include/darwin;/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/include
-- JNI_LIBRARIES=/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/jre/lib/libjawt.dylib;/Users/yikun/Library/Java/JavaVirtualMachines/azul-1.8.0_322/Contents/Home/jre/lib/server/libjvm.dylib
-- macOS universal (x86_64 / arm64) build
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED - Failed
-- Performing Test COMPILER_OPT_ARCH_AVX_SUPPORTED
-- Performing Test COMPILER_OPT_ARCH_AVX_SUPPORTED - Failed
-- Not using libtools for plugins
-- Looking for CL_VERSION_3_0
-- Looking for CL_VERSION_3_0 - not found
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/OpenCL.framework (found version "1.2")
-- OpenCL Includes: /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/OpenCL.framework
-- OpenCL Libraries: /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/OpenCL.framework
CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
-- Configuring done (1.7s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/yikun/code/beagle-lib/build
➜  build git:(support-linux-aarch64) make -j
[  8%] Building OpenCL kernels
[ 12%] Building CXX object libhmsbeagle/CPU/CMakeFiles/hmsbeagle-cpu.dir/BeagleCPUPlugin.cpp.o
[ 12%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/benchmark/BeagleBenchmark.cpp.o
[ 20%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/benchmark/linalg.cpp.o
[ 20%] Building CXX object libhmsbeagle/CPU/CMakeFiles/hmsbeagle-cpu-sse.dir/BeagleCPUSSEPlugin.cpp.o
[ 25%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/beagle.cpp.o
[ 29%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/plugin/Plugin.cpp.o
[ 33%] Building CXX object libhmsbeagle/CMakeFiles/hmsbeagle.dir/plugin/UnixSharedLibrary.cpp.o
[ 37%] Linking CXX shared library libhmsbeagle.dylib
[ 37%] Built target hmsbeagle
[ 41%] Building CXX object libhmsbeagle/JNI/CMakeFiles/hmsbeagle-jni.dir/beagle_BeagleJNIWrapper.cpp.o
Making OpenCL SP state count = 16
Making OpenCL SP state count = 32
Making OpenCL SP state count = 48
Making OpenCL SP state count = 64
Making OpenCL SP state count = 80
Making OpenCL SP state count = 128
Making OpenCL SP state count = 192
[ 45%] Linking CXX shared library libhmsbeagle-jni.jnilib
Making OpenCL SP state count = 256
Making OpenCL DP state count = 16 DP
Making OpenCL DP state count = 32 DP
[ 45%] Built target hmsbeagle-jni
Making OpenCL DP state count = 48 DP
Making OpenCL DP state count = 64 DP
Making OpenCL DP state count = 80 DP
Making OpenCL DP state count = 128 DP
Making OpenCL DP state count = 192 DP
Making OpenCL DP state count = 256 DP
[ 45%] Built target OpenKernels
[ 54%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/KernelResource.cpp.o
[ 54%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/GPUInterfaceOpenCL.cpp.o
[ 58%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/OpenCLPlugin.cpp.o
[ 62%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/KernelLauncher.cpp.o
[ 66%] Building CXX object libhmsbeagle/GPU/CMake_OpenCL/CMakeFiles/hmsbeagle-opencl.dir/__/GPUImplHelper.cpp.o
/Users/yikun/code/beagle-lib/libhmsbeagle/GPU/GPUInterfaceOpenCL.cpp:1089:5: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only.  Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]
    sprintf(deviceDescription,
    ^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/stdio.h:188:1: note: 'sprintf' has been explicitly marked deprecated here
__deprecated_msg("This function is provided for compatibility reasons only.  Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.")
^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/cdefs.h:215:48: note: expanded from macro '__deprecated_msg'
        #define __deprecated_msg(_msg) __attribute__((__deprecated__(_msg)))
                                                      ^
1 warning generated.
/Users/yikun/code/beagle-lib/libhmsbeagle/GPU/GPUInterfaceOpenCL.cpp:1089:5: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only.  Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]
    sprintf(deviceDescription,
    ^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/stdio.h:188:1: note: 'sprintf' has been explicitly marked deprecated here
__deprecated_msg("This function is provided for compatibility reasons only.  Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.")
^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/cdefs.h:215:48: note: expanded from macro '__deprecated_msg'
        #define __deprecated_msg(_msg) __attribute__((__deprecated__(_msg)))
                                                      ^
1 warning generated.
[ 70%] Linking CXX shared library libhmsbeagle-opencl.so
[ 70%] Built target hmsbeagle-opencl
[ 75%] Linking CXX shared library libhmsbeagle-cpu-sse.so
[ 75%] Built target hmsbeagle-cpu-sse
[ 79%] Linking CXX shared library libhmsbeagle-cpu.so
[ 79%] Built target hmsbeagle-cpu
[ 83%] Building CXX object examples/CMakeFiles/synthetictest.dir/synthetictest/linalg.cpp.o
[ 91%] Building CXX object examples/CMakeFiles/hmctest.dir/hmctest/hmctest.cpp.o
[ 91%] Building CXX object examples/CMakeFiles/synthetictest.dir/synthetictest/synthetictest.cpp.o
[ 95%] Linking CXX executable hmctest
[ 95%] Built target hmctest
[100%] Linking CXX executable synthetictest
[100%] Built target synthetictest
➜  build git:(support-linux-aarch64) ll libhmsbeagle/
total 512
drwxr-xr-x  6 yikun  staff   192B Oct 11 11:12 CMakeFiles
drwxr-xr-x  9 yikun  staff   288B Oct 11 11:12 CPU
drwxr-xr-x  6 yikun  staff   192B Oct 11 11:12 GPU
drwxr-xr-x  6 yikun  staff   192B Oct 11 11:12 JNI
-rw-r--r--  1 yikun  staff    14K Oct 11 11:12 Makefile
-rw-r--r--  1 yikun  staff   5.5K Oct 11 11:12 cmake_install.cmake
-rw-r--r--  1 yikun  staff   272B Oct 11 11:12 hmsbeagle-1.pc
-rw-r--r--  1 yikun  staff   1.8K Oct 11 11:12 hmsbeagle-1ConfigVersion.cmake
-rwxr-xr-x  1 yikun  staff   222K Oct 11 11:12 libhmsbeagle.1.dylib
lrwxr-xr-x  1 yikun  staff    20B Oct 11 11:12 libhmsbeagle.dylib -> libhmsbeagle.1.dylib

@martin-g
Contributor Author

Thank you, @Yikun !

@martin-g
Contributor Author

martin-g commented Oct 11, 2023

I see that there are some more usages of __ARM64_ARCH_8__ in the plugin related code:

grep -rnHi '__ARM64_ARCH_8__' *
libhmsbeagle/CPU/BeagleCPUPlugin.cpp:20:#ifdef __ARM64_ARCH_8__
libhmsbeagle/CPU/BeagleCPUSSEPlugin.cpp:27:#ifdef __ARM64_ARCH_8__
libhmsbeagle/CPU/BeagleCPUSSEPlugin.cpp:164:#if defined(__ARM64_ARCH_8__)

IMO those should be replaced with __aarch64__ for better compatibility.

@Yikun Could you please do the following test on Mac ARM64:

$ cat > test.c
#if !defined(__aarch64__)
    #error __aarch64__ is not defined
#endif

#if !defined(__arm64__)
    #error __arm64__ is not defined
#endif

#if !defined(__ARM64_ARCH_8__)
    #error __ARM64_ARCH_8__ is not defined
#endif

#if !defined(__APPLE__)
    #error __APPLE__ is not defined
#endif

int main() {
	return 0;
}
Ctrl+D

$ gcc test.c

and paste the output
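
The test file above refuses to compile when any of the four macros is missing. As a hedged alternative, the sketch below always compiles and reports at run time which macros the compiler predefined. The names `defined_macros`, `print_defined_macros` and the `HAS_*` constants are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* One bit per macro checked in the test file above (hypothetical names). */
enum {
    HAS_AARCH64      = 1 << 0,
    HAS_ARM64        = 1 << 1,
    HAS_ARM64_ARCH_8 = 1 << 2,
    HAS_APPLE        = 1 << 3
};

/* Builds a bitmask of the macros that were defined at compile time. */
static unsigned defined_macros(void) {
    unsigned mask = 0;
#if defined(__aarch64__)
    mask |= HAS_AARCH64;
#endif
#if defined(__arm64__)
    mask |= HAS_ARM64;
#endif
#if defined(__ARM64_ARCH_8__)
    mask |= HAS_ARM64_ARCH_8;
#endif
#if defined(__APPLE__)
    mask |= HAS_APPLE;
#endif
    return mask;
}

/* Prints one line per macro instead of aborting the compilation. */
static void print_defined_macros(void) {
    unsigned m = defined_macros();
    printf("__aarch64__      : %s\n", (m & HAS_AARCH64) ? "defined" : "not defined");
    printf("__arm64__        : %s\n", (m & HAS_ARM64) ? "defined" : "not defined");
    printf("__ARM64_ARCH_8__ : %s\n", (m & HAS_ARM64_ARCH_8) ? "defined" : "not defined");
    printf("__APPLE__        : %s\n", (m & HAS_APPLE) ? "defined" : "not defined");
}
```

Calling `print_defined_macros()` from `main` gives a complete picture in one run, whichever subset of the macros a given platform defines.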

@Yikun

Yikun commented Oct 11, 2023

➜  bioconda-recipes git:(aarch64) ✗ cat test.c
#if !defined(__aarch64__)
    #error __aarch64__ is not defined
#endif

#if !defined(__arm64__)
    #error __arm64__ is not defined
#endif

#if !defined(__ARM64_ARCH_8__)
    #error __ARM64_ARCH_8__ is not defined
#endif

#if !defined(__APPLE__)
    #error __APPLE__ is not defined
#endif

int main() {
        return 0;
}
➜  bioconda-recipes git:(aarch64) ✗ gcc test.c
➜  bioconda-recipes git:(aarch64) ✗ echo $?
0

@martin-g
Contributor Author

Thank you again, @Yikun !

Here is the result on Linux ARM64:

gcc test.c
test.c:6:6: error: #error __arm64__ is not defined
    6 |     #error __arm64__ is not defined
      |      ^~~~~
test.c:10:6: error: #error __ARM64_ARCH_8__ is not defined
   10 |     #error __ARM64_ARCH_8__ is not defined
      |      ^~~~~
test.c:14:6: error: #error __APPLE__ is not defined
   14 |     #error __APPLE__ is not defined
      |      ^~~~~

I am going to replace __ARM64_ARCH_8__ with __aarch64__ in the plugin code!
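
As a hedged illustration of why `__aarch64__` is the more portable guard, the sketch below selects a SIMD code path with it and runs unchanged on Linux ARM64, Apple Silicon and x86. It is not the BEAGLE plugin code: the function `add2` is hypothetical, and it uses `<arm_neon.h>` directly rather than sse2neon.h so that the example has no external dependency.

```c
#if defined(__aarch64__)
#include <arm_neon.h>

/* NEON path: adds two pairs of doubles with 128-bit vector instructions. */
static void add2(const double *a, const double *b, double *out) {
    vst1q_f64(out, vaddq_f64(vld1q_f64(a), vld1q_f64(b)));
}
#elif defined(__SSE2__)
#include <emmintrin.h>

/* SSE2 path: the same operation using the x86 intrinsics. */
static void add2(const double *a, const double *b, double *out) {
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
#else
/* Scalar fallback for any other architecture. */
static void add2(const double *a, const double *b, double *out) {
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
}
#endif
```

Had the first branch been guarded with `__ARM64_ARCH_8__`, the Linux ARM64 result shown above implies it would be skipped there, and the build would fall through to a non-NEON branch.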

It is supported by both Mac M1/M2 and Linux ARM64

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Contributor Author

The CI checks passed on my fork: martin-g#2

@msuchard msuchard merged commit 6d909c2 into beagle-dev:master Oct 13, 2023
@msuchard
Member

i can also confirm on my M2 that (1) BEAGLE compiles both with and without NEON and (2) BEAST run-times (as expected) differ between the libraries generated with and without NEON. thank you for your effort here!!!

@martin-g
Contributor Author

martin-g commented Oct 13, 2023

Awesome!
May I ask for a new release/tag, please? Thank you!

@msuchard
Member

done
