-
It has to be configured at compile time. The default for Intel is AVX2. With CMake it is pretty easy to cross-compile.

On Ubuntu aarch64:

sudo apt install gcc-x86-64-linux-gnu g++-x86-64-linux-gnu

Create a toolchain file (the commands further down assume it is saved as ~/x86_64-toolchain.cmake):

# the name of the target operating system
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR x86_64)
# which compilers to use for C and C++
set(CMAKE_C_COMPILER /usr/bin/x86_64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER /usr/bin/x86_64-linux-gnu-g++)
# where is the target environment located
set(CMAKE_FIND_ROOT_PATH /usr/x86_64-linux-gnu)
# adjust the default behavior of the FIND_XXX() commands:
# search programs in the host environment
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
# search headers and libraries in the target environment
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

Then you can configure like so:

cmake -B build-x86_64-noavx -DCMAKE_TOOLCHAIN_FILE=~/x86_64-toolchain.cmake -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF
cmake --build build-x86_64-noavx --parallel 4
cmake -B build-x86_64-avx -DCMAKE_TOOLCHAIN_FILE=~/x86_64-toolchain.cmake -DLLAMA_AVX=ON -DLLAMA_AVX2=OFF
cmake --build build-x86_64-avx --parallel 4
cmake -B build-x86_64-avx2 -DCMAKE_TOOLCHAIN_FILE=~/x86_64-toolchain.cmake -DLLAMA_AVX=ON -DLLAMA_AVX2=ON
cmake --build build-x86_64-avx2 --parallel 4
cmake -B build-x86_64-avx512 -DCMAKE_TOOLCHAIN_FILE=~/x86_64-toolchain.cmake -DLLAMA_AVX512=ON -DCMAKE_C_FLAGS=-march=skylake-avx512
cmake --build build-x86_64-avx512 --parallel 4

You get different builds. Adjust the variables as necessary.
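
Since the instruction set is fixed per build, one way to still cover a broad range of CPUs is to ship several of these builds and choose one at startup. Below is a minimal sketch of that idea, not something the project provides: the function names (`has_flag`, `pick_variant`, `host_flags`, `build_dir_for`) are hypothetical, the flag names follow the `flags` line of `/proc/cpuinfo` on Linux, and the mapping from flags to build directory is an assumption. A real build may need further CPU features depending on which other options are enabled, and a Windows target would need a different way to read the CPU features.

```shell
#!/bin/sh
# Hypothetical launcher helpers: map a list of CPU feature flags to one of the
# build directories created above (build-x86_64-noavx, -avx, -avx2, -avx512).

# has_flag FLAGS NAME: succeeds if NAME is one of the space-separated FLAGS.
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# pick_variant FLAGS: prints the most capable variant the flags allow.
# Assumption: the avx512 build (-march=skylake-avx512) needs the F, CD, BW,
# DQ and VL subsets, so all five must be present before it is selected.
pick_variant() {
    if has_flag "$1" avx512f && has_flag "$1" avx512cd &&
       has_flag "$1" avx512bw && has_flag "$1" avx512dq &&
       has_flag "$1" avx512vl; then
        echo avx512
    elif has_flag "$1" avx2; then
        echo avx2
    elif has_flag "$1" avx; then
        echo avx
    else
        echo noavx
    fi
}

# host_flags: prints the flags of the first CPU listed in /proc/cpuinfo
# (Linux only); prints nothing if the file is not readable.
host_flags() {
    [ -r /proc/cpuinfo ] || return 0
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1
}

# build_dir_for FLAGS: prints the build directory name matching those flags.
build_dir_for() {
    echo "build-x86_64-$(pick_variant "$1")"
}
```

A launcher could then call `build_dir_for "$(host_flags)"` and start the binary from the directory it prints. Falling back to `noavx` when nothing is detected keeps the choice conservative.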
-
I notice that the code is set to identify the available instruction sets. However, am I right in thinking that this is done at compile time?
I am compiling on Linux ARM but targeting various Windows x86_64 Intel CPUs, for which I will have the parameter lists generated by gcc.exe. So I can target the specific architecture exactly during compilation, if I want to. However, doing this adds a lot of complexity to my distribution; it would be much easier just to target a broad range of CPUs.
My question is: if I don't enable those instruction sets as compiler flags, will they still be detected and used correctly at run time during inference (if they exist on the executing CPU), or not?