Note: This tutorial may be outdated; I will write a new version later.
llama.cpp link: https://github.com/ggerganov/llama.cpp
Official Website: termux.
Change repo for faster speed (optional):
termux-change-repo
Check here for more help.
Install the following packages in Termux:
pkg install clang wget git cmake
Obtain llama.cpp source code:
git clone https://github.com/ggerganov/llama.cpp.git
Type termux-setup-storage in the Termux terminal before importing the model. This grants Termux permission to access files outside its own directory. For details, please visit: https://wiki.termux.com/wiki/Termux-setup-storage
Use the adb push command to import the model:
adb push \path\to\your\model\on\windows /storage/emulated/0/download
~/storage/downloads in the Termux home directory is shared with the Android Downloads folder. Move the model from there to ~/llama.cpp/models:
mv ~/storage/downloads/model_name ~/llama.cpp/models
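After moving the file, it can help to confirm that the transfer produced a usable model. The sketch below assumes the model is in GGUF format (as the model used in the test at the end of this tutorial is), in which case the file starts with the four bytes `GGUF`. The helper name `is_gguf` and the `model_name` path are placeholders, not part of llama.cpp.

```shell
# is_gguf FILE: succeed if FILE exists and starts with the 4-byte magic "GGUF".
# Hypothetical helper; it only inspects the header, not the rest of the file.
is_gguf() {
    [ -f "$1" ] || return 1
    magic=$(head -c 4 "$1" 2>/dev/null)
    [ "$magic" = "GGUF" ]
}

# Placeholder path; substitute the real file name of your model.
model="$HOME/llama.cpp/models/model_name"
if is_gguf "$model"; then
    echo "model looks like a GGUF file"
else
    echo "model missing or not GGUF: $model"
fi
```

A failed check usually means an interrupted adb push or an older, non-GGUF model format.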
Using cmake rather than make is strongly recommended.
Android NDK download location: https://github.com/lzhiyong/termux-ndk/releases/tag/ndk-r23/
wget https://github.com/lzhiyong/termux-ndk/releases/download/ndk-r23/android-ndk-r23c-aarch64.zip
Unzip the archive and set the NDK variable to the unpacked directory:
unzip YOUR_ANDROID_NDK_ZIP_FILE
export NDK=~/path/to/your/unzip/directory
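A wrong NDK path only shows up later as a confusing cmake error, so a quick check before building can save time. This is a minimal sketch: the function name `ndk_toolchain_ok` is hypothetical, and the only thing it tests is that the toolchain file referenced by the cmake command in the build step exists under `$NDK`.

```shell
# ndk_toolchain_ok DIR: succeed if DIR contains the CMake toolchain file
# that the build step passes via -DCMAKE_TOOLCHAIN_FILE.
ndk_toolchain_ok() {
    [ -n "$1" ] && [ -f "$1/build/cmake/android.toolchain.cmake" ]
}

if ndk_toolchain_ok "${NDK:-}"; then
    echo "NDK toolchain found under $NDK"
else
    echo "NDK is unset or does not point at an unpacked NDK" >&2
fi
```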
Build under ~/llama.cpp/build:
mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
make
Run:
cd bin/
./main YOUR_PARAMETERS
Download necessary packages:
apt install ocl-icd opencl-headers opencl-clhpp clinfo libopenblas
Manually compile CLBlast and copy clblast.h into llama.cpp:
git clone https://github.com/CNugteren/CLBlast.git
cd CLBlast
cmake .
make
cp libclblast.so* $PREFIX/lib
cp ./include/clblast.h ../llama.cpp
Copy OpenBLAS files to llama.cpp:
cp /data/data/com.termux/files/usr/include/openblas/cblas.h .
cp /data/data/com.termux/files/usr/include/openblas/openblas_config.h .
cd ~/llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CLBLAST=ON
cmake --build . --config Release
Add LD_LIBRARY_PATH to ~/.bashrc (so the program runs directly on the physical GPU):
echo "export LD_LIBRARY_PATH=/vendor/lib64:$LD_LIBRARY_PATH" >> ~/.bashrc
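Running the echo command more than once appends a duplicate line each time. The sketch below shows one way to make the step repeatable; `append_once` is a hypothetical helper, and it is demonstrated on a scratch file so that nothing is changed until you point it at `~/.bashrc` yourself. Single quotes are used so that `$LD_LIBRARY_PATH` is expanded when the file is sourced, not when the line is written.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line is already there.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file: two calls still leave a single line.
rc=$(mktemp)
append_once 'export LD_LIBRARY_PATH=/vendor/lib64:$LD_LIBRARY_PATH' "$rc"
append_once 'export LD_LIBRARY_PATH=/vendor/lib64:$LD_LIBRARY_PATH' "$rc"
grep -c 'LD_LIBRARY_PATH' "$rc"   # prints 1
rm -f "$rc"
```

To apply it for real, pass `"$HOME/.bashrc"` as the second argument and open a new Termux session afterwards.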
Check GPU is available for OpenCL:
clinfo -l
If everything works, a Qualcomm Snapdragon SoC displays:
Platform #0: QUALCOMM Snapdragon(TM)
`-- Device #0: QUALCOMM Adreno(TM)
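The same check can be scripted, for example to stop early when no OpenCL device is visible. This is a rough sketch that assumes `clinfo -l` lists each device on a line containing `Device #`, as in the output above; `count_cl_devices` is a hypothetical helper name.

```shell
# count_cl_devices: read `clinfo -l` output on stdin and print the number of device lines.
count_cl_devices() {
    # grep -c prints 0 but exits non-zero when nothing matches; keep the status clean.
    grep -c 'Device #' || true
}

if command -v clinfo >/dev/null 2>&1; then
    n=$(clinfo -l | count_cl_devices)
    echo "OpenCL devices visible: $n"
else
    echo "clinfo is not installed"
fi
```

A count of 0 usually means the vendor library directory is not on LD_LIBRARY_PATH in the current session.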
Run:
cd bin/
./main YOUR_PARAMETERS
- SoC: Qualcomm Snapdragon 8 Gen 2
- RAM: 16 GB
- Model: llama-2-7B-Chat-Q4_0.gguf(Download)
- Multiple long conversations
- Params:
  - Context size = 4096
  - Batch size = 16
  - Threads = 4
Result:
- Load time = 1129.17 ms
- 3.67 tokens per second
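For reference, the parameters listed above can be mapped onto `./main` flags roughly as follows. This is a sketch, not the exact command used for the measurement: it assumes the `-m`, `-c`, `-b` and `-t` options of `./main` (model, context size, batch size, threads) and a model stored in `~/llama.cpp/models`, reached from `build/bin` as `../../models`. The helper `main_cmd` is hypothetical and only prints the command so it can be reviewed before running.

```shell
# main_cmd MODEL: print (do not run) a ./main command line with the
# context size, batch size and thread count from the test setup.
main_cmd() {
    printf './main -m %s -c 4096 -b 16 -t 4\n' "$1"
}

# Assumed model location relative to ~/llama.cpp/build/bin.
main_cmd ../../models/llama-2-7B-Chat-Q4_0.gguf
```

Other options, such as a prompt or interactive mode, are left out here because the original setup does not specify them.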