Commit 46f9f88

Merge branch 'master' into develop/personal
* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggml-org#3812)
  llama : correctly report GGUFv3 format (ggml-org#3818)
  simple : fix batch handling (ggml-org#3803)
  cuda : improve text-generation and batched decoding performance (ggml-org#3776)
  server : do not release slot on image input (ggml-org#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggml-org#3584) (ggml-org#3768)
  server : do not block system prompt update (ggml-org#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggml-org#3765)
  cmake : add missed dependencies (ggml-org#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggml-org#3749)
  Add more tokenizer tests (ggml-org#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggml-org#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggml-org#2482)"
  issues : separate bug and enhancement template + no default title (ggml-org#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggml-org#3746)
  llama : remove token functions with `context` args in favor of `model` (ggml-org#3720)
  Fix baichuan convert script not detecing model (ggml-org#3739)
  make : add optional CUDA_NATIVE_ARCH (ggml-org#2482)
  ...
2 parents 855b808 + 41aee4d commit 46f9f88

File tree

210 files changed: +48517 −18053 lines


.clang-tidy

+5
@@ -3,6 +3,7 @@ Checks: >
     bugprone-*,
     -bugprone-easily-swappable-parameters,
     -bugprone-implicit-widening-of-multiplication-result,
+    -bugprone-misplaced-widening-cast,
     -bugprone-narrowing-conversions,
     readability-*,
     -readability-avoid-unconditional-preprocessor-if,
@@ -15,4 +16,8 @@ Checks: >
     -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
     performance-*,
     portability-*,
+    misc-*,
+    -misc-const-correctness,
+    -misc-non-private-member-variables-in-classes,
+    -misc-no-recursion,
 FormatStyle: none

.devops/cloud-v-pipeline

+22
@@ -0,0 +1,22 @@
+node('x86_runner1'){                         // Running on x86 runner containing latest vector qemu, latest vector gcc and all the necessary libraries
+    stage('Cleanup'){
+        cleanWs()                            // Cleaning previous CI build in workspace
+    }
+    stage('checkout repo'){
+        retry(5){                            // Retry if the cloning fails due to some reason
+            checkout scm                     // Clone the repo on Runner
+        }
+    }
+    stage('Compiling llama.cpp'){
+        sh'''#!/bin/bash
+        make RISCV=1 RISCV_CROSS_COMPILE=1   # Compiling llama for RISC-V
+        '''
+    }
+    stage('Running llama.cpp'){
+        sh'''#!/bin/bash
+        module load gnu-bin2/0.1             # loading latest versions of vector qemu and vector gcc
+        qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./main -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
+        cat llama_log.txt                    # Printing results
+        '''
+    }
+}
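
For anyone reproducing this outside Jenkins, the pipeline boils down to a cross-compile followed by emulation. A minimal sketch under assumed paths (the sysroot and model locations below are placeholders, and a RISC-V GCC toolchain plus a vector-enabled qemu-riscv64 are assumed to be installed):

# Cross-compile llama.cpp for RISC-V, as in the pipeline's build stage
make RISCV=1 RISCV_CROSS_COMPILE=1

# Run the resulting binary under qemu with the vector extension enabled
# (replace the sysroot and model paths with real ones on your system)
qemu-riscv64 -L /path/to/riscv/sysroot \
    -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 \
    ./main -m /path/to/model.gguf -p "Anything" -n 9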

.devops/full-cuda.Dockerfile

+1 −1
@@ -12,7 +12,7 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
 ARG CUDA_DOCKER_ARCH=all

 RUN apt-get update && \
-    apt-get install -y build-essential python3 python3-pip
+    apt-get install -y build-essential python3 python3-pip git

 COPY requirements.txt requirements.txt
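
The only change here is adding git to the build stage's packages. For context, a minimal sketch of building this image from the repository root (the image tag is just an example name, and Docker with NVIDIA container support is assumed for actually running it):

docker build -t local/llama.cpp:full-cuda -f .devops/full-cuda.Dockerfile .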

.devops/lamma-cpp-clblast.srpm.spec → .devops/llama-cpp-clblast.srpm.spec (renamed)

+35 −9
@@ -13,12 +13,13 @@
 # It is up to the user to install the correct vendor-specific support.

 Name: llama.cpp-clblast
-Version: master
+Version: %( date "+%%Y%%m%%d" )
 Release: 1%{?dist}
-Summary: OpenCL Inference of LLaMA model in pure C/C++
+Summary: OpenCL Inference of LLaMA model in C/C++
 License: MIT
 Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
-BuildRequires: coreutils make gcc-c++ git mesa-libOpenCL-devel
+BuildRequires: coreutils make gcc-c++ git mesa-libOpenCL-devel clblast-devel
+Requires: clblast
 URL: https://github.com/ggerganov/llama.cpp

 %define debug_package %{nil}
@@ -35,18 +36,43 @@ make -j LLAMA_CLBLAST=1

 %install
 mkdir -p %{buildroot}%{_bindir}/
-cp -p main %{buildroot}%{_bindir}/llamacppclblast
-cp -p server %{buildroot}%{_bindir}/llamacppclblastserver
-cp -p simple %{buildroot}%{_bindir}/llamacppclblastsimple
+cp -p main %{buildroot}%{_bindir}/llamaclblast
+cp -p server %{buildroot}%{_bindir}/llamaclblastserver
+cp -p simple %{buildroot}%{_bindir}/llamaclblastsimple
+
+mkdir -p %{buildroot}/usr/lib/systemd/system
+%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamaclblast.service
+[Unit]
+Description=Llama.cpp server, CPU only (no GPU support in this build).
+After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
+
+[Service]
+Type=simple
+EnvironmentFile=/etc/sysconfig/llama
+ExecStart=/usr/bin/llamaclblastserver $LLAMA_ARGS
+ExecReload=/bin/kill -s HUP $MAINPID
+Restart=never
+
+[Install]
+WantedBy=default.target
+EOF
+
+mkdir -p %{buildroot}/etc/sysconfig
+%{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
+LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
+EOF

 %clean
 rm -rf %{buildroot}
 rm -rf %{_builddir}/*

 %files
-%{_bindir}/llamacppclblast
-%{_bindir}/llamacppclblastserver
-%{_bindir}/llamacppclblastsimple
+%{_bindir}/llamaclblast
+%{_bindir}/llamaclblastserver
+%{_bindir}/llamaclblastsimple
+/usr/lib/systemd/system/llamaclblast.service
+%config /etc/sysconfig/llama
+

 %pre

.devops/lamma-cpp-cublas.srpm.spec → .devops/llama-cpp-cublas.srpm.spec (renamed)

+25 −1
@@ -13,7 +13,7 @@
 # It is up to the user to install the correct vendor-specific support.

 Name: llama.cpp-cublas
-Version: master
+Version: %( date "+%%Y%%m%%d" )
 Release: 1%{?dist}
 Summary: CPU Inference of LLaMA model in pure C/C++ (no CUDA/OpenCL)
 License: MIT
@@ -40,6 +40,28 @@ cp -p main %{buildroot}%{_bindir}/llamacppcublas
 cp -p server %{buildroot}%{_bindir}/llamacppcublasserver
 cp -p simple %{buildroot}%{_bindir}/llamacppcublassimple

+mkdir -p %{buildroot}/usr/lib/systemd/system
+%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacublas.service
+[Unit]
+Description=Llama.cpp server, CPU only (no GPU support in this build).
+After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
+
+[Service]
+Type=simple
+EnvironmentFile=/etc/sysconfig/llama
+ExecStart=/usr/bin/llamacppcublasserver $LLAMA_ARGS
+ExecReload=/bin/kill -s HUP $MAINPID
+Restart=never
+
+[Install]
+WantedBy=default.target
+EOF
+
+mkdir -p %{buildroot}/etc/sysconfig
+%{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
+LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
+EOF
+
 %clean
 rm -rf %{buildroot}
 rm -rf %{_builddir}/*
@@ -48,6 +70,8 @@ rm -rf %{_builddir}/*
 %{_bindir}/llamacppcublas
 %{_bindir}/llamacppcublasserver
 %{_bindir}/llamacppcublassimple
+/usr/lib/systemd/system/llamacublas.service
+%config /etc/sysconfig/llama

 %pre

.devops/llama-cpp.srpm.spec

+36 −9
@@ -6,47 +6,74 @@
 # Notes for llama.cpp:
 # 1. Tags are currently based on hash - which will not sort asciibetically.
 #    We need to declare standard versioning if people want to sort latest releases.
+#    In the meantime, YYYYMMDD format will be used.
 # 2. Builds for CUDA/OpenCL support are separate, with different depenedencies.
 # 3. NVidia's developer repo must be enabled with nvcc, cublas, clblas, etc installed.
 #    Example: https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
 # 4. OpenCL/CLBLAST support simply requires the ICD loader and basic opencl libraries.
 #    It is up to the user to install the correct vendor-specific support.

 Name: llama.cpp
-Version: master
+Version: %( date "+%%Y%%m%%d" )
 Release: 1%{?dist}
 Summary: CPU Inference of LLaMA model in pure C/C++ (no CUDA/OpenCL)
 License: MIT
 Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
-BuildRequires: coreutils make gcc-c++ git
+BuildRequires: coreutils make gcc-c++ git libstdc++-devel
+Requires: libstdc++
 URL: https://github.com/ggerganov/llama.cpp

 %define debug_package %{nil}
 %define source_date_epoch_from_changelog 0

 %description
 CPU inference for Meta's Lllama2 models using default options.
+Models are not included in this package and must be downloaded separately.

 %prep
-%autosetup
+%setup -n llama.cpp-master

 %build
 make -j

 %install
 mkdir -p %{buildroot}%{_bindir}/
-cp -p main %{buildroot}%{_bindir}/llamacpp
-cp -p server %{buildroot}%{_bindir}/llamacppserver
-cp -p simple %{buildroot}%{_bindir}/llamacppsimple
+cp -p main %{buildroot}%{_bindir}/llama
+cp -p server %{buildroot}%{_bindir}/llamaserver
+cp -p simple %{buildroot}%{_bindir}/llamasimple
+
+mkdir -p %{buildroot}/usr/lib/systemd/system
+%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llama.service
+[Unit]
+Description=Llama.cpp server, CPU only (no GPU support in this build).
+After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
+
+[Service]
+Type=simple
+EnvironmentFile=/etc/sysconfig/llama
+ExecStart=/usr/bin/llamaserver $LLAMA_ARGS
+ExecReload=/bin/kill -s HUP $MAINPID
+Restart=never
+
+[Install]
+WantedBy=default.target
+EOF
+
+mkdir -p %{buildroot}/etc/sysconfig
+%{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
+LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
+EOF

 %clean
 rm -rf %{buildroot}
 rm -rf %{_builddir}/*

 %files
-%{_bindir}/llamacpp
-%{_bindir}/llamacppserver
-%{_bindir}/llamacppsimple
+%{_bindir}/llama
+%{_bindir}/llamaserver
+%{_bindir}/llamasimple
+/usr/lib/systemd/system/llama.service
+%config /etc/sysconfig/llama

 %pre

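Each spec now also installs a systemd unit and an /etc/sysconfig/llama environment file that feeds $LLAMA_ARGS to the server binary. As a rough usage sketch for the base package (assuming the built RPM is installed and a real model file is available; the model path below is only a placeholder):

# Point the server at an actual model (the packaged default path is just an example)
echo 'LLAMA_ARGS="-m /path/to/model.gguf"' | sudo tee /etc/sysconfig/llama

# Pick up the newly installed unit, then enable and start it
sudo systemctl daemon-reload
sudo systemctl enable --now llama.service

# Follow the server output
journalctl -u llama.service -f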

.devops/main-cuda.Dockerfile

+1 −1
@@ -12,7 +12,7 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
 ARG CUDA_DOCKER_ARCH=all

 RUN apt-get update && \
-    apt-get install -y build-essential
+    apt-get install -y build-essential git

 WORKDIR /app

.devops/tools.sh

+4 −7
@@ -7,15 +7,12 @@ arg1="$1"
 # Shift the arguments to remove the first one
 shift

-# Join the remaining arguments into a single string
-arg2="$@"
-
 if [[ "$arg1" == '--convert' || "$arg1" == '-c' ]]; then
-    python3 ./convert.py "$arg2"
+    python3 ./convert.py "$@"
 elif [[ "$arg1" == '--quantize' || "$arg1" == '-q' ]]; then
-    ./quantize "$arg2"
+    ./quantize "$@"
 elif [[ "$arg1" == '--run' || "$arg1" == '-r' ]]; then
-    ./main "$arg2"
+    ./main "$@"
 elif [[ "$arg1" == '--all-in-one' || "$arg1" == '-a' ]]; then
     echo "Converting PTH to GGML..."
     for i in `ls $1/$2/ggml-model-f16.bin*`; do
@@ -27,7 +24,7 @@ elif [[ "$arg1" == '--all-in-one' || "$arg1" == '-a' ]]; then
         fi
     done
 elif [[ "$arg1" == '--server' || "$arg1" == '-s' ]]; then
-    ./server "$arg2"
+    ./server "$@"
 else
     echo "Unknown command: $arg1"
     echo "Available commands: "

.dockerignore

+3
@@ -1,6 +1,9 @@
 *.o
 *.a
 .cache/
+.git/
+.github/
+.gitignore
 .vs/
 .vscode/
 .DS_Store

.editorconfig

+3
@@ -17,3 +17,6 @@ indent_style = tab

 [prompts/*.txt]
 insert_final_newline = unset
+
+[examples/server/public/*]
+indent_size = 2

.github/ISSUE_TEMPLATE/custom.md → .github/ISSUE_TEMPLATE/bug.md (renamed)

+4 −5
@@ -1,8 +1,7 @@
 ---
-name: Issue and enhancement template
-about: Used to report issues and request enhancements for llama.cpp
-title: "[User] Insert summary of your issue or enhancement.."
-labels: ''
+name: Bug template
+about: Used to report bugs in llama.cpp
+labels: ["bug"]
 assignees: ''

 ---
@@ -46,7 +45,7 @@ $ g++ --version

 # Failure Information (for bugs)

-Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
+Please help provide information about the failure / bug.

 # Steps to Reproduce

.github/ISSUE_TEMPLATE/enhancement.md

+28
@@ -0,0 +1,28 @@
+---
+name: Enhancement template
+about: Used to request enhancements for llama.cpp
+labels: ["enhancement"]
+assignees: ''
+
+---
+
+# Prerequisites
+
+Please answer the following questions for yourself before submitting an issue.
+
+- [ ] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
+- [ ] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
+- [ ] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
+- [ ] I reviewed the [Discussions](https://github.com/ggerganov/llama.cpp/discussions), and have a new bug or useful enhancement to share.
+
+# Feature Description
+
+Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do as an enhancement.
+
+# Motivation
+
+Please provide a detailed written description of reasons why this feature is necessary and how it is useful to `llama.cpp` users.
+
+# Possible Implementation
+
+If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.
