Skip to content

Chat support #100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 54 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
cf9a5aa
update the version
Mar 14, 2025
890dc1a
updating to new version of llamacpp
Mar 15, 2025
554d589
Merge branch 'master' of https://github.com/vaiju1981/java-llama.cpp.git
Mar 18, 2025
8a7923a
Merge branch 'master' of https://github.com/vaiju1981/java-llama.cpp
Mar 18, 2025
562dbfe
remove merge conflict
Mar 18, 2025
6b17d08
adding chat support
Mar 18, 2025
a2551dc
adding detailed tests for chat.
Mar 18, 2025
bb50995
setting temp to 0 to make sure consistent output.
Mar 18, 2025
f41fc8c
Ignoring fixed test
Mar 19, 2025
2a5a1b1
adding tool support and chat completions
Mar 19, 2025
f8bb268
code update
Mar 22, 2025
8b0973b
updating the yaml
Mar 22, 2025
c9515bf
setting temperature to 0
Mar 22, 2025
b3a1d65
adding chatFormat to avoid grammar issue
Mar 22, 2025
b56d4c5
trying one more time
Mar 22, 2025
48e14a1
code update for chat
Mar 22, 2025
bb680e5
updating multi-turn test
Mar 23, 2025
744beec
updating model and tests.
Mar 24, 2025
8de2503
fixed the fixed_test
Mar 24, 2025
2af33e2
enabling tool support
Mar 24, 2025
de3df06
ignore tool test
Mar 24, 2025
e7991a2
updating the workflow
Mar 24, 2025
2ae7cd8
updating the multi-turn test
Mar 24, 2025
db6d6a8
moving embedding to separate test suite
Mar 24, 2025
30908a2
adding sysout to check which test is failing
Mar 24, 2025
44a0e71
moving grammar to completions handle
Mar 24, 2025
363b3e0
updating code
Mar 25, 2025
0633df1
adding check for error json
Mar 25, 2025
8f52c90
updating multi-turn test
Mar 25, 2025
24cd359
setting a longer response
Mar 25, 2025
ab0f6e0
adding sysout to check the output.
Mar 25, 2025
c452bd7
reducing size to 50 tokens
Mar 25, 2025
cc78390
trying one more time
Mar 25, 2025
851c50d
missed commit.
Mar 25, 2025
7750636
updating code.
Mar 25, 2025
fd036c6
fixing code to simplify things
Mar 25, 2025
119a4ac
updating the model
Mar 25, 2025
053f7f7
asking for 100 tokens as opposed to 50
Mar 25, 2025
d15553c
trying one more time
Mar 25, 2025
0b3bd5f
ignoring the failed test.
Mar 25, 2025
1d1dbea
ignoring another test
Mar 25, 2025
7c0478b
Ignoring Grammar test.
Mar 25, 2025
a97ae5c
reverting pom.xml changes.
Mar 25, 2025
11ed103
enable tool test
Mar 25, 2025
b379eb3
ading KV Tests
Mar 26, 2025
29bef1a
adding parallel inference code
Mar 26, 2025
ab3e840
adding context size
Mar 26, 2025
014901e
adding context.
Mar 26, 2025
bfff111
removing GPU layers
Mar 26, 2025
c33bbd8
making a smaller prompt
Mar 26, 2025
ec3c717
adding GPU layers for macos-14
Mar 26, 2025
d33680c
updating test to match llama.cpp
Mar 26, 2025
0cfdb89
updating test
Mar 26, 2025
0f09c39
updating model path
Mar 26, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 50 additions & 8 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,17 @@ on:
- pull_request
- workflow_dispatch
env:
MODEL_URL: https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/resolve/main/codellama-7b.Q2_K.gguf
MODEL_NAME: codellama-7b.Q2_K.gguf

REASONING_MODEL_URL: https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf
REASONING_MODEL_NAME: stories260K.gguf
INFILL_MODEL_URL: https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K-infill.gguf
INFILL_MODEL_NAME: stories260K-infill.gguf
MOE_MODEL_URL: https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
MOE_MODEL_NAME: stories15M_MOE-F16.gguf
RERANKING_MODEL_URL: https://huggingface.co/gpustack/jina-reranker-v1-tiny-en-GGUF/resolve/main/jina-reranker-v1-tiny-en-Q4_0.gguf
RERANKING_MODEL_NAME: jina-reranker-v1-tiny-en-Q4_0.gguf
EMBEDDING_MODEL_URL: https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
EMBEDDING_MODEL_NAME: ggml-model-f16.gguf
jobs:

build-and-test-linux:
Expand All @@ -23,10 +30,21 @@ jobs:
run: |
mvn compile
.github/build.sh -DLLAMA_VERBOSE=ON
- name: Download text generation model
run: curl -L ${MODEL_URL} --create-dirs -o models/${MODEL_NAME}
- name: Download reranking model
run: curl -L ${RERANKING_MODEL_URL} --create-dirs -o models/${RERANKING_MODEL_NAME}

- name: Download reasoning calling model
run: curl -L ${REASONING_MODEL_URL} --create-dirs -o models/${REASONING_MODEL_NAME}

- name: Download infill calling model
run: curl -L ${INFILL_MODEL_URL} --create-dirs -o models/${INFILL_MODEL_NAME}

- name: Download MOE model
run: curl -L ${MOE_MODEL_URL} --create-dirs -o models/${MOE_MODEL_NAME}

- name: Download EMBEDDING model
run: curl -L ${EMBEDDING_MODEL_URL} --create-dirs -o models/${EMBEDDING_MODEL_NAME}

- name: List files in models directory
run: ls -l models/
- name: Run tests
Expand Down Expand Up @@ -59,10 +77,22 @@ jobs:
run: |
mvn compile
.github/build.sh ${{ matrix.target.cmake }}
- name: Download text generaton model model
run: curl -L ${MODEL_URL} --create-dirs -o models/${MODEL_NAME}

- name: Download reranking model
run: curl -L ${RERANKING_MODEL_URL} --create-dirs -o models/${RERANKING_MODEL_NAME}

- name: Download reasoning calling model
run: curl -L ${REASONING_MODEL_URL} --create-dirs -o models/${REASONING_MODEL_NAME}

- name: Download infill calling model
run: curl -L ${INFILL_MODEL_URL} --create-dirs -o models/${INFILL_MODEL_NAME}

- name: Download MOE model
run: curl -L ${MOE_MODEL_URL} --create-dirs -o models/${MOE_MODEL_NAME}

- name: Download EMBEDDING model
run: curl -L ${EMBEDDING_MODEL_URL} --create-dirs -o models/${EMBEDDING_MODEL_NAME}

- name: List files in models directory
run: ls -l models/
- name: Run tests
Expand All @@ -87,10 +117,22 @@ jobs:
run: |
mvn compile
.github\build.bat -DLLAMA_VERBOSE=ON
- name: Download model
run: curl -L $env:MODEL_URL --create-dirs -o models/$env:MODEL_NAME

- name: Download reranking model
run: curl -L $env:RERANKING_MODEL_URL --create-dirs -o models/$env:RERANKING_MODEL_NAME

- name: Download reasoning calling model
run: curl -L $env:REASONING_MODEL_URL --create-dirs -o models/$env:REASONING_MODEL_NAME

- name: Download infill calling model
run: curl -L $env:INFILL_MODEL_URL --create-dirs -o models/$env:INFILL_MODEL_NAME

- name: Download MOE model
run: curl -L $env:MOE_MODEL_URL --create-dirs -o models/$env:MOE_MODEL_NAME

- name: Download EMBEDDING model
run: curl -L $env:EMBEDDING_MODEL_URL --create-dirs -o models/$env:EMBEDDING_MODEL_NAME

- name: List files in models directory
run: ls -l models/
- name: Run tests
Expand Down
27 changes: 22 additions & 5 deletions .github/workflows/release.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,16 @@ on:
release:
types: [ created ]
env:
MODEL_URL: "https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/resolve/main/codellama-7b.Q2_K.gguf"
MODEL_NAME: "codellama-7b.Q2_K.gguf"
REASONING_MODEL_URL: "https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf"
REASONING_MODEL_NAME: "stories260K.gguf"
INFILL_MODEL_URL: "https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K-infill.gguf"
INFILL_MODEL_NAME: "stories260K-infill.gguf"
MOE_MODEL_URL: "https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf"
MOE_MODEL_NAME: "stories15M_MOE-F16.gguf"
RERANKING_MODEL_URL: "https://huggingface.co/gpustack/jina-reranker-v1-tiny-en-GGUF/resolve/main/jina-reranker-v1-tiny-en-Q4_0.gguf"
RERANKING_MODEL_NAME: "jina-reranker-v1-tiny-en-Q4_0.gguf"
EMBEDDING_MODEL_URL: "https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf"
EMBEDDING_MODEL_NAME: "ggml-model-f16.gguf"
jobs:

# todo: doesn't work with the newest llama.cpp version
Expand Down Expand Up @@ -146,10 +152,21 @@ jobs:
with:
name: Linux-x86_64-libraries
path: ${{ github.workspace }}/src/main/resources/de/kherud/llama/
- name: Download text generation model
run: curl -L ${MODEL_URL} --create-dirs -o models/${MODEL_NAME}
- name: Download reranking model

- name: Download reranking model
run: curl -L ${RERANKING_MODEL_URL} --create-dirs -o models/${RERANKING_MODEL_NAME}

- name: Download reasoning calling model
run: curl -L ${REASONING_MODEL_URL} --create-dirs -o models/${REASONING_MODEL_NAME}

- name: Download infill calling model
run: curl -L ${INFILL_MODEL_URL} --create-dirs -o models/${INFILL_MODEL_NAME}

- name: Download MOE model
run: curl -L ${MOE_MODEL_URL} --create-dirs -o models/${MOE_MODEL_NAME}

- name: Download EMBEDDING model
run: curl -L ${EMBEDDING_MODEL_URL} --create-dirs -o models/${EMBEDDING_MODEL_NAME}
- uses: actions/setup-java@v4
with:
distribution: 'zulu'
Expand Down
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -42,4 +42,6 @@ src/test/resources/**/*.gbnf

**/*.etag
**/*.lastModified
src/main/cpp/llama.cpp/
src/main/cpp/llama.cpp/
/.classpath
/.project
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ set(LLAMA_BUILD_COMMON ON)
FetchContent_Declare(
llama.cpp
GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
GIT_TAG b4916
GIT_TAG b4940
)
FetchContent_MakeAvailable(llama.cpp)

Expand Down
12 changes: 11 additions & 1 deletion pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

<groupId>de.kherud</groupId>
<artifactId>llama</artifactId>
<version>4.1.0</version>
<version>4.1.1</version>
<packaging>jar</packaging>

<name>${project.groupId}:${project.artifactId}</name>
Expand Down Expand Up @@ -65,6 +65,16 @@
<version>24.1.0</version>
<scope>compile</scope>
</dependency>

<!--
https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.18.3</version>
</dependency>


</dependencies>

<build>
Expand Down
Loading
Loading