Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro, Mistral Nemo, generic) w/ lazy grammars #9639

Draft: wants to merge 267 commits into base: master

267 commits
1b62801
fix editorconfig lints
ochafik Sep 26, 2024
76d2938
fix flake8 lints
ochafik Sep 26, 2024
c124ab4
`minja`: add str.endswith
ochafik Sep 26, 2024
595e11c
`tool-call`: fix/test functionary v3
ochafik Sep 26, 2024
94377d7
`server`: catch errors in format_final_response_oaicompat instead of …
ochafik Sep 26, 2024
059babd
`minja`: try to please gcc
ochafik Sep 26, 2024
4cd82d6
`tool-call`: fix pyright type errors
ochafik Sep 26, 2024
2eb29bf
`tool-call`: update chat templates/goldens
ochafik Sep 26, 2024
5f5be9c
`minja`: gcc tweaks
ochafik Sep 26, 2024
8e4a9ba
`minja`: allow none input to selectattr, and add safe passthrough filter
ochafik Sep 26, 2024
0c87013
`tool-call`: test/fix functionary-medium-v3.1's template (can "look" …
ochafik Sep 26, 2024
749a21c
gcc appeasement
ochafik Sep 26, 2024
3d2650c
fix gcc build
ochafik Sep 26, 2024
d7ec84f
`tool-call`: allow <|python_tag|> in functionary-medium-3.1
ochafik Sep 26, 2024
cf7bece
`tool-call`: factor chat template away from legacy API
ochafik Sep 26, 2024
9cfe4d7
`tool-call`: refactor llama_chat_template class + use in validate_mod…
ochafik Sep 26, 2024
296331b
`minja`: update chat template goldens w/ llama.3.1 arguments workaround
ochafik Sep 26, 2024
50685f8
`minja`: add str.title()
ochafik Sep 26, 2024
5840e10
`tool-call`: merge & fix jinja template tests into test-chat-template
ochafik Sep 26, 2024
2926089
fix lints
ochafik Sep 26, 2024
c88c932
fix gcc error + lint
ochafik Sep 26, 2024
10f9fe8
`tool-call`: fix tool call return format
ochafik Sep 26, 2024
8299fac
`tool-call`: adapt very simple agent + docker isolation from https://…
ochafik Sep 26, 2024
f9c1743
`minja`: fix iterables
ochafik Sep 27, 2024
1e5c0e7
`chat-template`: fix jinja tests (make safe a passthrough)
ochafik Sep 27, 2024
9295ca9
`tool-call`: fix agent type lints
ochafik Sep 27, 2024
27cd07a
`json`: fix grammar conversion typo
ochafik Sep 27, 2024
6610ecf
`server`: rm bad debug code
ochafik Sep 27, 2024
0abfa36
`tool-call`: move usage examples to examples/agent
ochafik Sep 27, 2024
f62e688
`tool-call`: fix crash / test non-tool call case (added llama_sampler…
ochafik Sep 27, 2024
e33b342
`tool-call`: fix passing of tools to template + allow agent to finish
ochafik Sep 27, 2024
e62b5de
`tool-call`: fix functionary-small-3.2 (first tool starts w/ name\n, …
ochafik Sep 27, 2024
86e4f99
Update README.md
ochafik Sep 27, 2024
2f25ee3
Update README.md
ochafik Sep 27, 2024
0093a5e
`minja`: fix identifiers parsing (when start w/ not/is/etc) and lstri…
ochafik Sep 27, 2024
701b664
`minja`: add `indent` filter to support command-r-plus's chat templates
ochafik Sep 27, 2024
887951b
`minja`: generate chat goldens w/ fixed date to support Llama-3.2-3B-…
ochafik Sep 27, 2024
0c85bc7
`tool-call`: test tool call style detection
ochafik Sep 28, 2024
d983516
`tool-call`: let the tool call handler expand chat template, moving b…
ochafik Sep 28, 2024
8b2cf35
`tool-call`: fix grammar trigger crash
ochafik Sep 28, 2024
7cef90c
`tool-call`: more eager function call parsing for Functionary & Llama…
ochafik Sep 28, 2024
55cf337
`tool-call`: better error reporting for server tests
ochafik Sep 28, 2024
c657857
`tool-call`: cleanup tools.py
ochafik Sep 28, 2024
6e0053a
`chat-template`: enumerate files w/ C API rather than private using s…
ochafik Sep 28, 2024
05bbba9
`tool-call`: only match json eagerly for Llama 3.2
ochafik Sep 28, 2024
ef2a020
`tool-call`: make agent async
ochafik Sep 28, 2024
e6be59c
`antiprompts`: fix gcc8 build (avoid recursive struct)
ochafik Sep 28, 2024
9358d1f
`minja`: fix gcc8 build of test
ochafik Sep 28, 2024
1b32ac1
`chat-template`: fix test-arg
ochafik Sep 28, 2024
0ae1112
`agent`: try to fix pyright lint
ochafik Sep 28, 2024
dbda025
`tool-call`: test messages -> template -> grammar -> tool call parser
ochafik Sep 28, 2024
b10ef04
`chat-template`: tweak --chat-template error message when --jinja is set
ochafik Sep 28, 2024
bc3e0c0
`tool-call`: Qwen 2.5 Instruct also requires object arguments
ochafik Sep 28, 2024
a072f30
`tests`: attempt to find assets for tests run from build subfolder
ochafik Sep 28, 2024
ad6719e
`tests`: fix typo
ochafik Sep 28, 2024
22493c8
`tests`: fix test-chat-template run from build
ochafik Sep 28, 2024
c87c121
`tool-call`: fix memory leak in test
ochafik Sep 28, 2024
8738d94
`minja`: qualify std::nullptr_t type for msys2 build
ochafik Sep 28, 2024
cb7912e
`chat-template`: add phi-3.5-vision-instruct
ochafik Sep 28, 2024
9ac4b04
`tool-call`: add fs_list_files to common, w/ win32 impl for msys2 build
ochafik Sep 28, 2024
277f385
`minja`: attempt to handle windows' crlf
ochafik Sep 30, 2024
0fc5ad7
`minja`: avoid c++20 struct initializers in test
ochafik Sep 30, 2024
d9451fd
`antiprompts`: avoid c++20 struct initializers in test
ochafik Sep 30, 2024
c36a196
`tool-call`: prepare possible externalization of minja + factor tool …
ochafik Oct 1, 2024
c76b145
`tool-call`: fix Makefile
ochafik Oct 1, 2024
5b01402
`agent`: add brave_search & fetch_page tools + move to examples/agent…
ochafik Oct 2, 2024
f3538e7
update tools
ochafik Oct 2, 2024
9e502e8
`tool-call`: promote getting chat templates w/ dedicated script rathe…
ochafik Oct 2, 2024
b559d64
Update README.md
ochafik Oct 2, 2024
2428b73
`agent`: ditch openai dependency, use cache_prompt and expose seed
ochafik Oct 2, 2024
e2a9ab6
`agent`: --openai flag (auto-fetches OPENAI_API_KEY), improved logging
ochafik Oct 2, 2024
6f2191d
`agent`: remove *lots* of cruft from tool definitions derived from Fa…
ochafik Oct 2, 2024
26e76f9
`agent`: allow interactive chat by default, and don't reuse sessions
ochafik Oct 2, 2024
6b4a454
`agent`: hard-code max_results=10 in brave_search
ochafik Oct 2, 2024
fa8df0c
`agent`: drop fastify.py -> simpler serve_tools.py, and expose other …
ochafik Oct 2, 2024
ece12b0
`antiprompts`: ensure partial match is at end of string (or else serv…
ochafik Oct 3, 2024
b4fc1e8
`tool-call`: adjust triggers to most common tool call variations from…
ochafik Oct 3, 2024
da02397
`agent`: support more providers (+ extract serve_tools_inside_docker.sh)
ochafik Oct 3, 2024
366efc8
`tool-call`: fix llama 3.x tc parsing when there are spaces before "n…
ochafik Oct 3, 2024
21a3c90
`agent`: tool tweaks (remove ansi escapes from python output, update …
ochafik Oct 3, 2024
a151ddc
`agent`: handle function errors and dont' stringify str outputs
ochafik Oct 4, 2024
241acc2
`agent`: disable brave_search when BRAVE_SEARCH_API_KEY unset
ochafik Oct 7, 2024
3325069
`tool-call`: accept `{"type": "function", "name": "fn"` for llama 3.x
ochafik Oct 7, 2024
e753f15
`agent`: move openapi helpers to their own file
ochafik Oct 8, 2024
7576487
`tool-call`: fix grammar roots
ochafik Oct 22, 2024
fa8462f
fix root
ochafik Oct 22, 2024
9f5ab97
`tool-calls`: add generic tool call style as default
ochafik Oct 22, 2024
b53362a
Update test-tool-call.cpp
ochafik Oct 22, 2024
7f2429e
`tool-calls`: fix grammar regression
ochafik Oct 22, 2024
db4bf93
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Oct 22, 2024
351aecb
Update llama-sampling.cpp
ochafik Oct 22, 2024
a4f12a4
`minja`: fix string subscripts, add string pipe to support Mistral-Ne…
ochafik Oct 22, 2024
fc80ad2
`tool-call`: Log tool call style name, ensure returned content not null
ochafik Oct 22, 2024
3e12b9b
`tool-calls`: basic Nemo support, default parallel to true if templat…
ochafik Oct 23, 2024
2b49440
`tool-call`: fix previous commit's parallel arg
ochafik Oct 23, 2024
5f4aef1
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Oct 23, 2024
4394e1c
Update tool-call.cpp
ochafik Oct 23, 2024
414f6f1
Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call
ochafik Oct 23, 2024
267e630
`agent`: isolate tools container + log its outgoing HTTP & HTTPS traf…
ochafik Oct 24, 2024
f5320af
`tool-call`: return tool_call.id (required by Nemo)
ochafik Oct 24, 2024
0f5d639
`agent`: display http errors nicely
ochafik Oct 24, 2024
d338bfb
`agent`: ditch aiohttp & define REQUESTS_CA_BUNDLE to fix http proxyi…
ochafik Oct 24, 2024
c2926e4
Update README.md
ochafik Oct 24, 2024
03b8641
`agent`: fix deps + make docker compose setup easier to debug
ochafik Oct 24, 2024
0f4fc8c
`agent`: fix no-cache issue in squid for brave tool
ochafik Oct 24, 2024
5c414a3
`agent`: simplify tools setup
ochafik Oct 25, 2024
30bd00b
`agent`: fix tools setup
ochafik Oct 25, 2024
080982e
`tool-call`: test MistralNemo in forced tools server tests (w/ parall…
ochafik Oct 27, 2024
ec9f3b1
nits
ochafik Oct 27, 2024
9a86ea7
`tool-call`: slow tool call integration tests
ochafik Oct 28, 2024
c88095e
space nits
ochafik Oct 28, 2024
7fde6d0
`tool_call`: test no tool call on a real model + rename scenarios
ochafik Oct 28, 2024
dd6d024
`tool-call`: script to prefetch models used in server tests
ochafik Oct 28, 2024
168add7
Update tool_call.feature
ochafik Oct 28, 2024
ec547e4
`tool-call`: add tests: tool_call=none, parallel_tool_calls=true
ochafik Oct 28, 2024
b51c71c
`tool-call`: remove duplicate script to fetch templates
ochafik Oct 28, 2024
74d71a6
`agent`: simplify syntax (default tools to local w/ default port)
ochafik Oct 28, 2024
b825440
`tool-call`: use Q4_K_M models
ochafik Oct 28, 2024
aefac1e
`tool-call`: update scripts/fetch_server_test_models.py
ochafik Oct 28, 2024
64287a3
`tool-call`: test Hermes-3-Llama-3.1-8B
ochafik Oct 29, 2024
fa4c111
`tool-call`: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too…
ochafik Oct 29, 2024
773ff91
`tool-call`: force printing of lazy grammar trigger tokens to regular…
ochafik Oct 29, 2024
92c384a
nits
ochafik Oct 29, 2024
3ebdb2b
`tool-call`: support tool_use variant in llama_chat_template_from_mod…
ochafik Oct 30, 2024
35ac17f
`tool-call`: fix missing initializer errors
ochafik Oct 30, 2024
5227321
`tool-call`: when slow server tests fail, hint to run `python scripts…
ochafik Oct 30, 2024
e4d5449
`tool-calls`: test Qwen2.5-7B-Instruct-Q4_K_M.gguf
ochafik Oct 30, 2024
61655b9
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Oct 31, 2024
be9de3e
Update llama-sampling.cpp
ochafik Oct 31, 2024
542853b
`tool-call`: greedy sampling in server tests + tweak prompt
ochafik Oct 31, 2024
7d9c90f
`tool-call`: nemo tweak (accept raw sql again)
ochafik Oct 31, 2024
e8d9d71
Update tool_call.feature
ochafik Oct 31, 2024
c395d48
`tool-call`: behaviour-based detection of template features
ochafik Oct 31, 2024
f5b7825
`tool-call`: code_interpreter & system + tool call support for all ji…
ochafik Oct 31, 2024
c773516
`tool-call`: don't use -fa w/ Mistral-Nemo (hard crashes?)
ochafik Oct 31, 2024
b35aa4a
`tool-call`: add LLAMA_UPDATE_GOLDENS env for test-chat-template
ochafik Oct 31, 2024
9477c54
`tool-call`: functionary-small-v3.2 test now green
ochafik Oct 31, 2024
c4a8050
Update README.md
ochafik Oct 31, 2024
f5f7475
nits
ochafik Oct 31, 2024
fe967b6
Update README.md
ochafik Oct 31, 2024
479c152
`tool-call`: fix qwen template test
ochafik Oct 31, 2024
bc52c0a
`agent`: add missing tool name in response!
ochafik Oct 31, 2024
c059aec
`agent`: memorize, search_memory (sqlite-vec + sqlite-lembed), fetch …
ochafik Nov 9, 2024
5789f69
`minja`: don't explode upon referencing a field on an array (fixes He…
ochafik Nov 9, 2024
f9b1969
Update README.md
ochafik Nov 9, 2024
adc673c
agent: add --think "tool", default to local tools endpoint, support -…
ochafik Dec 5, 2024
1afa312
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Dec 6, 2024
30fbcb2
agent: more robust squid config
ochafik Dec 6, 2024
a469f53
agent: update readme
ochafik Dec 6, 2024
cbe395d
minja: remove tests (now in https://github.com/google/minja)
ochafik Dec 6, 2024
1fd5f1a
Update README.md
ochafik Dec 6, 2024
5d0033f
minja: sync @ https://github.com/google/minja/commit/916c181c0d4a6f96…
ochafik Dec 7, 2024
1f0b157
tool-call: add firefunction-v2 style
ochafik Dec 7, 2024
93a5245
tool-calls: migrate tests to pytest
ochafik Dec 10, 2024
055053c
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Dec 14, 2024
1e2115f
tool-calls: shorter name: grammar_triggers
ochafik Dec 14, 2024
7bfcd0a
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Dec 14, 2024
7e3feff
tool-call: stabilize server tests
ochafik Dec 15, 2024
e70ce3f
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Dec 26, 2024
f0bd693
Update test-tool-call.cpp
ochafik Dec 26, 2024
f645887
Update minja.hpp https://github.com/google/minja/commit/202aa2f3de21b…
ochafik Dec 26, 2024
0e87ae2
rm trailing spaces
ochafik Dec 27, 2024
0a5d527
Update fetch_server_test_models.py
ochafik Dec 27, 2024
a2fe8a4
Fix tool-call server tests
ochafik Dec 27, 2024
523ebf8
Simplify tool call grammars when there's only 1 tool
ochafik Dec 27, 2024
abd274a
Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcb…
ochafik Dec 30, 2024
e5113e8
Add --jinja and --chat-template-file flags
ochafik Dec 30, 2024
80138d9
Add missing <optional> include
ochafik Dec 30, 2024
06b5159
Avoid print in get_hf_chat_template.py
ochafik Dec 30, 2024
ce48584
No designated initializers yet
ochafik Dec 30, 2024
389d79b
Try and work around msvc++ non-macro max resolution quirk
ochafik Dec 30, 2024
238b968
Update test_chat_completion.py
ochafik Dec 30, 2024
cb72cf1
Merge remote-tracking branch 'origin/master' into jinja
ochafik Jan 13, 2025
78861a3
Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
ochafik Jan 13, 2025
1aac99a
Refactor test-chat-template
ochafik Jan 13, 2025
7c84ebc
Test templates w/ minja
ochafik Jan 13, 2025
18f257b
Fix deprecation
ochafik Jan 13, 2025
8dd4f33
Add --jinja to llama-run
ochafik Jan 13, 2025
c04c50e
Merge remote-tracking branch 'origin/master' into jinja
ochafik Jan 13, 2025
a6afb27
Update common_chat_format_example to use minja template wrapper
ochafik Jan 13, 2025
b4083e4
Test chat_template in e2e test
ochafik Jan 13, 2025
b7e2171
Update utils.py
ochafik Jan 13, 2025
a57bb94
Update test_chat_completion.py
ochafik Jan 13, 2025
4daae0b
Update run.cpp
ochafik Jan 13, 2025
1b3bb7e
Update arg.cpp
ochafik Jan 14, 2025
e7ff6ec
Merge branch 'jinja' into tool-call
ochafik Jan 14, 2025
7a7d6f6
Fix merge
ochafik Jan 14, 2025
e183fa9
Update test-chat-template.cpp
ochafik Jan 14, 2025
010726c
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Jan 14, 2025
d47f40c
Update test-chat-template.cpp
ochafik Jan 14, 2025
3ed670b
Merge remote-tracking branch 'origin/master' into jinja
ochafik Jan 14, 2025
3c7784c
Refactor common_chat_* functions to accept minja template + use_jinja…
ochafik Jan 18, 2025
b75d062
Refactor common_chat_* functions to accept minja template + use_jinja…
ochafik Jan 18, 2025
40db789
Merge remote-tracking branch 'origin/master' into jinja
ochafik Jan 18, 2025
81c0d43
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
ochafik Jan 18, 2025
138a4ba
Merge branch 'jinja' into tool-call
ochafik Jan 18, 2025
d5fa351
Revert LLAMA_CHATML_TEMPLATE refactor
ochafik Jan 18, 2025
045edd1
Merge branch 'jinja' into tool-call
ochafik Jan 18, 2025
2ceabee
Fix fetch_server_test_models.py (avoid conv trap)
ochafik Jan 18, 2025
259d9e4
tools: greedy sampling in tests
ochafik Jan 18, 2025
acf7c24
tools: run tool call slow tests when SLOW_TESTS=1 (+ prefetch models)
ochafik Jan 18, 2025
ee1e10e
Normalize newlines in test-chat-templates for windows tests
ochafik Jan 18, 2025
e63520f
Forward decl minja::chat_template to avoid eager json dep
ochafik Jan 18, 2025
33322e8
Flush stdout in chat template before potential crash
ochafik Jan 18, 2025
5074e6f
Fix copy elision warning
ochafik Jan 18, 2025
76893f5
Merge branch 'jinja' into tool-call
ochafik Jan 18, 2025
fc60802
Rm unused optional include
ochafik Jan 18, 2025
0e74c9d
Add missing optional include to server.cpp
ochafik Jan 18, 2025
d6f058d
Merge branch 'jinja' into tool-call
ochafik Jan 18, 2025
e3c475c
Disable jinja test that has a cryptic windows failure
ochafik Jan 18, 2025
cc50356
minja: fix vigogne (https://github.com/google/minja/pull/22)
ochafik Jan 18, 2025
c207fdc
Merge branch 'jinja' into tool-call
ochafik Jan 18, 2025
0401a83
agent: add --greedy, --top-p, --top-k options
ochafik Jan 19, 2025
153e852
Apply suggestions from code review
ochafik Jan 20, 2025
db9dd0c
Finish suggested renamings
ochafik Jan 20, 2025
c9e8fdd
Move chat_templates inside server_context + remove mutex
ochafik Jan 20, 2025
8c84aef
Update --chat-template-file w/ recent change to --chat-template
ochafik Jan 20, 2025
154bfaa
Refactor chat template validation
ochafik Jan 20, 2025
099f983
Merge remote-tracking branch 'origin/master' into jinja
ochafik Jan 20, 2025
54a669e
Guard against missing eos/bos tokens (null token otherwise throws in …
ochafik Jan 20, 2025
8348c60
Warn against missing eos / bos tokens when jinja template references …
ochafik Jan 20, 2025
ee475d2
rename: common_chat_template[s]
ochafik Jan 20, 2025
8a7c89e
reinstate assert on chat_templates.template_default
ochafik Jan 20, 2025
9bab693
Merge branch 'jinja' into tool-call
ochafik Jan 20, 2025
b110374
apply renames from jinja branch
ochafik Jan 20, 2025
8347da9
Update minja to https://github.com/google/minja/commit/b8437df626ac6c…
ochafik Jan 20, 2025
7ea6a06
Merge branch 'jinja' into tool-call
ochafik Jan 20, 2025
56aa93c
fix std imports for gcc build
ochafik Jan 21, 2025
ff2cce5
Update minja to https://github.com/google/minja/pull/25
ochafik Jan 21, 2025
ba8dd66
Merge branch 'jinja' into tool-call
ochafik Jan 21, 2025
9d8ebd6
Update minja from https://github.com/google/minja/pull/27
ochafik Jan 21, 2025
c606255
Merge branch 'jinja' into tool-call
ochafik Jan 21, 2025
fec0260
Merge remote-tracking branch 'origin/master' into tool-call
ochafik Jan 21, 2025
b49d052
rm tests/test-minja from makefile
ochafik Jan 21, 2025
f6e73da
Remove examples/agent (moved to https://gist.github.com/ochafik/9246d…
ochafik Jan 21, 2025
77f4098
Delete update_jinja_goldens.py
ochafik Jan 21, 2025
dbf841b
Push laziness down to grammar impl
ochafik Jan 22, 2025
ef61a4c
minimize diffs
ochafik Jan 22, 2025
3972945
common_tool_call rename
ochafik Jan 22, 2025
d77fecc
shrink diff in json conversion code
ochafik Jan 22, 2025
5268ec8
Refactor string helpers into common
ochafik Jan 22, 2025
9e8b43f
follow enum naming style for tool call styles
ochafik Jan 22, 2025
9a5acbb
Factor string_join, string_split, string_repeat into common
ochafik Jan 22, 2025
4de5cf8
json: refactor to surface a versatile builder
ochafik Jan 22, 2025
03fe80f
drop unused fs_list_files
ochafik Jan 22, 2025
41a613b
Merge branch 'string_utils' into tool-call
ochafik Jan 22, 2025
5140d7a
Update common.cpp
ochafik Jan 22, 2025
e211629
Merge branch 'string_utils' into tool-call
ochafik Jan 22, 2025
28cac49
drop llama_sampler_accept_str
ochafik Jan 22, 2025
2dd09c7
more cleanups
ochafik Jan 22, 2025
8 changes: 8 additions & 0 deletions .editorconfig
@@ -40,3 +40,11 @@ indent_style = tab
[examples/cvector-generator/*.txt]
trim_trailing_whitespace = unset
insert_final_newline = unset

+[tests/chat/templates/*.jinja]
+indent_style = unset
+indent_size = unset
+end_of_line = unset
+charset = unset
+trim_trailing_whitespace = unset
+insert_final_newline = unset
8 changes: 8 additions & 0 deletions Makefile
@@ -64,6 +64,7 @@ TEST_TARGETS = \
tests/test-quantize-perf \
tests/test-rope \
tests/test-sampling \
+tests/test-tool-call \
tests/test-tokenizer-0 \
tests/test-tokenizer-1-bpe \
tests/test-tokenizer-1-spm
@@ -984,6 +985,7 @@ OBJ_COMMON = \
$(DIR_COMMON)/sampling.o \
$(DIR_COMMON)/speculative.o \
$(DIR_COMMON)/build-info.o \
+$(DIR_COMMON)/tool-call.o \
$(DIR_COMMON)/json-schema-to-grammar.o

OBJ_ALL = $(OBJ_GGML) $(OBJ_LLAMA) $(OBJ_COMMON)
@@ -1364,6 +1366,7 @@ llama-server: \
common/chat-template.hpp \
common/json.hpp \
common/minja.hpp \
+common/tool-call.h \
$(OBJ_ALL)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h %.hpp $<,$^) -Iexamples/server $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS) $(LWINSOCK2)
@@ -1471,6 +1474,11 @@ tests/test-json-schema-to-grammar: tests/test-json-schema-to-grammar.cpp \
$(CXX) $(CXXFLAGS) -Iexamples/server -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

+tests/test-tool-call: tests/test-tool-call.cpp \
+$(OBJ_ALL)
+$(CXX) $(CXXFLAGS) -Iexamples/server -c $< -o $(call GET_OBJ_FILE, $<)
+$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

tests/test-opt: tests/test-opt.cpp \
$(OBJ_GGML)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
1 change: 1 addition & 0 deletions common/CMakeLists.txt
@@ -72,6 +72,7 @@ add_library(${TARGET} STATIC
sampling.h
speculative.cpp
speculative.h
+tool-call.cpp
)

if (BUILD_SHARED_LIBS)
42 changes: 42 additions & 0 deletions common/common.cpp
@@ -484,6 +484,48 @@ void string_replace_all(std::string & s, const std::string & search, const std::
s = std::move(builder);
}

+std::string string_join(const std::vector<std::string> & values, const std::string & separator) {
+std::ostringstream result;
+for (size_t i = 0; i < values.size(); ++i) {
+if (i > 0) {
+result << separator;
+}
+result << values[i];
+}
+return result.str();
+}
+
+std::vector<std::string> string_split(const std::string & str, const std::string & delimiter) {
+std::vector<std::string> parts;
+size_t start = 0;
+size_t end = str.find(delimiter);
+
+while (end != std::string::npos) {
+parts.push_back(str.substr(start, end - start));
+start = end + delimiter.length();
+end = str.find(delimiter, start);
+}
+
+parts.push_back(str.substr(start));
+
+return parts;
+}
+
+std::string string_repeat(const std::string & str, size_t n) {
+if (n == 0) {
+return "";
+}
+
+std::string result;
+result.reserve(str.length() * n);
+
+for (size_t i = 0; i < n; ++i) {
+result += str;
+}
+
+return result;
+}

std::string string_from(bool value) {
return value ? "true" : "false";
}
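Review note: the edge-case behavior of the three helpers added to `common/common.cpp` can be checked with the sketch below. The helper definitions are copied from this hunk so the example compiles on its own; `split_pointer_example` is a hypothetical name introduced only for illustration. One thing to watch: as written, `string_split` would not terminate if given an empty delimiter, because `find("")` matches at `start` and `start` never advances, so callers are presumably expected to pass a non-empty delimiter.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Copied from the common/common.cpp hunk so the sketch is self-contained.
std::string string_join(const std::vector<std::string> & values, const std::string & separator) {
    std::ostringstream result;
    for (size_t i = 0; i < values.size(); ++i) {
        if (i > 0) {
            result << separator;
        }
        result << values[i];
    }
    return result.str();
}

std::vector<std::string> string_split(const std::string & str, const std::string & delimiter) {
    std::vector<std::string> parts;
    size_t start = 0;
    size_t end = str.find(delimiter);
    while (end != std::string::npos) {
        parts.push_back(str.substr(start, end - start));
        start = end + delimiter.length();
        end = str.find(delimiter, start);
    }
    // The remainder after the last delimiter is always pushed, even if empty.
    parts.push_back(str.substr(start));
    return parts;
}

std::string string_repeat(const std::string & str, size_t n) {
    std::string result;
    result.reserve(str.length() * n);
    for (size_t i = 0; i < n; ++i) {
        result += str;
    }
    return result;
}

// Hypothetical example (not part of the PR): a pointer-like string that starts
// with the delimiter yields an empty first element.
std::vector<std::string> split_pointer_example() {
    return string_split("/definitions/foo", "/");
}
```

Splitting never drops empty fields: `"a,b,,c"` gives four parts and an empty input gives one empty part. The leading empty element in `split_pointer_example()` is presumably why the `$ref` resolution loop in `json-schema-to-grammar.cpp` starts iterating at index 1.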
6 changes: 6 additions & 0 deletions common/common.h
@@ -155,6 +155,8 @@ struct common_params_sampling {
};

std::string grammar; // optional BNF-like grammar to constrain sampling
+std::vector<std::string> grammar_trigger_words; // optional trigger words to enable grammar
+std::vector<llama_token> grammar_trigger_tokens; // optional trigger tokens to enable grammar

std::vector<llama_logit_bias> logit_bias; // logit biases to apply

@@ -429,6 +431,10 @@ std::string string_format(const char * fmt, ...);
std::string string_strip(const std::string & str);
std::string string_get_sortable_timestamp();

+std::string string_join(const std::vector<std::string> & values, const std::string & separator);
+std::vector<std::string> string_split(const std::string & str, const std::string & delimiter);
+std::string string_repeat(const std::string & str, size_t n);

void string_replace_all(std::string & s, const std::string & search, const std::string & replace);

template<class T>
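Review note: the new `grammar_trigger_words` field is documented only as "optional trigger words to enable grammar". One way to read that, sketched below under assumptions, is that output is unconstrained until one of the words shows up and the grammar applies from that point on. `find_first_trigger` is a hypothetical helper, not a function from this PR, and the trigger strings in the usage note are illustrative values only; the actual matching lives in the grammar implementation, which this part of the diff does not show.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): returns the offset of the earliest
// occurrence of any trigger word in `text`, or std::string::npos if none occurs.
std::size_t find_first_trigger(const std::string & text,
                               const std::vector<std::string> & trigger_words) {
    std::size_t best = std::string::npos;
    for (const auto & word : trigger_words) {
        if (word.empty()) {
            continue; // an empty word would trivially match at offset 0
        }
        const std::size_t pos = text.find(word);
        if (pos < best) { // npos is the largest value, so a miss never wins
            best = pos;
        }
    }
    return best;
}
```

For example, `find_first_trigger("Sure. <tool_call>{", {"<tool_call>", "<|python_tag|>"})` locates the point after which a lazy grammar would start constraining tokens, while plain prose with no trigger returns `std::string::npos` and stays unconstrained.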
99 changes: 36 additions & 63 deletions common/json-schema-to-grammar.cpp
@@ -1,4 +1,6 @@
#include "json-schema-to-grammar.h"
+#include "common.h"

#include <algorithm>
#include <fstream>
#include <map>
@@ -11,11 +13,6 @@

using json = nlohmann::ordered_json;

-template <typename Iterator>
-static std::string join(Iterator begin, Iterator end, const std::string & separator);
-
-static std::string repeat(const std::string & str, size_t n);

static std::string build_repetition(const std::string & item_rule, int min_items, int max_items, const std::string & separator_rule = "") {
auto has_max = max_items != std::numeric_limits<int>::max();

@@ -128,8 +125,8 @@ static void _build_min_max_int(int min_value, int max_value, std::stringstream &
if (sub_len > 0) {
auto from_sub = from.substr(i + 1);
auto to_sub = to.substr(i + 1);
-auto sub_zeros = repeat("0", sub_len);
-auto sub_nines = repeat("9", sub_len);
+auto sub_zeros = string_repeat("0", sub_len);
+auto sub_nines = string_repeat("9", sub_len);

auto to_reached = false;
out << "(";
@@ -188,8 +185,8 @@ static void _build_min_max_int(int min_value, int max_value, std::stringstream &
auto max_digits = max_s.length();

for (auto digits = min_digits; digits < max_digits; digits++) {
-uniform_range(min_s, repeat("9", digits));
-min_s = "1" + repeat("0", digits);
+uniform_range(min_s, string_repeat("9", digits));
+min_s = "1" + string_repeat("0", digits);
out << " | ";
}
uniform_range(min_s, max_s);
@@ -318,49 +315,6 @@ std::unordered_map<char, std::string> GRAMMAR_LITERAL_ESCAPES = {
std::unordered_set<char> NON_LITERAL_SET = {'|', '.', '(', ')', '[', ']', '{', '}', '*', '+', '?'};
std::unordered_set<char> ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS = {'^', '$', '.', '[', ']', '(', ')', '|', '{', '}', '*', '+', '?'};

-template <typename Iterator>
-std::string join(Iterator begin, Iterator end, const std::string & separator) {
-std::ostringstream result;
-if (begin != end) {
-result << *begin;
-for (Iterator it = begin + 1; it != end; ++it) {
-result << separator << *it;
-}
-}
-return result.str();
-}
-
-static std::vector<std::string> split(const std::string & str, const std::string & delimiter) {
-std::vector<std::string> tokens;
-size_t start = 0;
-size_t end = str.find(delimiter);
-
-while (end != std::string::npos) {
-tokens.push_back(str.substr(start, end - start));
-start = end + delimiter.length();
-end = str.find(delimiter, start);
-}
-
-tokens.push_back(str.substr(start));
-
-return tokens;
-}
-
-static std::string repeat(const std::string & str, size_t n) {
-if (n == 0) {
-return "";
-}
-
-std::string result;
-result.reserve(str.length() * n);
-
-for (size_t i = 0; i < n; ++i) {
-result += str;
-}
-
-return result;
-}

static std::string replacePattern(const std::string & input, const std::regex & regex, const std::function<std::string(const std::smatch &)> & replacement) {
std::smatch match;
std::string result;
@@ -389,6 +343,7 @@ static std::string format_literal(const std::string & literal) {

class SchemaConverter {
private:
+friend std::string build_grammar(const std::function<void(const llama_grammar_builder &)> & cb);
std::function<json(const std::string &)> _fetch_json;
bool _dotall;
std::map<std::string, std::string> _rules;
@@ -418,7 +373,7 @@ class SchemaConverter {
for (size_t i = 0; i < alt_schemas.size(); i++) {
rules.push_back(visit(alt_schemas[i], name + (name.empty() ? "alternative-" : "-") + std::to_string(i)));
}
-return join(rules.begin(), rules.end(), " | ");
+return string_join(rules, " | ");
}

std::string _visit_pattern(const std::string & pattern, const std::string & name) {
@@ -481,7 +436,7 @@ class SchemaConverter {
for (const auto & item : ret) {
results.push_back(to_rule(item));
}
-return std::make_pair(join(results.begin(), results.end(), " "), false);
+return std::make_pair(string_join(results, " "), false);
};

while (i < length) {
@@ -539,7 +494,7 @@ class SchemaConverter {
}
curly_brackets += '}';
i++;
-auto nums = split(curly_brackets.substr(1, curly_brackets.length() - 2), ",");
+auto nums = string_split(curly_brackets.substr(1, curly_brackets.length() - 2), ",");
int min_times = 0;
int max_times = std::numeric_limits<int>::max();
try {
@@ -854,7 +809,7 @@ class SchemaConverter {
return;
}
std::string pointer = ref.substr(ref.find('#') + 1);
-std::vector<std::string> tokens = split(pointer, "/");
+std::vector<std::string> tokens = string_split(pointer, "/");
for (size_t i = 1; i < tokens.size(); ++i) {
std::string sel = tokens[i];
if (target.is_null() || !target.contains(sel)) {
@@ -905,7 +860,7 @@ class SchemaConverter {
for (const auto & v : schema["enum"]) {
enum_values.push_back(_generate_constant_rule(v));
}
-return _add_rule(rule_name, "(" + join(enum_values.begin(), enum_values.end(), " | ") + ") space");
+return _add_rule(rule_name, "(" + string_join(enum_values, " | ") + ") space");
} else if ((schema_type.is_null() || schema_type == "object")
&& (schema.contains("properties") ||
(schema.contains("additionalProperties") && schema["additionalProperties"] != true))) {
@@ -1019,10 +974,10 @@ class SchemaConverter {

void check_errors() {
if (!_errors.empty()) {
-throw std::runtime_error("JSON schema conversion failed:\n" + join(_errors.begin(), _errors.end(), "\n"));
+throw std::runtime_error("JSON schema conversion failed:\n" + string_join(_errors, "\n"));
}
if (!_warnings.empty()) {
-fprintf(stderr, "WARNING: JSON schema conversion was incomplete: %s\n", join(_warnings.begin(), _warnings.end(), "; ").c_str());
+fprintf(stderr, "WARNING: JSON schema conversion was incomplete: %s\n", string_join(_warnings, "; ").c_str());
}
}

@@ -1036,10 +991,28 @@ class SchemaConverter {
};

std::string json_schema_to_grammar(const json & schema) {
-SchemaConverter converter([](const std::string &) { return json::object(); }, /* dotall= */ false);
-auto copy = schema;
-converter.resolve_refs(copy, "input");
-converter.visit(copy, "");
+return build_grammar([&](const llama_grammar_builder & callbacks) {
+auto copy = schema;
+callbacks.resolve_refs(copy);
+callbacks.add_schema("", copy);
+});
+}
+
+std::string build_grammar(const std::function<void(const llama_grammar_builder &)> & cb) {
+SchemaConverter converter([&](const std::string &) { return json(); }, /* dotall= */ false);
+llama_grammar_builder builder {
+/* .add_rule = */ [&](const std::string & name, const std::string & rule) {
+return converter._add_rule(name, rule);
+},
+/* .add_schema = */ [&](const std::string & name, const nlohmann::ordered_json & schema) {
+return converter.visit(schema, name == "root" ? "" : name);
+},
+/* .resolve_refs = */ [&](nlohmann::ordered_json & schema) {
+converter.resolve_refs(schema, "");
+}
+};
+cb(builder);
converter.check_errors();
return converter.format_grammar();
}

10 changes: 9 additions & 1 deletion common/json-schema-to-grammar.h
@@ -5,4 +5,12 @@
#define JSON_ASSERT GGML_ASSERT
#include "json.hpp"

-std::string json_schema_to_grammar(const nlohmann::ordered_json& schema);
+std::string json_schema_to_grammar(const nlohmann::ordered_json & schema);
+
+struct llama_grammar_builder {
+std::function<std::string(const std::string &, const std::string &)> add_rule;
+std::function<std::string(const std::string &, const nlohmann::ordered_json &)> add_schema;
+std::function<void(nlohmann::ordered_json &)> resolve_refs;
+};
+
+std::string build_grammar(const std::function<void(const llama_grammar_builder &)> & cb);
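Review note: the callback-style `build_grammar` API declared here can be illustrated without the JSON dependency by the reduced sketch below. `toy_grammar_builder`, `toy_build_grammar` and `example_grammar` are hypothetical stand-ins that only mirror the shape of `llama_grammar_builder` and `build_grammar`; they keep just `add_rule`, and they assume `add_rule` returns the rule name so it can be referenced from later rules (what the real `_add_rule` returns is not visible in this diff). The `name ::= body` output format is likewise a simplification.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in mirroring the shape of llama_grammar_builder,
// reduced to add_rule so it needs no JSON library.
struct toy_grammar_builder {
    std::function<std::string(const std::string &, const std::string &)> add_rule;
};

// Hypothetical stand-in for build_grammar: hands a builder to the callback,
// collects the rules it registers, then formats them one per line.
std::string toy_build_grammar(const std::function<void(const toy_grammar_builder &)> & cb) {
    std::map<std::string, std::string> rules;
    toy_grammar_builder builder {
        /* .add_rule = */ [&](const std::string & name, const std::string & rule) {
            rules[name] = rule;
            return name; // assumption: the caller gets the name back for reuse
        },
    };
    cb(builder);
    std::string out;
    for (const auto & kv : rules) {
        out += kv.first + " ::= " + kv.second + "\n";
    }
    return out;
}

// Example caller: registers a helper rule, then a root rule that refers to it.
std::string example_grammar() {
    return toy_build_grammar([](const toy_grammar_builder & b) {
        const std::string digit = b.add_rule("digit", "[0-9]");
        b.add_rule("root", digit + "+");
    });
}
```

The design point the sketch tries to capture is that the converter state stays private to the build function and callers only see three closures, which lets a tool-call handler mix hand-written rules with schema-derived ones in a single grammar.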
9 changes: 8 additions & 1 deletion common/sampling.cpp
@@ -151,9 +151,16 @@ struct common_sampler * common_sampler_init(const struct llama_model * model, co

lparams.no_perf = params.no_perf;

+std::vector<const char *> trigger_words;
+trigger_words.reserve(params.grammar_trigger_words.size());
+for (const auto & str : params.grammar_trigger_words) {
+trigger_words.push_back(str.c_str());
+}
auto * result = new common_sampler {
/* .params = */ params,
-/* .grmr = */ llama_sampler_init_grammar(vocab, params.grammar.c_str(), "root"),
+/* .grmr = */ llama_sampler_init_grammar(vocab, params.grammar.c_str(), "root",
+trigger_words.data(), trigger_words.size(),
+params.grammar_trigger_tokens.data(), params.grammar_trigger_tokens.size()),
/* .chain = */ llama_sampler_chain_init(lparams),
/* .prev = */ ring_buffer<llama_token>(std::max(32, params.n_prev)),
/* .cur = */ {},
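Review note: the `trigger_words` conversion in this hunk builds an array of C strings that borrow from `params.grammar_trigger_words`. The sketch below isolates that pattern; `to_c_strs` is a hypothetical helper name, not part of the PR. The returned pointers are valid only while the source strings are alive and unmodified. Whether `llama_sampler_init_grammar` copies the strings before `common_sampler_init` returns is not visible in this diff, so the sketch covers only the caller-side constraint.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): builds a pointer array over `words`
// for passing to a C-style API that takes (const char **, size_t).
// The pointers borrow from `words`; they dangle if `words` is destroyed,
// resized, or has any element reassigned.
std::vector<const char *> to_c_strs(const std::vector<std::string> & words) {
    std::vector<const char *> out;
    out.reserve(words.size());
    for (const auto & w : words) {
        out.push_back(w.c_str());
    }
    return out;
}
```

A caller would keep the `std::vector<std::string>` in scope for as long as the callee might read the pointers, then pass `ptrs.data()` and `ptrs.size()` as the hunk does.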