-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server
: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless --reasoning-format none
#11607
Merged
Merged
Changes from 75 commits
Commits
Show all changes
94 commits
Select commit
Hold shift + click to select a range
d3b60b8
minja: enhance backfill of templates w/o tools description (use examp…
ochafik 87de852
pass vocab to common_chat_params_init
ochafik 130ca22
DeepSeek R1: parse thoughts / return in separate field in API (non st…
ochafik 04d511b
Avoid double bos w/ jinja
ochafik 2834587
server/oai: ensure content is null when there are tool calls
ochafik c80cb30
update logs
ochafik 0871628
rename tests
ochafik 73d08d4
tool-call: allow `--jinja --chat-template chatml`
ochafik 04be723
tool-call: fix command-r7b parsing when response is multiline
ochafik ae9d581
tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world
ochafik 19bea4e
tell DS R1 not to overthink (weather test)
ochafik 5e6f2a2
add deepseek models to server tool call section in readme
ochafik 1e9acd2
tool-call: allow `--jinja --chat-template chatml`
ochafik 77ae97e
Update test_tool_call.py
ochafik a76073c
minimize diffs
ochafik cf83623
fix typo
ochafik 5d18d76
fix double bos issue (drop bos/eos tokens from jinja template)
ochafik aa98e59
fix bad merge
ochafik 2b3c482
fix build / rm diff
ochafik 4cb0e1d
Merge branch 'jinja-chatml' into r1-toolcall
ochafik b2dd490
add missing try catch around jinja parsing to default to chatml
ochafik 08271b5
Merge branch 'jinja-chatml' into r1-toolcall
ochafik df3474e
tool-calls: r1: add missing <|tool▁calls▁end|> to grammar!
ochafik c397bd1
tweak delta logic
ochafik 569610e
tool-calls: accommodate variety of wrong tool call opening tags both …
ochafik d73448d
Simplify default chatml logic
ochafik 0be7f65
Merge branch 'jinja-chatml' into r1-toolcall
ochafik 7dc271f
tool-calls: add deepseek r1 template + accommodate broken official te…
ochafik c6214ee
rm unneeded vocab
ochafik 1c302e1
simpler hacky fixes for original broken template (+ fix minja example…
ochafik 108da90
sync: minja https://github.com/google/minja/pull/46
ochafik bc6d910
Merge branch 'master' into r1-toolcall
ochafik 11c1f0c
actually we want eos_token in the template to infer tool call example…
ochafik 30ea359
update to minja's new api
ochafik bbd45bf
sync: minja
ochafik bff549d
simplify hack to fix original template's backfill from minja
ochafik ce28224
tool-call: r1: add one more trigger approx "<|tool calls begin|>"
ochafik e84ee88
r1: fix inadvertent newline in grammar before <|tool▁call▁end|>
ochafik 18a11f4
tool-call: r1: fix grammar
ochafik 9a6847c
move trigger_words init inside non-llguidance branch
ochafik a682d12
fix / test parsing of r1 parser
ochafik f0154a6
Fix / test models/templates/llama-cpp-deepseek-r1.jinja
ochafik 326e700
update test_calc_result
ochafik 78b47bb
fix test_calc_result
ochafik 86994db
fix spaces
ochafik 09caa63
`sync`: minja
ochafik b152729
Update test-chat.cpp
ochafik 56a14dd
fix mistral chat test: need empty tokens
ochafik f12e350
Update chat.cpp
ochafik d43e4f6
Merge branch 'sync-minja-4' into r1-toolcall
ochafik 812544a
server: check that content is null when we get tool_calls
ochafik d44eb95
tool-call: ensure we don't return content when there are tool calls /…
ochafik b6e14a4
fix mistral expectation
ochafik 1f5ec59
ensure deepseek r1 thoughts parsed even w/o tool calls
ochafik 438ce0b
fix test-chat
ochafik 21f2071
Update chat.cpp
ochafik b5b117f
Merge branch 'sync-minja-4' into r1-toolcall
ochafik 0db9881
Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggerin…
ochafik d1b6691
r1: revert making <|tool▁calls▁begin|> optional as somehow sampling t…
ochafik 39c1d81
return thoughts in reasoning_content field
ochafik b2d1728
update readme section about common model tool call formats
ochafik 933f7a1
Merge branch 'master' into r1-toolcall
ochafik 5d60ceb
Update test_tool_call.py
ochafik 1f1f06a
Merge branch 'master' into r1-toolcall
ochafik 9d7c3cc
--think to force any model to return reasoning_content (or just parse…
ochafik d20c2ce
Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-to…
ochafik f3e9f8b
fix test_thoughts
ochafik 3841a16
fix compiler warning about parens
ochafik e6d9b52
align Command R7B w/ --think / reasoning_content behaviour
ochafik 39b50c3
Update README.md
ochafik 0917e0a
fix --think arg env
ochafik 098629d
disable some failing chatml tests
ochafik 33efcb3
Update README.md
ochafik 994301d
use existing string_strip
ochafik d1a0640
revert tool example backfill change - command 7rb just needs the righ…
ochafik cc2c712
Merge remote-tracking branch 'origin/master' into r1-toolcall
ochafik c0f972b
Use --reasoning-format, remove forced thinking for now
ochafik af63886
return reasoning_content before content
ochafik a59fde2
update model template / format mapping
ochafik b829cab
fix test-chat
ochafik 95cddfd
rm thoughts from generic parser
ochafik e598e7a
sync: minja (https://github.com/google/minja/pull/52)
ochafik 91542ca
tool-calls: allow r1 output to miss <think> opening tag (since latest…
ochafik 8d82be9
sync: minja (https://github.com/ggerganov/llama.cpp/pull/11774)
ochafik 30dcfaa
rm wrong warning in command-r parser (when normal text)
ochafik e1bff8f
update deepseek r1 templates (+ put update commands in ./scripts/get_…
ochafik a29dc92
fix server test_tool_calls.py
ochafik ea2f41e
add models/templates/README.md
ochafik 8409bf1
fix test_calc_result & test_thoughts
ochafik 01db429
fix test-chat (update delta to latest r1 template change)
ochafik 37a4bb2
Merge remote-tracking branch 'origin/master' into r1-toolcall
ochafik d52579a
prefer json::at to operator[] in chat.cpp
ochafik 4700245
Merge remote-tracking branch 'origin/master' into r1-toolcall
ochafik 043cb99
Apply suggestions from code review
ochafik File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the
--think
flag is not very intuitive, it should be something like--reasoning-format
, but personally I still prefer to do it per-request basis.Also, for future-proof, we should make this flag accepts a param. For example,
--reasoning-format deepseek
will return it asreasoning_content
. Again, this is because we are pretty sure that openai will break the whole thing in the near future.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a usability standpoint,
--think
feels a bit more intuitive / memorable to me about what it does.Other alternatives might be
--separate-thinking
or--extract-reasoning
or--format-reasoning
or...?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, what non-intuitive about it is that the model will still think even with --think not being added. This flag is supposed to force the model to start the reasoning process, but not to enable/disable it completely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh I'm not really a fan of a query parameter yet, mostly because:
Re/ flags, I think there may be room for two: one that controls the reasoning behaviour (extract, leave inline, force), and one for the return format. For the first one, how about:
--reasoning=extract
→ Parses DeepSeek R1 & Command R7B thoughts (default)--reasoning=inline
→ Leaves thoughts inline in the content (format is model-specific)--reasoning=force
→ Coerces non-thinking models to think (edit: maybeforce-experimental
for now)As for the format, there are already two APIs out there:
reasoning_content
I'd favour just sticking to
reasoning_content
for now until OpenAI announces their own format (and when they do, default to OpenAI's format and offer--reasoning-format=deepseek
for backwards compatibility). OR decide to create our own format now / default--reasoning-format=llama.cpp
that returns thoughts in themessage.thoughts
field, for instance.WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ok for me, but eventually I think someone will gonna add this as a per-request param.
Re. having 2 flags for behavior / format, this seems more reasonable for me. From a functional programming perspective, it can be expressed as
response = format(behavior(original_generated_content))
But I think your idea is still mixed between these 2 layers.
For
--reasoning extract|inline|force
:force
is supposed to do, but seems like it needs some prompt engineering so I think you should consider that, maintaining prompts can be a burdeninline
does that means reasoning can appear middle of generation? Example,content..think..content..think
. Please lmk if I understand correctly.For
--reasoning-format
I don't get why we want to invent a new--reasoning-format llama.cpp
that put things insidemessage.thoughts
. IMO we should keep thing simple until openai drop their format. Probably we can have--reasoning-format deepseek|none
and setdeepseek
as the default for now, then change the default tooai
once we have that openai formatThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO if we want
--reasoning
to control the behavior, then it should affect the generation phrase (for example, control grammar/logits bias). So, it should 3 values:enabled
: the model behaves as usualdisabled
: we never allow model to use<think>
token ==> control via logits biasforce
: force the model to think, maybe use grammar or prompt engineering?Then for the
--reasoning-format
, it should only "transform" the result into the desired format (a.k.a a pure function), we can have 3 values:deepseek
: put content insidereasoning_content
none
: do not format, simply forward all the generated tokens to useroai
can be added in the futureThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngxson Makes sense, removed the forced thinking from this PR / will explore again as follow up (also, see more of a case of this option as a query param, while reasoning-format now has stronger flag vibes)
Good point. In earlier experimentations I tried controlling the entire tool call process (even grammar generation) from Jinja templates, might play with this again.
Updated the code (defaulting to deepseek), thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I’ve updated the code to the latest DeepSeek template changes (they added a trailing
<think>
😅; updated minja accordingly: #11774 (comment) )There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ngxson, any more concerns / suggestions about this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm a bit busy recently, will do a review later today or tomorrow