Deepseek V3.1 native tool calling support (OpenAI Style) #771
…(#15639) Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
* feat: Set enable_thinking IFF not disabled and supported
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Fix inverted logic condition for prefill error
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Always parse the enable_thinking kwarg to overwrite the default value
  From what I can tell, this started as a Qwen3-specific keyword, but from the use in `chat.cpp` translates this inputs.enable_thinking to the right thinking kwarg for the given model, this is now more of a standardized kwarg, so it should always override the default value when sent as part of the chat_template_kwargs field in the API.
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Don't limit template expansion check to jinja
  With the use_jinja check, non-jinja models would enable thinking and always fail assistant prefill
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add the error text to json type errors in json_value
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Explicitly reject string values for "enable_thinking"
  There are too many possible "truthy" / "falsy" strings and too many ambiguous strings that don't have a clear truthy/falsy value, so the simplest thing to do here is to reject the request. Ideally, this would be a 422 (Unprocessable Entity), but right now it's coming back as a 500.
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Move logic for detecting template enable_thinking support to common
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Use raw pointer for common chat template function
  Branch: gabe-l-hart/thinking-model-disabled-agent-prefill
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
# Conflicts:
#	tools/server/server.cpp
#	tools/server/utils.hpp
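The string-rejection rule described in the commit above can be illustrated with a minimal Python sketch. It is not the server's C++ code: the function name `parse_enable_thinking` and the `default` parameter are hypothetical, and only the behaviour "a string value is an error, a boolean overrides the default" is taken from the commit message.

```python
def parse_enable_thinking(chat_template_kwargs, default=True):
    """Return the effective enable_thinking flag (hypothetical helper)."""
    if "enable_thinking" not in chat_template_kwargs:
        # Nothing sent by the client: keep the server-side default.
        return default
    value = chat_template_kwargs["enable_thinking"]
    # Strings such as "false", "0" or "off" are ambiguous, so refuse them
    # instead of guessing whether they are truthy or falsy.
    if isinstance(value, str):
        raise ValueError('"enable_thinking" must be a JSON boolean, not a string')
    # A value sent by the client always overrides the default.
    return bool(value)
```

A client that sends `{"enable_thinking": "false"}` therefore gets an error rather than a silently enabled thinking mode.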
…) (#15533)
* Add DeepSeek V3.1 thinking mode support
  - Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
  - Created common_chat_params_init_deepseek_v3_1() function (currently uses R1 implementation)
  - Created common_chat_parse_deepseek_v3_1() function that handles V3.1 thinking format:
    - Extracts reasoning content before '</think>' tag into reasoning_content
    - Extracts regular content after '</think>' tag into content
    - No opening '<think>' tag in V3.1 format
  - Added detection logic for V3.1 templates based on pattern: 'message['prefix'] is defined and message['prefix'] and thinking'
  - Added V3.1 case to parsing switch statement
  This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content without the opening '<think>' tag.
* Another attempt by V3.1 non-thinking
* Fix test, but it's not asserting anything.
* Ignore vim swap files in tests dir
* Update the test
* Try using try_find_literal instead of regex
* passing test
* Revert "Try using try_find_literal instead of regex"
  This reverts commit c50d887ec2780dd9e6b8b397e92347d3db8d5575.
* Remove unnecessary change
* Remove comment
* Add code to handle non-thinking mode.
* Try to set message['prefix'] when thinking is enabled.
* This fixes reasoning, but breaks normal content. We need state in the chat parser.
* DeepSeek V3.1 thinking is now the default. Disable with `--reasoning-budget 0`.
* Simplify (DeepSeek V3.1 reasoning)
* Fix sign inversion bug
* Add some tool calling code (not working).
* Tool calls working in non-reasoning mode.
* Attempt a unit test for tool call parsing.
* Passing test
* Add tests for both happy path and broken fenced DeepSeek V3.1 tool call variants.
* Passing DeepSeek V3.1 tool call tests, but model is not working.
* Revert assistance response prefill change. Not my monkeys.
* Add fenced_thinking unit test variant. Passes, but thinking tool calling still isn't working for some reason.
* Tests pass in reasoning mode. Also e2e tool test passes.
* Make a copy of the parse_json_tool_calls function for deepseek-v3.1 so as to not accidentally introduce regressions.
* Fix thinking_forced_open logic. tool calling broken. Need to add another test case.
* That's what I get for cargo culting a newline.
* Add multi tool call test for deepseek v3.1 non-reasoning
* Move test, remove .gitignore change
* Place deepseek-v3.1 reasoning test directly into existing reasoning function per CISC's request.
* Address whitespace CI failure.
* Merge two assert_equals per CISC's request.
* Add DeepSeek-V3.1 tests to tests/test-chat.cpp per CISC's request.
* Merge deepseek V3.1 and regular parse_json_tool_calls() function behaviors by adding optional update_cursor argument.
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* DeepSeek V3.1 fix reasoning_format none
* Strip grammar down to strictly what we expect based on model card. Throw out parts we cargo culted from R1 that don't make sense.
* Update tests/test-chat-parser.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* DeepSeek V3.1 - Add edge case where thinking is forced open, there is tool calling in the reasoning content, but then the model just stops the output without closing the </think> tag, so it's not a partial. In this case, use the tool call in the reasoning content.
* DeepSeek V3.1 - simplify update_cursor
* Update common/chat.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update common/chat.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update common/chat.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Fix indent
---------
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
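The V3.1 thinking format described in the commit above (reasoning first, then a closing '</think>' tag, with no opening '<think>' tag) can be illustrated with a short Python sketch. This is not the C++ parser in common/chat.cpp; the function name `split_reasoning` and its parameters are hypothetical, and tool-call handling is left out.

```python
def split_reasoning(output, thinking_enabled=True, end_tag="</think>"):
    """Split raw model output into (reasoning_content, content).

    Hypothetical helper: assumes the output starts directly with the
    reasoning text, because the V3.1 format has no opening <think> tag.
    """
    if not thinking_enabled:
        # Non-thinking mode: the whole output is regular content.
        return "", output
    head, sep, tail = output.partition(end_tag)
    if not sep:
        # The closing tag never appeared, so everything is still reasoning.
        return output.strip(), ""
    # Text before the tag is reasoning, text after it is the answer.
    return head.strip(), tail.strip()
```

For example, `split_reasoning("The user wants a listing.</think>Here are the files.")` yields the reasoning and the answer as separate strings, while output without the tag is treated entirely as reasoning.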
|
Can somebody test this? Thanks! |
|
I've just tested it and it seems to work fine. I tested Ubergarm's DeepSeek-V3.1-smol-IQ4_KSS model in Roo Code. In reasoning mode it uses the tools and shows the thinking output in the proper frame, but:
This is the command line I used:
And this is the chat template file I used: https://huggingface.co/unsloth/DeepSeek-V3.1/blob/main/chat_template.jinja I almost forgot!!! Thanks, @firecoperana! |
|
@ikawrakow I can test this as well, compiling as we speak - FWIW I am using |
|
Noticed this branch spits out
But my command line reads: I tried removing it and I still get the same output. I was expecting false or 0, for what it's worth, but it might be unrelated. EDIT: the patch is definitely applied, as I see the correct error response when sending a payload with |
|
Checked with mainline and it also shows Enable thinking? 1. |
|
@firecoperana that makes me wonder where the above Shouldn't |
|
Yes, they do the same thing. The only caveat is that when you set both, enable_thinking now overrides reasoning_budget.
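A minimal Python sketch of that precedence, under stated assumptions: the function name `resolve_thinking` is hypothetical, a budget of 0 is taken to mean "thinking disabled" (as in the PR description), and -1 is assumed to be the unrestricted default.

```python
def resolve_thinking(reasoning_budget=-1, chat_template_kwargs=None):
    """Decide whether thinking is enabled for one request (hypothetical helper)."""
    kwargs = chat_template_kwargs or {}
    # Server-level setting: a reasoning budget of 0 disables thinking.
    enabled = reasoning_budget != 0
    # Per-request enable_thinking wins over the budget when both are given.
    if "enable_thinking" in kwargs:
        enabled = bool(kwargs["enable_thinking"])
    return enabled
```

So a server started with `--reasoning-budget 0` would still think for a request that sends `"chat_template_kwargs": {"enable_thinking": true}`.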
|
FWIW, this looks good here (I compiled rebasing onto |
|
Thanks for adding this. For some reason I can't get tool calling working properly with DeepSeek V3.1. It works fine with upstream. Could it be a chat template issue? The template ChicoPinto70 used should be the same as the one included with the Unsloth model I tested with, but I am not sure RooCode actually uses native OpenAI tool calling; I thought they used a custom XML-tag-style formatting/parsing.
Model used:
Notable arguments:
Test case:
payload = {
    "model": "model",
    "messages": [{"role": "user", "content": "List files in /tmp"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": "List files in a directory",
            "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}
        }
    }],
    "tool_choice": "auto"
}
Response (ik_llama.cpp 18f0435):
Response (llama.cpp): |
|
So the reasoning works for deepseek v3.1. Can you try to send the same payload a few times? The success rate of tool calls varies by model. |
|
Thanks. Yes, the reasoning works great. I have run the test about 20-30 times, and the only thing I have noticed is that the LLM sometimes outputs the tool call in reasoning content and sometimes in the regular content. This particular model has been performing really well with tool calling in llama.cpp. |
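To make repeated runs easier to compare, a small Python helper can report where the tool call ended up in each response. This is only a sketch: `locate_tool_call` is a hypothetical name, the `choices[0].message` layout follows the OpenAI-style chat completion format, and the default marker string is an assumption based on the DeepSeek V3.1 model card, so adjust it if your template differs.

```python
# Assumed raw tool-call marker from the DeepSeek V3.1 model card.
TOOL_MARKER = "<｜tool▁calls▁begin｜>"

def locate_tool_call(response, marker=TOOL_MARKER):
    """Return where a tool call appears in an OpenAI-style chat response."""
    message = response["choices"][0]["message"]
    if message.get("tool_calls"):
        # Properly parsed: the server returned a structured tool call.
        return "tool_calls"
    if marker in (message.get("reasoning_content") or ""):
        # Unparsed tool call left inside the reasoning text.
        return "reasoning_content"
    if marker in (message.get("content") or ""):
        # Unparsed tool call left inside the regular content.
        return "content"
    return "none"
```

Counting the returned labels over 20 to 30 requests with the same payload gives a rough success rate for native tool calling.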
|
If you change |
|
If I set tool_choice to required, it does the same thing: it outputs the tool call in content, as in my example output above. I don't know what else to test. The same model works flawlessly in llama.cpp with the same options, and after hundreds of tool calls it hasn't failed formatting even once. DeepSeek V3.1 is exceptionally good at this task, especially compared to V3 and R1. |
|
Ok. Unfortunately I don't have DeepSeek V3.1 at hand to test, and it will be a while before I have time to try it. I hope someone who has used tool calling successfully on DeepSeek V3.1 can share their experience. |
|
Thanks a lot. I will try to do more debugging and report any potential findings. |
|
#799 See if this makes any difference for you. |
Enable DeepSeek V3.1 thinking mode as the default. Disable with --reasoning-budget 0.
It also implements tool calling support.
It also includes the upstream "thinking model disabled assistant prefill" change.
Merges ggml-org/llama.cpp#15533 and ggml-org/llama.cpp#15404