
Conversation

@firecoperana
Collaborator

Enable DeepSeek V3.1 thinking mode as the default; disable with `--reasoning-budget 0`.
This PR also implements tool calling support and disables assistant prefill for thinking models.

Merges ggml-org/llama.cpp#15533 and ggml-org/llama.cpp#15404

ExtReMLapin and others added 3 commits September 9, 2025 18:30
…(#15639)

Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
* feat: Set enable_thinking IFF not disabled and supported

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Fix inverted logic condition for prefill error

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Always parse the enable_thinking kwarg to overwrite the default value

From what I can tell, this started as a Qwen3-specific keyword, but since the code in
`chat.cpp` translates `inputs.enable_thinking` to the right thinking kwarg for the given
model, it is now more of a standardized kwarg, so it should always override the default
value when sent as part of the chat_template_kwargs field in the API.

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
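As a rough illustration of that override behavior, a sketch in Python (hypothetical helper name; the actual logic lives in the C++ of `chat.cpp`):

```python
def resolve_enable_thinking(default_enabled, chat_template_kwargs):
    """Illustrative sketch: a client-supplied enable_thinking always
    overrides the server default, regardless of model family."""
    value = chat_template_kwargs.get("enable_thinking")
    if value is None:
        return default_enabled
    return value

print(resolve_enable_thinking(True, {}))                          # True (default kept)
print(resolve_enable_thinking(True, {"enable_thinking": False}))  # False (overridden)
```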

* fix: Don't limit template expansion check to jinja

With the use_jinja check, non-jinja models would enable thinking and always
fail assistant prefill

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add the error text to json type errors in json_value

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Explicitly reject string values for "enable_thinking"

There are too many possible "truthy" / "falsy" strings and too many
ambiguous strings that don't have a clear truthy/falsy value, so the
simplest thing to do here is to reject the request. Ideally, this would be
a 422 (Unprocessable Entity), but right now it's coming back as a 500.

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
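The rejection of string values can be sketched like this (illustrative Python, not the actual server code; the real server currently returns a 500 rather than the ideal 422):

```python
def parse_enable_thinking(value):
    """Illustrative sketch: accept only JSON booleans for enable_thinking.
    Strings like "true", "no", or "1" are rejected outright instead of
    guessing at their truthiness."""
    if isinstance(value, bool):
        return value
    raise ValueError(f'"enable_thinking" must be a boolean, got {value!r}')

print(parse_enable_thinking(False))  # False
```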

* refactor: Move logic for detecting template enable_thinking support to common

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Use raw pointer for common chat template function

Branch: gabe-l-hart/thinking-model-disabled-agent-prefill

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
# Conflicts:
#	tools/server/server.cpp
#	tools/server/utils.hpp
…) (#15533)

* Add DeepSeek V3.1 thinking mode support

- Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
- Created common_chat_params_init_deepseek_v3_1() function (currently uses R1 implementation)
- Created common_chat_parse_deepseek_v3_1() function that handles V3.1 thinking format:
  - Extracts reasoning content before '</think>' tag into reasoning_content
  - Extracts regular content after '</think>' tag into content
  - No opening '<think>' tag in V3.1 format
- Added detection logic for V3.1 templates based on pattern: 'message['prefix'] is defined and message['prefix'] and thinking'
- Added V3.1 case to parsing switch statement

This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content without the opening '<think>' tag.
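A sketch of the described parsing behavior (illustrative Python, not the actual C++ in `common/chat.cpp`):

```python
def parse_deepseek_v31(output):
    """Illustrative sketch of the V3.1 format: reasoning text comes first,
    terminated by '</think>', with no opening '<think>' tag."""
    marker = "</think>"
    pos = output.find(marker)
    if pos == -1:
        # No closing tag yet: treat everything as reasoning content.
        return {"reasoning_content": output, "content": ""}
    return {
        "reasoning_content": output[:pos],
        "content": output[pos + len(marker):],
    }

print(parse_deepseek_v31("step one</think>Hello"))
# {'reasoning_content': 'step one', 'content': 'Hello'}
```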

* Another attempt at V3.1 non-thinking

* Fix test, but it's not asserting anything.

* Ignore vim swap files in tests dir

* Update the test

* Try using try_find_literal instead of regex

* passing test

* Revert "Try using try_find_literal instead of regex"

This reverts commit c50d887ec2780dd9e6b8b397e92347d3db8d5575.

* Remove unnecessary change

* Remove comment

* Add code to handle non-thinking mode.

* Try to set message['prefix'] when thinking is enabled.

* This fixes reasoning, but breaks normal content. We need state in the
chat parser.

* DeepSeek V3.1 thinking is now the default. Disable with `--reasoning-budget 0`.

* Simplify (DeepSeek V3.1 reasoning)

* Fix sign inversion bug

* Add some tool calling code (not working).

* Tool calls working in non-reasoning mode.

* Attempt a unit test for tool call parsing.

* Passing test

* Add tests for both happy path and broken fenced DeepSeek V3.1 tool call variants.

* Passing DeepSeek V3.1 tool call tests, but model is not working.

* Revert assistant response prefill change. Not my monkeys.

* Add fenced_thinking unit test variant. Passes, but thinking tool calling
still isn't working for some reason.

* Tests pass in reasoning mode. Also e2e tool test passes.

* Make a copy of the parse_json_tool_calls function for deepseek-v3.1 so
as to not accidentally introduce regressions.

* Fix thinking_forced_open logic. tool calling broken. Need to add another
test case.

* That's what I get for cargo culting a newline.

* Add multi tool call test for deepseek v3.1 non-reasoning

* Move test, remove .gitignore change

* Place deepseek-v3.1 reasoning test directly into existing reasoning
function per CISC's request.

* Address whitespace CI failure.

* Merge two assert_equals per CISC's request.

* Add DeepSeek-V3.1 tests to tests/test-chat.cpp per CISC's request.

* Merge deepseek V3.1 and regular parse_json_tool_calls() function
behaviors by adding optional update_cursor argument.

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* DeepSeek V3.1 fix reasoning_format none

* Strip grammar down to strictly what we expect based on model card. Throw
out parts we cargo culted from R1 that don't make sense.

* Update tests/test-chat-parser.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* DeepSeek V3.1 - Add edge case where thinking is forced open, there is
tool calling in the reasoning content, but then the model just stops the
output without closing the </think> tag, so it's not a partial. In this
case, use the tool call in the reasoning content.

* DeepSeek V3.1 - simplify update_cursor

* Update common/chat.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update common/chat.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update common/chat.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Fix indent

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@firecoperana firecoperana self-assigned this Sep 9, 2025
@firecoperana firecoperana mentioned this pull request Sep 9, 2025
@ikawrakow
Owner

Can somebody test this? Thanks!

@ChicoPinto70

ChicoPinto70 commented Sep 10, 2025

I've just tested it and it seems to be working fine.

My test was with Ubergarm's DeepSeek-V3.1-smol-IQ4_KSS model in Roo Code. In reasoning mode it uses the tools and shows the thinking output in the proper frame, but:

  1. To make it work, I had to replace the chat template file with the one provided by Unsloth (I believe the mainline page uses the Unsloth ones).
  2. I've noticed that it sometimes fails at tool calling. That may be because this model doesn't natively support tool calling with reasoning, or because the chat-template-file injection is not a perfect solution.

This is the command line I used:

CUDA_VISIBLE_DEVICES="1,2,0" ./build/bin/llama-server --alias DeepSeek-V3.1-IQ4_KSS -m /home/chico/.lmstudio/models/ubergarm/DeepSeek-V3.1-GGUF/DeepSeek-V3.1-smol-IQ4_KSS-00001-of-00007.gguf -ngl 64 -c 65536 -mla 3 -fa -amb 512 -fmoe -t 28 -ctk q8_0 -ot "blk.[0-6].._exps.=CUDA1,blk.(7|8|9|10).._exps.=CUDA2,exps=CPU" --parallel 1 --numa distribute -b 512 -ub 512 -ts 1,0,0 --host 192.168.0.9 --port 1235 --jinja --chat-template-file /home/chico/ik_llama.cpp/models/templates/Unsloth-DeepSeek-V3.1.jinja --reasoning-format auto

and this is the chat template file I used: https://huggingface.co/unsloth/DeepSeek-V3.1/blob/main/chat_template.jinja

I almost forgot!!! Thanks, @firecoperana!

@arichiardi

@ikawrakow I can test this as well, compiling as we speak - FWIW I am using GLM-4.5-Air with a custom chat template

@arichiardi

arichiardi commented Sep 10, 2025

Noticed this branch spits out

ik-llama@GLM-4.5-Air-ik[35756]: Enable thinking? 1

But my command line reads:

--chat-template-kwargs '{"enable_thinking":false}'

I tried removing it and I still get the same output. I was expecting false or 0, for what it's worth, but it might be unrelated.

EDIT: the patch is definitely applied, as I see the correct error response when sending a payload with {"enable_thinking": "false"} (it should be the literal false).

ik-llama@GLM-4.5-Air-ik[36148]: INFO [      log_server_request] request | tid="140108745404416" timestamp=1757536917 remote_addr="..." remote_port=62582 status=500 method="POST" path="/v1/chat/completions" params={}
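The distinction here is JSON types: `enable_thinking` must be the JSON literal `false`, not the string `"false"`. A minimal sketch of a valid request body (the model name is just a placeholder):

```python
import json

# enable_thinking must be a JSON boolean; the string "false" is rejected.
payload = {
    "model": "GLM-4.5-Air",  # placeholder model name
    "messages": [{"role": "user", "content": "Hi"}],
    "chat_template_kwargs": {"enable_thinking": False},
}
print(json.dumps(payload["chat_template_kwargs"]))  # {"enable_thinking": false}
```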

@firecoperana
Collaborator Author

Checked with mainline and it also shows Enable thinking? 1.
It does not use --chat-template-kwargs, just --reasoning-budget, to set this value. I will remove this to avoid confusion.

@arichiardi

@firecoperana that makes me wonder where the above "enable_thinking" is actually used. I would suggest you double-check that it works on your side as well.

Shouldn't "enable_thinking" and "reasoning_budget=0" do the same thing after all?

@firecoperana
Collaborator Author

Yes, they do the same thing. The only caveat is that when you have both, enable_thinking will now override reasoning_budget.

@arichiardi

arichiardi commented Sep 12, 2025

FWIW, this looks good here (I compiled rebasing onto main as well)

@ikawrakow ikawrakow merged commit 6d2e7ca into main Sep 13, 2025
@kirnat

kirnat commented Sep 23, 2025

Thanks for adding this. For some reason I can't get tool calling working properly with DeepSeek V3.1, though it works fine with upstream. Could it be a chat template issue? The template ChicoPinto70 used should be the same as the one included with the Unsloth model I tested with, but I'm a bit unsure whether Roo Code actually uses native OpenAI tool calling; I thought it used custom XML-tag-styled formatting/parsing.

Model used:
unsloth/DeepSeek-V3.1-GGUF/UD-Q3_K_XL

Notable arguments:
--jinja --reasoning-format auto

Test Case

payload = {
    "model": "model", 
    "messages": [{"role": "user", "content": "List files in /tmp"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": "List files in a directory",
            "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}
        }
    }],
    "tool_choice": "auto"
}

Response (ik_llama.cpp 18f0435):

"choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "Okay, the user wants me to list the files in the /tmp directory. This is a straightforward request that requires a simple directory listing operation. \n\nI'll use the list_directory function with the path parameter set to \"/tmp\". This should return the contents of that directory. \n\nThe function is designed to handle this exact type of request, so no additional parameters or special handling is needed.",
        "content": "list_directory{\"path\": \"/tmp\"}"
      }
    }
  ]

Response (llama.cpp):

"choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "First, the user is asking to list the files in the /tmp directory. I have a function called list_directory that can handle this. The function requires a path parameter, which in this case is \"/tmp\".\n\nI need to call the list_directory function with the path set to \"/tmp\". The tool call syntax is:<\uff5ctool\u2581calls\u2581begin\uff5c><\uff5ctool\u2581call\u2581begin\uff5c>tool_name<\uff5ctool\u2581sep\uff5c>{\"arg1\": \"some_value\"}<\uff5ctool\u2581call\u2581end\uff5c><\uff5ctool\u2581calls\u2581end\uff5c> So for this, it should be:<\uff5ctool\u2581calls\u2581begin\uff5c><\uff5ctool\u2581call\u2581begin\uff5c>list_directory<\uff5ctool\u2581sep\uff5c>{\"path\": \"/tmp\"}<\uff5ctool\u2581call\u2581end\uff5c><\uff5ctool\u2581calls\u2581end\uff5c>",
        "content": "I'll list the files in the /tmp directory for you.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "list_directory",
              "arguments": "{\"path\":\"/tmp\"}"
            },
            "id": "xgYG5r2rWS2WBewIvPmYLNEuPfkTHsGC"
          }
        ]
      }
    }
  ]
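The difference between the two responses can be expressed as a small check. This helper is purely illustrative (not part of either server); it distinguishes a properly parsed tool call from one that leaked into the content field:

```python
def tool_call_parsed(choice):
    """Return True when the server parsed the tool call into the structured
    tool_calls field rather than leaving it in the content string."""
    message = choice["message"]
    return (choice["finish_reason"] == "tool_calls"
            and bool(message.get("tool_calls")))

# Condensed versions of the two responses above.
ik_choice = {"finish_reason": "stop",
             "message": {"role": "assistant",
                         "content": 'list_directory{"path": "/tmp"}'}}
upstream_choice = {"finish_reason": "tool_calls",
                   "message": {"role": "assistant",
                               "content": "I'll list the files in the /tmp directory for you.",
                               "tool_calls": [{"type": "function",
                                               "function": {"name": "list_directory",
                                                            "arguments": '{"path":"/tmp"}'}}]}}

print(tool_call_parsed(ik_choice))        # False: call leaked into content
print(tool_call_parsed(upstream_choice))  # True: call parsed correctly
```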

@firecoperana
Collaborator Author

So the reasoning works for DeepSeek V3.1. Can you try sending the same payload a few times? The success rate of tool calls varies by model.

@kirnat

kirnat commented Sep 23, 2025

Thanks. Yes, the reasoning works great. I have run the test about 20-30 times, and the only thing I have noticed is that the LLM sometimes outputs the tool call in reasoning content and sometimes in the regular content. This particular model has been performing really well with tool calling in llama.cpp.

@firecoperana
Collaborator Author

If you change tool_choice from auto to required, does it force the model to generate a tool call? It seems like it's not a parsing issue; rather, the model does not generate tool call content.

@kirnat

kirnat commented Sep 25, 2025

If I set tool_choice to required, it does the same thing: it outputs the tool call in content, as in my example output above. I don't know what else to test. The same model works flawlessly in llama.cpp with the same options, and after making hundreds of tool calls it hasn't failed formatting even once. DeepSeek 3.1 is exceptionally good at this task, especially compared to V3 and R1.

@firecoperana
Collaborator Author

Ok. Unfortunately I don't have DeepSeek V3.1 at hand to test, and it will be a while before I have time to try it. I hope someone who has used tool calling successfully with DeepSeek V3.1 can share their experience.

@kirnat

kirnat commented Sep 26, 2025

Thanks a lot. I will try to do more debugging and report any potential findings.

@firecoperana
Collaborator Author

#799 See if this makes any difference for you.

@firecoperana firecoperana deleted the fcp/deepseek3.1_toolcall branch October 26, 2025 16:57