Name and Version
Built from current `master` (HEAD):
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
version: 6134 (be48528)
built with MSVC 19.41.34120.0 for x64
Also reproduced at the office on Ubuntu, likewise built from HEAD.
Operating systems
Windows, Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
`llama-server -fa -ngl 999999 -ctk q4_0 -ctv q4_0 -c 128000 --jinja -m Qwen3-4B-Q5_K_S.gguf --port 2483 --slots`
Problem description & steps to reproduce
Calling `chat.completions.create` with `tool_choice: "required"` on a reasoning model (here Qwen3) skips the reasoning step entirely: the grammar that enforces the required tool call constrains the output to a tool call, so the model never produces its reasoning content.
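A minimal reproduction sketch using the OpenAI Python client, assuming the server was started with the command line above on port 2483; the `get_weather` tool definition, the model name, and the `reasoning_content` field access are illustrative assumptions, not taken verbatim from this report:

```python
# Sketch: reproduce tool_choice="required" suppressing reasoning output.
# Assumes llama-server is running locally on port 2483 (see command line above).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:2483/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="Qwen3-4B-Q5_K_S",  # model name is illustrative
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for the repro
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="required",
)

msg = response.choices[0].message
# Expected: Qwen3 emits reasoning content before the tool call.
# Actual: the grammar forces an immediate tool call, so no reasoning appears.
print(getattr(msg, "reasoning_content", None))  # field name is an assumption
print(msg.tool_calls)
```

With `tool_choice: "auto"` the same request produces reasoning as expected; only the `"required"` grammar path suppresses it.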
First Bad Commit
No response
Relevant log output