Name and Version
```
llama-cli.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 4831 (5e43f104)
built with MSVC 19.39.33523.0 for x64
```
I'm using a llama-server exe that I compiled on 3/6/25 from the master branch, using `-DLLAMA_CUDA`:

```
cmake .. -DLLAMA_CUDA=ON
cmake --build . --config Release
```
Operating systems
Windows
GGML backends
CUDA
Hardware
RTX 3090
Models
Qwen2.5-7B-Instruct-1M-Q4_K_M.gguf
Problem description & steps to reproduce
Occasionally, I see a tool call that comes back as `response.choices[0].message.content` rather than as `response.choices[0].message.tool_calls`.
Example of the issue:
```json
{
  "role": "assistant",
  "content": "<tool_call>\n{\"name\": \"aiCreatePlan\", \"arguments\": {...}}\n</tool_call>"
}
```
Example code with debugger and values: see the snippet under Relevant log output below.
Example of a good result, with the same parameters & prompt:
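On a good run, the same request surfaces the call in `tool_calls` with empty `content`, shaped roughly like this (the `id` and field values here are illustrative, not copied from an actual run):

```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "aiCreatePlan",
        "arguments": "{...}"
      }
    }
  ]
}
```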
First Bad Commit
No response
Relevant log output
```ts
const response = await openai.chat.completions.create({
  model: model.modelName,
  messages: openAiMessages,
  tools: aiFunctionContext.aiFunctionExecutor?.getToolsMetadata(),
  stream: false,
}, { signal });

const assistantMessage = response.choices[0].message;

// Add the assistant's message to our conversation
openAiMessages.push({
  role: 'assistant' as const,
  content: assistantMessage.content,
  tool_calls: assistantMessage.tool_calls,
});

// On the bad runs this is undefined, even though content holds a <tool_call> block
const toolCallsFromOpenAi = assistantMessage.tool_calls;
```
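As a client-side stopgap, the payload can be recovered from `content` when this happens. A minimal sketch, assuming a single JSON object between `<tool_call>` tags; the `extractToolCallFromContent` helper, its regex, and the synthetic id are mine, not part of the OpenAI SDK or llama.cpp:

```ts
import type OpenAI from 'openai';

// Fallback: if the server put the tool call in `content` instead of
// `tool_calls`, try to parse it out of the <tool_call> wrapper.
function extractToolCallFromContent(
  content: string | null | undefined,
): OpenAI.Chat.Completions.ChatCompletionMessageToolCall[] | undefined {
  if (!content) return undefined;
  const match = content.match(/<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/);
  if (!match) return undefined;
  try {
    const parsed = JSON.parse(match[1]);
    return [{
      id: `recovered-${Date.now()}`, // synthetic id; the server never assigned one
      type: 'function' as const,
      function: {
        name: parsed.name,
        arguments: JSON.stringify(parsed.arguments ?? {}),
      },
    }];
  } catch {
    return undefined; // malformed JSON inside the tags; give up
  }
}

const toolCalls =
  assistantMessage.tool_calls ??
  extractToolCallFromContent(assistantMessage.content);
```

This keeps the tool-call path working on the bad runs, though the synthetic id means tool results can't be correlated to a server-assigned call id.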