Eval bug: Llama server <tool_call> is occasionally not parsed as json, and is in content rather than tool_calls #12256

Closed
@jasonmcaffee

Name and Version

llama-cli.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 4831 (5e43f104)
built with MSVC 19.39.33523.0 for x64

I'm using a llama-server executable that I compiled on 3/6/25 from the master branch, with -DLLAMA_CUDA=ON:

cmake .. -DLLAMA_CUDA=ON

cmake --build . --config Release

Operating systems

Windows

GGML backends

CUDA

Hardware

RTX 3090

Models

Qwen2.5-7B-Instruct-1M-Q4_K_M.gguf

Problem description & steps to reproduce

Occasionally, a tool call comes back as a raw <tool_call> string in response.choices[0].message.content rather than as a parsed entry in response.choices[0].message.tool_calls.

Example of the issue:

{
  "role": "assistant",
  "content": "<tool_call>\n{\"name\": \"aiCreatePlan\", \"arguments\": {...}}}\n</tool_call>"
}
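Until the server-side parsing is sorted out, a client-side fallback along these lines could recover the call from content. This is only a sketch: extractToolCallsFromContent and the FallbackToolCall type are hypothetical names, not part of llama.cpp or the OpenAI SDK, and it assumes the model wraps a single JSON object with "name" and "arguments" keys in <tool_call> tags, as in the example above.

```typescript
// Hypothetical fallback shape, modeled on the OpenAI tool_calls entries.
interface FallbackToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Scans message content for <tool_call>...</tool_call> blocks and converts
// each block that holds valid JSON into a tool_calls-style entry.
function extractToolCallsFromContent(
  content: string | null | undefined
): FallbackToolCall[] {
  if (!content) return [];

  const calls: FallbackToolCall[] = [];
  const pattern = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(content)) !== null) {
    try {
      const parsed = JSON.parse(match[1]);
      if (typeof parsed?.name !== "string") continue;
      calls.push({
        // Synthetic id (assumption): the server did not supply one.
        id: `fallback_${calls.length}`,
        type: "function",
        function: {
          name: parsed.name,
          // The OpenAI format carries arguments as a JSON string.
          arguments: JSON.stringify(parsed.arguments ?? {}),
        },
      });
    } catch {
      // The block is not valid JSON (for example unbalanced braces); skip it.
    }
  }
  return calls;
}
```

In the calling code, this would only be consulted when message.tool_calls is missing or empty. Blocks whose JSON does not parse are skipped rather than repaired, so a malformed call still surfaces as plain content.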

Example code with debugger and values:

[Image]

Example of a good result, with the same parameters and prompt:

[Image]

First Bad Commit

No response

Relevant log output

const response = await openai.chat.completions.create({
  model: model.modelName,
  messages: openAiMessages,
  tools: aiFunctionContext.aiFunctionExecutor?.getToolsMetadata(),
  stream: false,
}, { signal });

const assistantMessage = response.choices[0].message;

// Add the assistant's message to our conversation
openAiMessages.push({
  role: 'assistant' as const,
  content: assistantMessage.content,
  tool_calls: assistantMessage.tool_calls
});
const toolCallsFromOpenAi = assistantMessage.tool_calls;
