
Eval bug: Qwen models lost ability to think #14147

Closed
@pwilkin

Description

Name and Version

(dev-venv) ilintar@LinuksowaJaskinia:/devel/alt/llama-runner$ llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
version: 5646 (7d51644)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

i7-9700K CPU + GF 3080 10GB VRAM

Models

Qwen3-8B@Q5_K_M, Qwen3-30B-A3B@Q4_K_XL

Problem description & steps to reproduce

At some point yesterday or today, the Qwen models stopped thinking by default. Even if I use /think in the prompt, they still refuse to think. I'm not setting any new options on the server, and specifically I'm not using --reasoning-budget 0. This still worked correctly two days ago; if no commit comes to mind, I can try to narrow down the exact one.
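
For reference, a minimal way to check whether the server emits thinking content, assuming the server above is running on its default port 8080 (the prompt is only an example):

# Send a request with the Qwen3 "/think" soft switch and inspect the
# reply for a <think>...</think> block (or a separate reasoning field,
# depending on the reasoning-format settings).
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 17 * 23? /think"}]}'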

First Bad Commit

No response
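
Since no bad commit has been identified yet, here is a sketch of how it could be narrowed down with git bisect; the known-good hash is a placeholder, and the build commands assume a typical CUDA build:

# 7d51644 is the broken build from the version output above;
# <known-good-commit> is a placeholder for a hash from ~2 days ago.
git bisect start
git bisect bad 7d51644
git bisect good <known-good-commit>
cmake -B build -DGGML_CUDA=ON
cmake --build build --target llama-server -j
# Re-test the thinking behavior at each step, then mark the result with
# "git bisect good" or "git bisect bad" until the first bad commit is found.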

Relevant log output

llama-server --model /mnt/win/k/models/unsloth/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-UD-Q4_K_XL.gguf --host 127.0.0.1 --ctx-size 15000 --gpu-layers 99 --cache-type-k f16 --cache-type-v q4_0 --flash-attn --min-p 0 --top-p 0.9 --top-k 20 --temp 0.6 --threads 4 -ot exps=CPU --jinja
