
Conversation

issixx (Contributor) commented Oct 7, 2025

Description

There is an issue in the condition used when canceling pending tasks in llama-server.
As a result, when two or more tasks are queued, pending tasks cannot be canceled properly;
only the currently running task is canceled.

Steps to reproduce

You can easily reproduce this by repeatedly sending and canceling heavy requests:

  1. Send a prompt whose processing in llama_decode takes several seconds.
  2. Quickly repeat sending and canceling the request several times.
  3. After llama_decode finishes, the active task is canceled as expected, but pending tasks remain uncanceled and continue to run.

This issue is especially critical for use cases such as code completion, where requests are sent and canceled rapidly.
Thank you for your review and consideration.
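
For context, the symptom above matches a cancellation check that only compares the cancel target against the task currently being processed, instead of against every entry in the pending queue. The following is a minimal, hypothetical sketch of that pattern and of a queue-wide fix; the names (`PendingQueue`, `cancel_front_only`, `cancel_all`) are illustrative only and do not correspond to the actual llama-server code touched by this PR.

```cpp
// Illustrative sketch only, not the actual llama.cpp patch.
#include <algorithm>
#include <cstdio>
#include <deque>

struct Task {
    int  id;
    bool is_running;
};

struct PendingQueue {
    std::deque<Task> tasks;

    // Buggy pattern: only the task at the front (the one being processed)
    // is checked, so queued tasks matching the cancel target survive.
    void cancel_front_only(int id_target) {
        if (!tasks.empty() && tasks.front().id == id_target) {
            tasks.pop_front();
        }
    }

    // Fixed pattern: walk the whole queue and erase every pending task
    // that matches the cancel target.
    void cancel_all(int id_target) {
        tasks.erase(
            std::remove_if(tasks.begin(), tasks.end(),
                           [id_target](const Task & t) { return t.id == id_target; }),
            tasks.end());
    }
};

int main() {
    PendingQueue q;
    q.tasks = {{1, true}, {2, false}, {2, false}};  // task 1 running, two queued copies of task 2

    q.cancel_front_only(2);  // no effect: the front task has id 1
    std::printf("after buggy cancel : %zu tasks\n", q.tasks.size());  // 3

    q.cancel_all(2);         // removes both queued tasks with id 2
    std::printf("after fixed cancel : %zu tasks\n", q.tasks.size());  // 1
    return 0;
}
```

With the buggy check, canceling while the heavy request is still decoding leaves the re-sent copies in the queue, which is exactly the behavior described in step 3 above.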

ggerganov merged commit d2ee056 into ggml-org:master Oct 8, 2025
63 of 66 checks passed
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...