
Conversation

@NickLucche (Collaborator) commented May 21, 2025

There's currently an issue when a KV connector is in use and requests are aborted before they have been scheduled (and so before any blocks have been allocated in the KV cache manager). The scheduler's _connector_finished method is invoked and calls kv_cache_manager.get_block_ids(request_id), which raises an AssertionError because the KV cache manager doesn't recognize the request.

We do still want to invoke the connector in this case, so that it can perform external cleanup if needed, but it makes sense to pass it empty blocks.
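
To make the intended behaviour concrete, here is a minimal sketch of that idea (illustrative only, not the actual patch in this PR; the connector call and the (delay_free_blocks, kv_xfer_params) return shape are assumed from the traceback below, and has_blocks_for is a hypothetical helper):

# Sketch of the scheduler-side handling. `has_blocks_for` stands in for
# "the kv cache manager knows about this request"; the real code differs.
def _connector_finished(self, request):
    if self.connector is None:
        return False, None
    if not self.kv_cache_manager.has_blocks_for(request.request_id):
        # Aborted before scheduling: no blocks were ever allocated, but the
        # connector is still notified so it can run any external cleanup.
        block_ids = []
    else:
        block_ids = self.kv_cache_manager.get_block_ids(request.request_id)[0]
    return self.connector.request_finished(request, block_ids)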

It's not always 100% guaranteed due to timings, but this setup makes it pretty reproducible:

# start P
VLLM_NIXL_SIDE_CHANNEL_PORT=5557 CUDA_VISIBLE_DEVICES=0 vllm serve Qwen/Qwen3-0.6B --port 8100 --enforce-eager --disable-log-requests --tensor-parallel-size 1 --gpu-memory-utilization 0.5 --trust-remote-code --max-model-len 128 --max-num-seqs 1 --max-num-batched-tokens 128 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'

# start D
VLLM_NIXL_SIDE_CHANNEL_PORT=5558 CUDA_VISIBLE_DEVICES=1 vllm serve Qwen/Qwen3-0.6B --port 8200 --enforce-eager --disable-log-requests --tensor-parallel-size 1 --gpu-memory-utilization 0.5 --trust-remote-code --max-model-len 128 --max-num-seqs 1  --max-num-batched-tokens 128 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'

# start proxy server
python tests/v1/kv_connector/nixl_integration/toy_proxy_server.py --port 8192 \
      --prefiller-port 8100 \
      --decoder-port 8200
...
# send some requests and abort some of those
python simulate_abort.py

simulate_abort.py

import asyncio
import httpx


# Point this at the port the toy proxy server is listening on (8192 in the example above).
API_URL = "http://localhost:40876/v1/completions"
# API_URL = "http://localhost:8000/v1/completions"
NUM_REQUESTS = 10
ABORT_INDICES = {1, 9}  # indices of the requests to cancel mid-flight

async def send_request(i, abort=False):
    async with httpx.AsyncClient(timeout=None) as client:
        task = asyncio.create_task(client.post(
            API_URL,
            json={
                "model": "Qwen/Qwen3-0.6B",
                "prompt": f"This is test request {i}. what do you say",
                "max_tokens": 100,
            },
            timeout=None
        ))

        if abort:
            # Cancelling the task closes the HTTP connection mid-request,
            # which makes the server abort the request.
            # await asyncio.sleep(0.2 + random.random() * 0.3)
            await asyncio.sleep(0.5)
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                print(f"Request {i} aborted!!\n")
        else:
            try:
                response = await task
                print(f"Request {i} completed: {response.status_code}")
            except Exception as e:
                print(f"Request {i} failed: {e}")

async def main():
    tasks = [
        send_request(i, abort=(i in ABORT_INDICES))
        for i in range(NUM_REQUESTS)
    ]
    await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())

The client gets a 500 and D crashes with the following logs:

DEBUG 05-21 12:59:23 [loggers.py:116] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
ERROR 05-21 12:59:23 [core.py:495] EngineCore encountered a fatal error.
ERROR 05-21 12:59:23 [core.py:495] Traceback (most recent call last):
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/engine/core.py", line 486, in run_engine_core
ERROR 05-21 12:59:23 [core.py:495]     engine_core.run_busy_loop()
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/engine/core.py", line 511, in run_busy_loop
ERROR 05-21 12:59:23 [core.py:495]     self._process_input_queue()
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/engine/core.py", line 532, in _process_input_queue
ERROR 05-21 12:59:23 [core.py:495]     self._handle_client_request(*req)
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/engine/core.py", line 550, in _handle_client_request
ERROR 05-21 12:59:23 [core.py:495]     self.abort_requests(request)
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/engine/core.py", line 202, in abort_requests
ERROR 05-21 12:59:23 [core.py:495]     self.scheduler.finish_requests(request_ids,
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/core/sched/scheduler.py", line 866, in finish_requests
ERROR 05-21 12:59:23 [core.py:495]     self._free_request(request)
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/core/sched/scheduler.py", line 872, in _free_request
ERROR 05-21 12:59:23 [core.py:495]     delay_free_blocks, kv_xfer_params = self._connector_finished(request)
ERROR 05-21 12:59:23 [core.py:495]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/core/sched/scheduler.py", line 952, in _connector_finished
ERROR 05-21 12:59:23 [core.py:495]     block_ids = self.kv_cache_manager.get_block_ids(request.request_id)[0]
ERROR 05-21 12:59:23 [core.py:495]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 12:59:23 [core.py:495]   File "/home/nicolo/vllmd/vllm/vllm/v1/core/kv_cache_manager.py", line 369, in get_block_ids
ERROR 05-21 12:59:23 [core.py:495]     assert request_id in self.single_type_manager.req_to_blocks
ERROR 05-21 12:59:23 [core.py:495]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-21 12:59:23 [core.py:495] AssertionError

The following is a straightforward fix to the issue; let me know if you see a better solution.

@github-actions (bot)

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label May 21, 2025

mergify bot commented May 21, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @NickLucche.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify bot commented May 23, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @NickLucche.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@heheda12345 (Collaborator)

@NickLucche Does #18829 fix this bug?

@NickLucche (Collaborator, Author)

I quickly tried to run it with my setup, but it's crashing.
Is it working for you with the snippet I posted?

@heheda12345 (Collaborator)

I didn't try. I'm asking because #18829 (comment) by @Abatom says that PR can fix this problem.

@NickLucche (Collaborator, Author)

I will address this ASAP and try to cover your P/D separation scheme.

@Abatom (Contributor) commented Jun 2, 2025

I will address this ASAP and try to cover your P/D separation scheme.

Thanks

@NickLucche (Collaborator, Author)

I've updated the PR so that it should be compatible with your connector as well as with the latest changes to the get_block_ids logic. Thanks for reviewing, @Abatom!

@NickLucche NickLucche requested a review from Abatom June 3, 2025 15:35
@njhill (Member) commented Jun 5, 2025

@NickLucche I don't think this really has anything to do with the connector or the P/D lifecycle; it's a general problem with aborting a request before it's been scheduled. It's just that we only hit that path when a connector is in use.

I actually agree with @Abatom, and I think it would make sense to change the KVCacheManager get_block_ids() method to return empty blocks if the request isn't recognized. This is reasonable because an unrecognized request just means no blocks are allocated for it, so returning zero blocks is logically "correct".

I have pushed my own version of this here: main...njhill:vllm:fix-pd-abort (also moved the single cache group assertion to the constructor)
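
A minimal sketch of that direction (illustrative only, not the exact code on that branch; it assumes the post-hybrid-allocator layout with one req_to_blocks dict per single-type manager and glosses over how blocks are mapped to ids):

# get_block_ids() sketch: an unknown request simply yields empty block
# lists instead of tripping an assertion.
def get_block_ids(self, request_id):
    return tuple(
        manager.req_to_blocks.get(request_id) or []
        for manager in self.single_type_managers
    )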

@NickLucche (Collaborator, Author) commented Jun 5, 2025

I think this is reasonable because this just means there are no blocks allocated for this request, so returning zero blocks is logically "correct"

I am still not 100% sure that changing the interface from a .get(k) to a .get_with_default(k) is always going to be safe, given that we will no longer raise on potentially buggy states where a request slips through.
But I am OK with just getting this fixed after all.

And yeah, this has nothing to do with P/D, my bad; it just pops up more easily given the higher latency.

@NickLucche NickLucche changed the title [P/D][Core] Fix abrupt request abort [Core] Fix abrupt request abort Jun 5, 2025
@njhill (Member) commented Jun 5, 2025

Thanks @NickLucche, I pushed my update as you agreed to offline :)

I've also now opened a related PR, #19223, which I think helps shrink our P/D abort race-condition window. It relies on the behaviour we've kept here of always notifying the connector when requests are finished (which sounds similar to what @Abatom needs it for).
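
As a toy illustration of why that notification matters (a hypothetical connector class, not the NixlConnector API; only the (delay_free_blocks, kv_xfer_params) return shape is taken from the traceback earlier in the thread):

# A connector can use request_finished to drop any external state it holds
# for a request, even one that was aborted before being scheduled (in which
# case block_ids is empty).
class ToyConnectorScheduler:
    def __init__(self):
        self._pending = {}  # request_id -> hypothetical transfer handle

    def request_finished(self, request, block_ids):
        handle = self._pending.pop(request.request_id, None)
        if handle is not None:
            handle.cancel()  # hypothetical cleanup of an in-flight KV transfer
        # Don't delay freeing blocks; no extra kv transfer params to return.
        return False, None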

@njhill (Member) commented Jun 5, 2025

I will add @Abatom as co-author too since this is similar to his other PR #18829.

@njhill (Member) left a comment

Thanks @NickLucche!

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 5, 2025
@NickLucche (Collaborator, Author)

I will add @Abatom as co-author too since this is similar to his other PR #18829.

Absolutely, thanks for your work, @njhill!

NickLucche and others added 4 commits June 6, 2025 08:13
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
@NickLucche (Collaborator, Author)

I've rebased in light of the recent hybrid allocator changes; PTAL when you get the chance, @njhill.

Signed-off-by: NickLucche <nlucches@redhat.com>
@NickLucche (Collaborator, Author)

The test tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency is broken on main, so CI will report a failure here too; looking into it.

Comment on lines 167 to 169
manager.req_to_blocks[request_id]
for manager in self.single_type_managers
if request_id in manager.req_to_blocks
@njhill (Member):

@NickLucche I think we'll want to actually return a list of empty lists in this case. I didn't update it yet since we might as well do it after rebasing on the other fix.

Suggested change:
-manager.req_to_blocks[request_id]
-for manager in self.single_type_managers
-if request_id in manager.req_to_blocks
+manager.req_to_blocks.get(request_id) or []
+for manager in self.single_type_managers

@njhill njhill merged commit b6a3a9f into vllm-project:main Jun 6, 2025
64 checks passed