This situation can occur when the API server receives a client
disconnect (and thus sends an abort) at around the same time as a
prefill completes and we keep the blocks around (delay_free_blocks)
for a remote decode. We should assume the blocks may still be used,
so we ignore the abort. If they are not used, the connector should
free them after a timeout.
The original error was:
```
[scheduler.py:1183] Finished sending KV transfer for request cmpl-37c560d3-5680-4bd1-97f9-7ed31a56de60-0
File "/opt/vllm-source/vllm/v1/engine/core.py", line 292, in step
engine_core_outputs = self.scheduler.update_from_output(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm-source/vllm/v1/core/sched/scheduler.py", line 893, in update_from_output
self._update_from_kv_xfer_finished(
File "/opt/vllm-source/vllm/v1/core/sched/scheduler.py", line 1184, in _update_from_kv_xfer_finish>
self._free_blocks(self.requests[req_id])
~~~~~~~~~~~~~^^^^^^^^
KeyError: 'cmpl-37c560d3-5680-4bd1-97f9-7ed31a56de60-0'
```
Since vllm-project#25844 we log a warning instead. With this fix,
that situation in `_update_from_kv_xfer_finished()` should never
occur.
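
The race and the intended behaviour can be sketched with a toy model.
This is an illustration only, not the actual scheduler code: the
class `ToyScheduler`, its methods, and the `delayed_free` set are all
hypothetical names, assuming request state lives in a dict keyed by
request ID as the traceback suggests.

```python
class ToyScheduler:
    """Hypothetical stand-in for the scheduler; not vLLM's real code."""

    def __init__(self):
        self.requests = {}         # req_id -> request state
        self.delayed_free = set()  # req_ids whose blocks are kept for a remote decode
        self.freed = []            # req_ids whose blocks have been freed

    def finish_prefill(self, req_id, delay_free_blocks):
        # When blocks are kept for a remote decode, keep the request too.
        if delay_free_blocks:
            self.delayed_free.add(req_id)
        else:
            self._free(req_id)

    def abort(self, req_id):
        # Assume the kept blocks may still be used: ignore the abort.
        if req_id in self.delayed_free:
            return False
        if req_id in self.requests:
            self._free(req_id)
            return True
        return False

    def kv_xfer_finished(self, req_id):
        # The request is still tracked here, so no KeyError is possible.
        self.delayed_free.discard(req_id)
        self._free(req_id)

    def _free(self, req_id):
        del self.requests[req_id]  # raises KeyError if already removed
        self.freed.append(req_id)


sched = ToyScheduler()
sched.requests["r1"] = object()
sched.finish_prefill("r1", delay_free_blocks=True)
aborted = sched.abort("r1")   # client disconnect races with prefill completion
sched.kv_xfer_finished("r1")  # transfer completes; blocks are freed exactly once
```

Without the check in `abort()`, the abort would remove "r1" first and
the later `kv_xfer_finished()` call would hit the KeyError shown above.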
Signed-off-by: Mark McLoughlin <markmc@redhat.com>