Your current environment
The output of `python collect_env.py`
vllm-ascend: main branch
vllm: main branch
🐛 Describe the bug
First, apologies for omitting some environment details: we found this bug on a private server.
In short, in vllm-ascend we use an additional config to enable the v0-style scheduler for better performance. However, after installing the latest code (d066e52013be278c7a3bc54ec9799d8457895f4d of vllm and 218f21d..68fb634 of vllm-ascend), we hit errors when handling requests, such as:
Runtime Error: object of type KVCacheBlocks has no len()
What happened?
The root cause is that the vllm project recently reworked KVCacheManager (details can be found at this PR):
- Introduces `KVCacheBlocks`
- `get_computed_blocks` now returns `tuple[KVCacheBlocks, int]` instead of `Tuple[List[BlockHashType], int]`
- `allocate_slots` takes one extra argument, `num_new_computed_tokens`, and returns `Optional[KVCacheBlocks]`
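To illustrate the failure mode, here is a minimal, self-contained sketch. The `KVCacheBlocks` class below is a hypothetical stand-in (a dataclass wrapping a `blocks` list), not vllm's actual definition, and `count_blocks` is an illustrative helper name, not an existing function in either project. It only shows why scheduler code written for the old list return value breaks when it calls `len()` on the wrapper, and one possible way a caller could handle both shapes.

```python
from dataclasses import dataclass, field


@dataclass
class KVCacheBlocks:
    """Hypothetical stand-in for the new wrapper type (assumed layout)."""
    blocks: list = field(default_factory=list)


def legacy_len_error(computed) -> str:
    """Mimic old scheduler code that calls len() directly on the result."""
    try:
        len(computed)
    except TypeError as exc:
        # A plain dataclass defines no __len__, so len() raises TypeError.
        return str(exc)
    return ""


def count_blocks(computed) -> int:
    """Illustrative helper: accept either the old list or the wrapper."""
    if isinstance(computed, list):
        return len(computed)
    # Assumption: the wrapper exposes the underlying list as `.blocks`.
    return len(computed.blocks)


if __name__ == "__main__":
    wrapped = KVCacheBlocks(blocks=[0, 1, 2])
    print(legacy_len_error(wrapped))
    print(count_blocks(wrapped), count_blocks([0, 1]))
```

Whether the v0-style scheduler should unwrap the blocks like this or be updated to pass `KVCacheBlocks` through unchanged depends on how the rest of vllm-ascend consumes the result.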