Commit cedceb7

yewentao256 and xuebwang-amd authored and committed
[Bug] Fix Long Context OOM Issue (vllm-project#25290)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
1 parent 8f45029 commit cedceb7

File tree: 1 file changed (+1, -1 lines changed)


vllm/v1/attention/backends/mla/common.py

Lines changed: 1 addition & 1 deletion
@@ -481,7 +481,7 @@ def __init__(self,
             # which would result in up-projected context being
             # 2*(192*128)*(64*1024) = 3gb
             # (assuming 192 QK head dim, 128 heads, and fp16)
-            128 * 1024)
+            64 * 1024)
         assert self.chunked_prefill_workspace_size >= \
             scheduler_config.max_num_seqs * cache_config.block_size
         if self.dcp_world_size > 1:
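A quick back-of-envelope check of why halving the cap matters, using the figures from the diff comment (192 QK head dim, 128 heads, fp16 at 2 bytes per element). This is a minimal sketch; the variable names are illustrative and not taken from the vLLM source.

    # Up-projected context size per token: fp16 bytes * QK head dim * num heads
    bytes_per_token = 2 * 192 * 128

    old_cap_tokens = 128 * 1024   # workspace cap before this commit
    new_cap_tokens = 64 * 1024    # cap after this commit

    old_gib = bytes_per_token * old_cap_tokens / 2**30   # ~6 GiB
    new_gib = bytes_per_token * new_cap_tokens / 2**30   # ~3 GiB, matching the diff comment

    print(f"128*1024 cap: {old_gib:.1f} GiB, 64*1024 cap: {new_gib:.1f} GiB")

With the old 128*1024 cap the up-projected context works out to roughly twice what the comment promises, which is consistent with the long-context OOM this commit fixes.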
