Commit 18ed313

lcy4869 and Sherry4869 authored
[Misc] update the comments (#15780)
Signed-off-by: chengyang liu <lcy4869@gmail.com>
Co-authored-by: chengyang liu <lcy4869@gmail.com>
1 parent 9b459ec commit 18ed313

File tree

1 file changed (+1, -1 lines)


vllm/v1/worker/gpu_model_runner.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -673,7 +673,7 @@ def _compute_cascade_attn_prefix_len(
         # use two kernels for cascade attention. Let's imagine:
         # Request 3's input query: [D]
         # Request 3's kv cache: [A, B, C, D]
-        # Request 3's num_computed_tokens: 4 (i.e., [A, B, C, D])
+        # Request 3's num_computed_tokens: 3 (i.e., [A, B, C])
         # If we use [A, B, C, D] as the common prefix for Request 1-3,
         # then Request 3 will be processed only by the first kernel,
         # and the second kernel will get an empty input. While this is not
```
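The corrected comment reflects that `num_computed_tokens` counts only the tokens whose KV entries were computed in earlier steps; the current query token(s) are excluded. A minimal illustrative sketch (a hypothetical helper, not vLLM's actual implementation) of why the shared prefix must not swallow any request's query tokens:

```python
def suffix_kernel_nonempty(common_prefix_len: int,
                           num_computed_tokens: list[int]) -> bool:
    """Return True if, with the given common prefix length, every
    request still has query tokens left for the second (suffix) kernel.

    Hypothetical helper for illustration only. A request's query tokens
    start at position num_computed_tokens in its KV cache, so the
    common prefix may cover at most that many tokens per request.
    """
    return all(common_prefix_len <= n for n in num_computed_tokens)


# Request 3: query [D], kv cache [A, B, C, D], num_computed_tokens = 3.
# Using [A, B, C, D] (length 4) as the common prefix would leave the
# suffix kernel with an empty input for Request 3:
print(suffix_kernel_nonempty(4, [10, 10, 3]))  # False
# A prefix of [A, B, C] (length 3) keeps D in the suffix:
print(suffix_kernel_nonempty(3, [10, 10, 3]))  # True
```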

0 commit comments
