
Conversation

@b8zhong (Contributor) commented Apr 1, 2025

The sampler in vLLM fails with

RuntimeError: gather(): Expected dtype int64 for index

because the index tensor passed to torch.gather is not cast to torch.int64, producing a dtype mismatch. Casting the index to int64 is safe, since it only widens the integer type.
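
For context, a minimal standalone sketch of the failure mode and the cast (illustrative only; the tensor names and shapes are made up here and this is not the actual vLLM sampler code):

import torch

# Illustrative shapes; not the vLLM top-k sampler itself.
logits = torch.randn(4, 32000)                      # [batch, vocab]
k = torch.tensor([5, 3, 8, 2], dtype=torch.int32)   # per-request top-k arriving as int32

top_values = logits.topk(int(k.max()), dim=-1).values

# This line raises "RuntimeError: gather(): Expected dtype int64 for index"
# because torch.gather only accepts int64 index tensors:
#   cutoff = top_values.gather(-1, (k - 1).unsqueeze(-1))

# The fix: cast the index to int64 before gather. Widening int32 -> int64 is
# lossless, so the cast is safe.
cutoff = top_values.gather(-1, (k - 1).to(torch.int64).unsqueeze(-1))
print(cutoff.shape)  # torch.Size([4, 1])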

Bug reproduction
python benchmarks/benchmark_serving.py \
    --backend deepspeed-mii \
    --model NousResearch/Meta-Llama-3-8B-Instruct \
    --host 127.0.0.1 \
    --port 8000 \
    --dataset-name random \
    --num-prompts 100 \
    --request-rate 10
Similar to #15065, #15049

cc @houseroad I think you would probably know this. What do you think?

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
github-actions bot commented Apr 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mergify bot added the v1 label Apr 1, 2025
@WoosukKwon (Collaborator) left a comment

LGTM. @njhill Any idea why this bug has not been detected in tests?

WoosukKwon merged commit 6efb195 into vllm-project:main Apr 2, 2025
14 checks passed
b8zhong deleted the fix-topk-sampler branch April 2, 2025 02:08
@WoosukKwon (Collaborator)

@njhill It's because the test uses k = torch.randint, and torch.randint outputs int64 tensors by default:

k = torch.randint(1, 10, (BATCH_SIZE, ), generator=generator)
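
For illustration, a small sketch (reusing BATCH_SIZE and generator from the test; the other names are made up) of why the int64 default hides the mismatch and how an int32 k would have exposed it:

import torch

BATCH_SIZE = 4
generator = torch.Generator().manual_seed(0)

# torch.randint defaults to int64, so the index built from k already has the
# dtype gather expects and the test never hits the error.
k = torch.randint(1, 10, (BATCH_SIZE,), generator=generator)
print(k.dtype)  # torch.int64

top_values = torch.randn(BATCH_SIZE, 16).topk(10, dim=-1).values
cutoff = top_values.gather(-1, (k - 1).unsqueeze(-1))  # works with an int64 index

# Hypothetical variant that mimics the runtime path: an int32 k makes the
# un-cast gather raise "Expected dtype int64 for index".
k_int32 = k.to(torch.int32)
# top_values.gather(-1, (k_int32 - 1).unsqueeze(-1))  # would raise without the cast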

njhill (Member) commented Apr 2, 2025

Thanks for fixing, @b8zhong

Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
hyeygit mentioned this pull request Apr 15, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>