
Conversation

@b8zhong (Contributor) commented Apr 1, 2025

The sampler in vLLM fails with

RuntimeError: gather(): Expected dtype int64 for index

because the index tensor passed to torch.gather is not cast to torch.int64, producing a dtype mismatch. Casting the index to int64 is safe, since it only widens the integer type.
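
For context, a minimal standalone sketch of the failure mode and the cast (illustrative only; the tensor names and shapes are made up here and this is not the actual vLLM sampler code):

import torch

# Illustrative shapes; not the vLLM top-k sampler itself.
logits = torch.randn(4, 32000)                      # [batch, vocab]
k = torch.tensor([5, 3, 8, 2], dtype=torch.int32)   # per-request top-k arriving as int32

top_values = logits.topk(int(k.max()), dim=-1).values

# This line raises "RuntimeError: gather(): Expected dtype int64 for index"
# because torch.gather only accepts int64 index tensors:
#   cutoff = top_values.gather(-1, (k - 1).unsqueeze(-1))

# The fix: cast the index to int64 before gather. Widening int32 -> int64 is
# lossless, so the cast is safe.
cutoff = top_values.gather(-1, (k - 1).to(torch.int64).unsqueeze(-1))
print(cutoff.shape)  # torch.Size([4, 1])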

Bug reproduction
python benchmarks/benchmark_serving.py \
    --backend deepspeed-mii \
    --model NousResearch/Meta-Llama-3-8B-Instruct \
    --host 127.0.0.1 \
    --port 8000 \
    --dataset-name random \
    --num-prompts 100 \
    --request-rate 10
Similar to #15065, #15049

cc @houseroad I think you would probably know this. What do you think?

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
github-actions bot commented Apr 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mergify bot added the v1 label Apr 1, 2025
@WoosukKwon (Collaborator) left a comment

LGTM. @njhill Any idea why this bug has not been detected in tests?

WoosukKwon merged commit 6efb195 into vllm-project:main Apr 2, 2025
14 checks passed
b8zhong deleted the fix-topk-sampler branch April 2, 2025 02:08
@WoosukKwon (Collaborator)

@njhill It's because the test uses k = torch.randint, and torch.randint outputs int64 tensors by default:

k = torch.randint(1, 10, (BATCH_SIZE, ), generator=generator)
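
For illustration, a small sketch (reusing BATCH_SIZE and generator from the test; the other names are made up) of why the int64 default hides the mismatch and how an int32 k would have exposed it:

import torch

BATCH_SIZE = 4
generator = torch.Generator().manual_seed(0)

# torch.randint defaults to int64, so the index built from k already has the
# dtype gather expects and the test never hits the error.
k = torch.randint(1, 10, (BATCH_SIZE,), generator=generator)
print(k.dtype)  # torch.int64

top_values = torch.randn(BATCH_SIZE, 16).topk(10, dim=-1).values
cutoff = top_values.gather(-1, (k - 1).unsqueeze(-1))  # works with an int64 index

# Hypothetical variant that mimics the runtime path: an int32 k makes the
# un-cast gather raise "Expected dtype int64 for index".
k_int32 = k.to(torch.int32)
# top_values.gather(-1, (k_int32 - 1).unsqueeze(-1))  # would raise without the cast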

njhill (Member) commented Apr 2, 2025

Thanks for fixing, @b8zhong

Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
hyeygit mentioned this pull request Apr 15, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…roject#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>