[Bug]: AscendSampler does not handle empty logit tensor #1133

@marcobarlo


Your current environment

Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.2
Libc version: glibc-2.35

Versions of relevant libraries:
[pip3] mypy==1.15.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] pyzmq==26.3.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.52.4
[conda] Could not collect
vLLM Version: 0.7.4
vLLM Ascend Version: 0.7.3.post2.dev2+gb69d41d (git sha: b69d41d)

🐛 Describe the bug

When running inference with the v0.7.3-dev model_runner.py, the model runner uses AscendSampler, which cannot handle an empty logits tensor. Handling that case is necessary for chunked prefill.

Traceback excerpt from the failure:

[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/vllm-ascend-vanilla/vllm-ascend/vllm_ascend/sample/sampler.py", line 61, in forward
[rank0]: logits = _apply_top_k_top_p_npu(logits, sampling_tensors.top_ps,
[rank0]: File "/root/vllm-ascend-vanilla/vllm-ascend/vllm_ascend/sample/sampler.py", line 126, in _apply_top_k_top_p_npu
[rank0]: top_p_mask[:, -1] = True
[rank0]: IndexError: index -1 is out of bounds for dimension 1 with size 0

In vllm_ascend/sample/sampler.py, when the logits tensor passed to _apply_top_k_top_p_npu is empty, the following code fails:

cutoff = top_k_mask.sum(dim=-1).min()
probs_sort = logits_sort.softmax(dim=-1)[:, cutoff:]
probs_sum = probs_sort.cumsum(dim=-1)
top_p_mask = probs_sum > 1 - p.unsqueeze(dim=1)
top_p_mask[:, -1] = True

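With an empty input, cutoff comes out extremely high, so the slice [:, cutoff:] leaves probs_sort with shape (0, 0). The assignment top_p_mask[:, -1] = True then raises the IndexError shown above, because there is no last column to index.

Please handle empty logits tensors so that chunked prefill works.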
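
A minimal sketch of the shape problem and of one possible guard, run on CPU with plain PyTorch rather than on an NPU. The helper names `last_column_write_fails` and `filter_unless_empty` are hypothetical and not part of vllm-ascend. Returning the empty logits unchanged is an assumption about what the caller can accept and has not been checked against the sampler:

```python
import torch


def last_column_write_fails(num_rows, vocab_size, cutoff):
    """Return True if `mask[:, -1] = True` raises after slicing at `cutoff`."""
    probs = torch.zeros((num_rows, vocab_size))
    # Slicing past the last column leaves a tensor with zero columns.
    mask = probs[:, cutoff:] > 0.5
    try:
        mask[:, -1] = True
    except IndexError:
        return True
    return False


def filter_unless_empty(filter_fn, logits, *args):
    """Hypothetical guard: skip filter_fn when the batch has no rows."""
    if logits.shape[0] == 0:
        # Nothing to sample in this step, so there is nothing to filter.
        return logits
    return filter_fn(logits, *args)


if __name__ == "__main__":
    # Empty batch with an oversized cutoff, as described in the report.
    print(last_column_write_fails(0, 8, 10**6))
    # Ordinary batch with a cutoff inside the vocabulary.
    print(last_column_write_fails(2, 8, 3))
    # The guard returns the empty tensor without calling the filter.
    print(filter_unless_empty(lambda x: x + 1, torch.empty((0, 8))).shape)
```

The same check could sit as an early return at the top of _apply_top_k_top_p_npu, or around its call site in forward; which of the two fits the sampler better is left open here.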
