
[Bug]: RuntimeError: Expected index [32, 1543] to be smaller than self [31, 152065] apart from dimension 1 and to be smaller size than src [32, 1543] #13076

@sleepwalker2017

Description


Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here

🐛 Describe the bug

I am serving qwen32B-instruct with the vLLM V1 engine, using fp8 static quantization.
vllm==0.7.2

The engine core crashes with the following error:

ERROR 02-11 07:01:54 core.py:210] EngineCore hit an exception: Traceback (most recent call last):
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 203, in run_engine_core
ERROR 02-11 07:01:54 core.py:210]     engine_core.run_busy_loop()
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 243, in run_busy_loop
ERROR 02-11 07:01:54 core.py:210]     outputs = self.step()
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 129, in step
ERROR 02-11 07:01:54 core.py:210]     output = self.model_executor.execute_model(scheduler_output)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/abstract.py", line 77, in execute_model
ERROR 02-11 07:01:54 core.py:210]     output = self.collective_rpc("execute_model",
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc
ERROR 02-11 07:01:54 core.py:210]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2220, in run_method
ERROR 02-11 07:01:54 core.py:210]     return func(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 02-11 07:01:54 core.py:210]     return func(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 236, in execute_model
ERROR 02-11 07:01:54 core.py:210]     output = self.model_runner.execute_model(scheduler_output)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 02-11 07:01:54 core.py:210]     return func(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 804, in execute_model
ERROR 02-11 07:01:54 core.py:210]     sampler_output = self.model.sample(
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 505, in sample
ERROR 02-11 07:01:54 core.py:210]     next_tokens = self.sampler(logits, sampling_metadata)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 02-11 07:01:54 core.py:210]     return self._call_impl(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 02-11 07:01:54 core.py:210]     return forward_call(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/sample/sampler.py", line 46, in forward
ERROR 02-11 07:01:54 core.py:210]     logits = self.apply_penalties(logits, sampling_metadata)
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/sample/sampler.py", line 130, in apply_penalties
ERROR 02-11 07:01:54 core.py:210]     logits = apply_all_penalties(
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/sample/ops/penalties.py", line 42, in apply_all_penalties
ERROR 02-11 07:01:54 core.py:210]     return apply_penalties(logits, prompt_token_ids, output_tokens_t,
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/utils.py", line 44, in apply_penalties
ERROR 02-11 07:01:54 core.py:210]     _, prompt_mask = get_token_bin_counts_and_mask(prompt_tokens_tensor,
ERROR 02-11 07:01:54 core.py:210]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/utils.py", line 18, in get_token_bin_counts_and_mask
ERROR 02-11 07:01:54 core.py:210]     bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
ERROR 02-11 07:01:54 core.py:210] RuntimeError: Expected index [32, 1543] to be smaller than self [31, 152065] apart from dimension 1 and to be smaller size than src [32, 1543]
ERROR 02-11 07:01:54 core.py:210]
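
For readers decoding the last line of the traceback: PyTorch documents that `scatter_add_` requires `index.size(d) <= src.size(d)` for every dimension `d`, and `index.size(d) <= self.size(d)` for every dimension `d != dim`. The sketch below re-implements only that shape rule in plain Python so it can be checked without a GPU. The helper name `scatter_add_shapes_ok` is hypothetical and is not part of PyTorch or vLLM. Applied to the shapes in the error, it suggests that the row count of `bin_counts` (31) is smaller than the row count of `tokens` (32); that the rows correspond to requests in the batch is an assumption, not something the log states.

```python
def scatter_add_shapes_ok(self_shape, index_shape, src_shape, dim):
    """Check the documented shape constraint of Tensor.scatter_add_.

    Hypothetical helper for illustration only: it mirrors the rule from the
    PyTorch docs, not PyTorch's actual implementation.
    """
    # All three tensors must have the same number of dimensions.
    if not (len(self_shape) == len(index_shape) == len(src_shape)):
        return False
    for d, (s, i, r) in enumerate(zip(self_shape, index_shape, src_shape)):
        # index must not exceed src in any dimension.
        if i > r:
            return False
        # index must not exceed self in any dimension except `dim`.
        if d != dim and i > s:
            return False
    return True


# Shapes taken from the error message: self [31, 152065], index and src [32, 1543].
print(scatter_add_shapes_ok((31, 152065), (32, 1543), (32, 1543), dim=1))  # False

# With matching row counts the constraint holds.
print(scatter_add_shapes_ok((32, 152065), (32, 1543), (32, 1543), dim=1))  # True
```

Only dimension 0 violates the rule here: 32 index rows against 31 rows in the destination tensor. Dimension 1 is the scatter dimension, so 1543 versus 152065 is not checked against `self`.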

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Labels: bug
