Closed

Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py` was not provided.
🐛 Describe the bug
I am serving with the vLLM v1 engine (vllm==0.7.2), using qwen32B-instruct with FP8 static quantization.

The server hits the following error:
```
ERROR 02-11 07:01:54 core.py:210] EngineCore hit an exception: Traceback (most recent call last):
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 203, in run_engine_core
ERROR 02-11 07:01:54 core.py:210] engine_core.run_busy_loop()
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 243, in run_busy_loop
ERROR 02-11 07:01:54 core.py:210] outputs = self.step()
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 129, in step
ERROR 02-11 07:01:54 core.py:210] output = self.model_executor.execute_model(scheduler_output)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/abstract.py", line 77, in execute_model
ERROR 02-11 07:01:54 core.py:210] output = self.collective_rpc("execute_model",
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc
ERROR 02-11 07:01:54 core.py:210] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2220, in run_method
ERROR 02-11 07:01:54 core.py:210] return func(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 02-11 07:01:54 core.py:210] return func(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 236, in execute_model
ERROR 02-11 07:01:54 core.py:210] output = self.model_runner.execute_model(scheduler_output)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 02-11 07:01:54 core.py:210] return func(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 804, in execute_model
ERROR 02-11 07:01:54 core.py:210] sampler_output = self.model.sample(
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 505, in sample
ERROR 02-11 07:01:54 core.py:210] next_tokens = self.sampler(logits, sampling_metadata)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 02-11 07:01:54 core.py:210] return self._call_impl(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 02-11 07:01:54 core.py:210] return forward_call(*args, **kwargs)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/sample/sampler.py", line 46, in forward
ERROR 02-11 07:01:54 core.py:210] logits = self.apply_penalties(logits, sampling_metadata)
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/sample/sampler.py", line 130, in apply_penalties
ERROR 02-11 07:01:54 core.py:210] logits = apply_all_penalties(
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/sample/ops/penalties.py", line 42, in apply_all_penalties
ERROR 02-11 07:01:54 core.py:210] return apply_penalties(logits, prompt_token_ids, output_tokens_t,
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/utils.py", line 44, in apply_penalties
ERROR 02-11 07:01:54 core.py:210] _, prompt_mask = get_token_bin_counts_and_mask(prompt_tokens_tensor,
ERROR 02-11 07:01:54 core.py:210] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/utils.py", line 18, in get_token_bin_counts_and_mask
ERROR 02-11 07:01:54 core.py:210] bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
ERROR 02-11 07:01:54 core.py:210] RuntimeError: Expected index [32, 1543] to be smaller than self [31, 152065] apart from dimension 1 and to be smaller size than src [32, 1543]
ERROR 02-11 07:01:54 core.py:210]
```
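For context, the failing call is a `scatter_add_` along the vocab dimension, and the error message says the index tensor has one more row (batch size 32) than the `bin_counts` buffer it scatters into (batch size 31). The following is a minimal standalone sketch (toy shapes, not vLLM code) reproducing the same `RuntimeError`, and showing that the call succeeds once the batch dimensions agree:

```python
import torch

vocab_size = 8  # toy vocab; the real error used 152065

# Mismatched batch sizes: bin_counts has 31 rows, tokens has 32,
# mirroring "Expected index [32, 1543] to be smaller than self [31, 152065]".
bin_counts = torch.zeros(31, vocab_size, dtype=torch.long)
tokens = torch.randint(0, vocab_size, (32, 5))

try:
    bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
except RuntimeError as e:
    print("scatter_add_ failed:", e)

# With matching batch sizes the same call works: each row of tokens
# contributes 5 ones into that row's token bins.
bin_counts = torch.zeros(32, vocab_size, dtype=torch.long)
bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
print("row sums:", bin_counts.sum(dim=1).tolist())  # every row sums to 5
```

This suggests the bug is a stale or off-by-one batch dimension somewhere upstream of `get_token_bin_counts_and_mask` (the prompt-token tensor reflecting 31 requests while the current step has 32), rather than a problem in `scatter_add_` itself.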
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
imkero