With draft models such as Medusa and MLPSpeculator, when speculative decoding is disabled for a step (e.g. when num_tokens + spec_tokens > the model's max_len), HiddenStates are not handled properly, which causes an invalid shape error. The repros below force this case by generating with max_tokens=2048 and ignore_eos=True, so the sequence eventually approaches the model's maximum length and speculation is skipped.
How to reproduce?
Code (MLPSpeculator):
from vllm import LLM, SamplingParams

llm = LLM(
    model="JackFram/llama-160m",
    speculative_model="ibm-fms/llama-160m-accelerator",
    num_speculative_tokens=3,
    use_v2_block_manager=True,
    enforce_eager=True,
)

prompt = "The president of the United States is"
output = llm.generate(prompt, SamplingParams(max_tokens=2048, ignore_eos=True))
Output:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-dfa52d56a4c5> in <cell line: 12>()
10 prompt = "The president of the United States is"
11
---> 12 output = llm.generate(prompt, SamplingParams(max_tokens=2048, ignore_eos=True))
(10 frames omitted)
/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py in _verify_tokens(self, seq_group_metadata_list, proposal_scores, proposals, max_proposal_len)
645 # Contract hidden states based on accepted tokens
646 hs_size = hidden_states.shape[1]
--> 647 hidden_states = hidden_states.reshape(-1, max_proposal_len + 1,
648 hs_size)
649 accepted_index = accepted_token_ids + 1 # Convert -1 to 0
RuntimeError: shape '[-1, 4, 768]' is invalid for input of size 768
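The reshape on line 647 assumes every sequence contributed max_proposal_len + 1 scored tokens, but on a step where speculation is skipped the batch carries only a single hidden state per sequence. A minimal standalone sketch of the mismatch (hypothetical tensors reproducing the shapes from the traceback, not actual vLLM code):

import torch

hs_size = 768          # hidden size of the target model
max_proposal_len = 3   # num_speculative_tokens

# Normal speculative step: 3 proposed tokens + 1 bonus token are scored,
# so the flat hidden states reshape cleanly to [batch, k + 1, hidden].
hidden_states = torch.zeros(4, hs_size)
hidden_states.reshape(-1, max_proposal_len + 1, hs_size)  # ok: [1, 4, 768]

# Step where spec decode is disabled: only one token was scored for the
# sequence, but _verify_tokens still assumes k + 1 tokens per sequence.
hidden_states = torch.zeros(1, hs_size)
hidden_states.reshape(-1, max_proposal_len + 1, hs_size)
# RuntimeError: shape '[-1, 4, 768]' is invalid for input of size 768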
Code (Medusa):
from vllm import LLM, SamplingParams

llm = LLM(
    model="JackFram/llama-68m",
    speculative_model="abhigoyal/vllm-medusa-llama-68m-random",
    num_speculative_tokens=3,
    use_v2_block_manager=True,
    enforce_eager=True,
)

prompt = "The president of the United States is"
output = llm.generate(prompt, SamplingParams(max_tokens=2048, ignore_eos=True))
Output:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-415db326cfe4> in <cell line: 12>()
10 prompt = "The president of the United States is"
11
---> 12 output = llm.generate(prompt, SamplingParams(max_tokens=2048, ignore_eos=True))
(10 frames omitted)
/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py in _verify_tokens(self, seq_group_metadata_list, proposal_scores, proposals, max_proposal_len)
645 # Contract hidden states based on accepted tokens
646 hs_size = hidden_states.shape[1]
--> 647 hidden_states = hidden_states.reshape(-1, max_proposal_len + 1,
648 hs_size)
649 accepted_index = accepted_token_ids + 1 # Convert -1 to 0
RuntimeError: shape '[-1, 4, 768]' is invalid for input of size 768
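One possible direction for a fix (a hypothetical sketch under my assumptions, not vLLM's actual code; the helper name and its signature are made up): skip the contraction entirely when no speculative tokens were scored this step, since each sequence then has exactly one hidden state.

import torch

def contract_hidden_states(hidden_states: torch.Tensor,
                           accepted_token_ids: torch.Tensor,
                           max_proposal_len: int) -> torch.Tensor:
    # Hypothetical helper sketching only the missing guard.
    hs_size = hidden_states.shape[-1]
    batch_size = accepted_token_ids.shape[0]
    if hidden_states.shape[0] == batch_size:
        # Speculation was skipped: one hidden state per sequence,
        # so there is nothing to contract.
        return hidden_states
    hidden_states = hidden_states.reshape(-1, max_proposal_len + 1, hs_size)
    accepted_index = accepted_token_ids + 1  # convert -1 to 0
    # ... contraction based on accepted_index continues as today ...
    return hidden_states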