
Error when reproducing mistral results #55

Open
yuanyehome opened this issue Nov 8, 2024 · 2 comments

@yuanyehome

When trying to reproduce the original-model results of mistral-7b-v0.2 without flash-attn, I got the following error:

```
Traceback (most recent call last):
  File "/home/yuanye/long_llm/InfLLM/benchmark/pred.py", line 330, in <module>
    preds = get_pred(
  File "/home/yuanye/long_llm/InfLLM/benchmark/pred.py", line 263, in get_pred
    output = searcher.generate(
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/greedy_search.py", line 32, in generate
    result = self._decode(input_ids, **kwargs)
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/greedy_search.py", line 54, in _decode
    out = self.model(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1065, in forward
    outputs = self.model(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/patch.py", line 102, in model_forward
    layer_outputs = decoder_layer(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 528, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/patch.py", line 16, in hf_forward
    ret = forward(
  File "/home/yuanye/long_llm/InfLLM/inf_llm/attention/origin.py", line 49, in forward
    score = torch.matmul(h_q, h_k.transpose(-1, -2))
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1
```

It seems that inf_llm/attention/origin.py does not support GQA in Mistral. How can I fix it?
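For reference, a minimal sketch that reproduces the mismatch outside the benchmark (batch size, sequence length, and head dim are illustrative; the 32 query heads vs. 8 KV heads are Mistral-7B's actual GQA configuration):

```python
import torch

# Mistral-7B uses grouped-query attention: 32 query heads share 8 KV heads.
h_q = torch.randn(1, 32, 16, 128)  # (batch, num_heads, seq_len, head_dim)
h_k = torch.randn(1, 8, 16, 128)   # (batch, num_kv_heads, seq_len, head_dim)

# Raises "The size of tensor a (32) must match the size of tensor b (8)
# at non-singleton dimension 1", exactly as in the traceback above.
score = torch.matmul(h_q, h_k.transpose(-1, -2))
```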

@guyan364
Collaborator

guyan364 commented Nov 8, 2024

Hi, you can add repeat_kv from inf_llm/attention/utils.py before the qk computation.
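A minimal sketch of the suggested fix, assuming the repeat_kv in inf_llm/attention/utils.py has the same semantics as the Hugging Face helper of the same name; the exact placement inside origin.py is illustrative:

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Same semantics as the Hugging Face helper: expand
    # (batch, num_kv_heads, seq_len, head_dim) to
    # (batch, num_kv_heads * n_rep, seq_len, head_dim).
    batch, num_kv_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, slen, head_dim)

# Before the qk computation in origin.py (shapes illustrative, as above):
h_q = torch.randn(1, 32, 16, 128)
h_k = torch.randn(1, 8, 16, 128)
h_v = torch.randn(1, 8, 16, 128)

# n_rep = num_heads // num_kv_heads = 32 // 8 = 4 for Mistral-7B.
n_rep = h_q.size(1) // h_k.size(1)
h_k = repeat_kv(h_k, n_rep)
h_v = repeat_kv(h_v, n_rep)

score = torch.matmul(h_q, h_k.transpose(-1, -2))  # now (1, 32, 16, 16)
```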

@yuanyehome
Author

> Hi, you can add repeat_kv from inf_llm/attention/utils.py before the qk computation.

I've found it. Thanks a lot!
