[Bug]: Failed to generate normal outputs on deepseek-vl2-tiny's MoE LM backbone #12015
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
When developing support for deepseek-vl2-tiny, I noticed the model was generating gibberish outputs for every prompt except the first one in the batch, even with max_num_seqs=1 set:

After further investigation, I found that the problem occurs in the attention decode of the MoE LM. Here is code to reproduce the issue from the extracted MoE LM:
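A minimal sketch of that kind of reproduction is shown below (the model path `./deepseek-vl2-tiny-moe-lm` and the prompts are placeholders, not the exact ones used in the original script; the `VLLM_ATTENTION_BACKEND` variable selects the backend discussed next):

```python
# Minimal sketch of a batched-generation repro (placeholder model path and prompts).
import os

# Select the attention backend under test: "XFORMERS" or "FLASH_ATTN".
os.environ.setdefault("VLLM_ATTENTION_BACKEND", "XFORMERS")

from vllm import LLM, SamplingParams

# Hypothetical local path to the MoE LM backbone extracted from deepseek-vl2-tiny.
llm = LLM(
    model="./deepseek-vl2-tiny-moe-lm",
    trust_remote_code=True,
    max_num_seqs=1,  # the bug reproduces even with a single sequence per batch
)

prompts = [
    "The capital of France is",
    "The capital of Germany is",
    "The capital of Italy is",
]
outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=32))

for out in outputs:
    # Only the first prompt yields a sensible completion; the rest are gibberish.
    print(repr(out.prompt), "->", repr(out.outputs[0].text))
```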
When running this code with the xformers backend, in the first self-attention layer during decoding, the q and k inputs to Attention are identical across the batch. However, the decode output becomes all zeros for every prompt after the first one in the batch:

If I use the flash-attention backend on V0, a RuntimeError about an incorrect k_cache shape occurs:
Error on V0 with FA
A similar error also occurs if I switch to V1:
Error on V1 with FA
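For reference, the all-zeros decode output under xformers was found by inspecting the attention outputs. A generic sketch of that kind of probe follows (PyTorch forward hooks on a toy attention module, purely illustrative and not the exact instrumentation used above):

```python
# Generic sketch: register forward hooks on attention modules and flag all-zero outputs.
import torch
import torch.nn as nn

def watch_attention_outputs(model: nn.Module):
    """Print a warning whenever an attention submodule produces an all-zero output."""
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        if torch.count_nonzero(out) == 0:
            print(f"[debug] {module.__class__.__name__} produced an all-zero output "
                  f"with shape {tuple(out.shape)}")
    for name, module in model.named_modules():
        if "attention" in module.__class__.__name__.lower():
            module.register_forward_hook(hook)

# Illustration on a toy module; in practice the hooks would be attached to the
# decoder layers of the extracted MoE LM before running the repro script.
toy = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
watch_attention_outputs(toy)
x = torch.zeros(2, 4, 16)  # zero inputs make the toy attention output all zeros
toy(x, x, x)
```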
Note that this issue only occurs with deepseek-vl2-tiny's DeepSeek-V1-style MoE LM backbone; other DeepSeek-V1 checkpoints don't have this issue.