Commit 0ea7d83

fix: fix accuracy problem for quantized deepseek models
Signed-off-by: linfeng-yuan <1102311262@qq.com>
1 parent afe1767 commit 0ea7d83

File tree

1 file changed: +2 −1 lines

vllm_ascend/quantization/w8a8_dynamic.py

Lines changed: 2 additions & 1 deletion
@@ -285,7 +285,8 @@ def fused_experts(hidden_states: torch.Tensor,
         valid_token_mask = torch.arange(
             0, sorted_token_indices.shape[0],
             device=device).unsqueeze(1) < num_valid_tokens
-        down_out_list.mul_(valid_token_mask)
+        down_out_list = down_out_list.masked_fill_(~valid_token_mask,
+                                                   0).to(dtype)
         final_hidden_states.index_add_(0, sorted_token_indices, down_out_list)
     else:
         # TODO: Reorder device memory 2 times here, replace the current
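The fix zeroes out padded rows of the expert output before scattering them back into the final hidden states, so that garbage values in the invalid slots can no longer corrupt the accumulation. Below is a minimal NumPy sketch of this mask-then-scatter pattern; the variable names mirror the diff, but the shapes and values are purely illustrative, and `np.where` / `np.add.at` stand in for PyTorch's `masked_fill_` / `index_add_`.

```python
import numpy as np

# Hypothetical shapes: 6 sorted token slots with hidden size 4, of which
# only the first num_valid_tokens rows carry real data (the rest is padding).
num_valid_tokens = 4
down_out_list = np.arange(24, dtype=np.float32).reshape(6, 4)
sorted_token_indices = np.array([0, 1, 0, 1, 2, 2])
final_hidden_states = np.zeros((3, 4), dtype=np.float32)

# Build a column mask that is True for valid rows, mirroring the
# `torch.arange(...).unsqueeze(1) < num_valid_tokens` comparison.
valid_token_mask = (np.arange(down_out_list.shape[0])[:, None]
                    < num_valid_tokens)

# Zero the invalid rows, mirroring `masked_fill_(~valid_token_mask, 0)`.
down_out_list = np.where(valid_token_mask, down_out_list, 0.0)

# Scatter-add the masked rows into the output, mirroring `index_add_`.
np.add.at(final_hidden_states, sorted_token_indices, down_out_list)

print(final_hidden_states)
```

Here the two padded rows both point at output row 2, which stays all-zero because they were masked first; without the masking step, their stale contents would be added in.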
