
Commit e417997

fix: fix accuracy problem for quantized deepseek models

Signed-off-by: linfeng-yuan <1102311262@qq.com>

1 parent: afe1767

1 file changed: 3 additions, 2 deletions

vllm_ascend/quantization/w8a8_dynamic.py

@@ -285,8 +285,9 @@ def fused_experts(hidden_states: torch.Tensor,
         valid_token_mask = torch.arange(
             0, sorted_token_indices.shape[0],
             device=device).unsqueeze(1) < num_valid_tokens
-        down_out_list.mul_(valid_token_mask)
-        final_hidden_states.index_add_(0, sorted_token_indices, down_out_list)
+        valid_output = torch.where(valid_token_mask, down_out_list,
+                                   torch.zeros_like(down_out_list)).to(dtype)
+        final_hidden_states.index_add_(0, sorted_token_indices, valid_output)
     else:
         # TODO: Reorder device memory 2 times here, replace the current
         # implementation here when suitable operators become available.
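The diff above changes how padded (invalid) expert-output rows are neutralized before the scatter-add: instead of multiplying `down_out_list` in place by the boolean mask, the new code builds a masked copy with `torch.where` and explicitly casts it to the destination `dtype` before `index_add_`, which appears to be the source of the accuracy fix for the quantized path. As a minimal, torch-free sketch of the underlying pattern, the following pure-Python code illustrates masked scatter-add; all names here (`masked_index_add`, `hidden`, `down_out`, etc.) are illustrative stand-ins for the tensors in `fused_experts`, not the real vllm_ascend API.

```python
def masked_index_add(hidden, sorted_token_indices, down_out, num_valid_tokens):
    """Accumulate only the first num_valid_tokens rows of down_out into
    hidden at the positions given by sorted_token_indices. Rows at or
    beyond num_valid_tokens are padding and must contribute exactly zero,
    mirroring torch.where(valid_token_mask, down_out, zeros)."""
    for row, dest in enumerate(sorted_token_indices):
        if row < num_valid_tokens:
            contribution = down_out[row]
        else:
            # Padded row: replaced by zeros so it cannot pollute the sum.
            contribution = [0.0] * len(down_out[row])
        hidden[dest] = [h + c for h, c in zip(hidden[dest], contribution)]
    return hidden

out = masked_index_add(
    hidden=[[0.0, 0.0], [0.0, 0.0]],
    sorted_token_indices=[0, 1, 0],              # last row maps back to token 0
    down_out=[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # [9.0, 9.0] is padding
    num_valid_tokens=2,
)
# out == [[1.0, 2.0], [3.0, 4.0]] -- the padded row did not pollute token 0
```

Zeroing into a fresh (correctly typed) buffer, rather than mutating the accumulator's input in place, also avoids any implicit dtype promotion from the boolean mask, which is consistent with the `.to(dtype)` added in the commit.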
