Commit 524d7b0

solve precision synchronization
Co-authored-by: baxingpiaochong <771405853@qq.com>
Signed-off-by: LCAIZJ <leichao139636@163.com>
1 parent 517fd92 commit 524d7b0

1 file changed: +1 -2 lines changed

vllm_ascend/worker/model_runner_v1.py

Lines changed: 1 addition & 2 deletions
@@ -2339,7 +2339,6 @@ def execute_model(
             attn_metadata, self.with_prefill, maybe_padded_num_tokens,
             input_ids, positions, intermediate_tensors, inputs_embeds)
 
-        self.maybe_wait_for_kv_save()
         finished_sending, finished_recving = self.get_finished_kv_transfer(
             scheduler_output)
 

@@ -2603,7 +2602,7 @@ def propose_draft_token_ids(sampled_token_ids):
         # ngram and other speculative decoding methods use the sampled
         # tokens on the CPU, so they are run after bookkeeping.
         propose_draft_token_ids(valid_sampled_token_ids)
-
+        self.maybe_wait_for_kv_save()
         if has_kv_transfer_group():
             get_kv_transfer_group().clear_connector_metadata()
 
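
The net effect of the two hunks is a change in call order: `maybe_wait_for_kv_save()` no longer runs right after the forward pass and instead runs after `propose_draft_token_ids(...)`, just before the connector metadata is cleared. A minimal sketch of that reordering follows. It assumes a hypothetical `StubRunner` whose methods only record their names, omits the sampling and bookkeeping steps that sit between the two hunks, and drops the `has_kv_transfer_group()` check; none of this is the actual `model_runner_v1.py` code.

```python
class StubRunner:
    """Hypothetical stand-in for the model runner; it only records call order."""

    def __init__(self):
        self.calls = []

    def _record(self, name):
        self.calls.append(name)

    def forward(self):
        self._record("forward")

    def maybe_wait_for_kv_save(self):
        self._record("maybe_wait_for_kv_save")

    def get_finished_kv_transfer(self):
        self._record("get_finished_kv_transfer")

    def propose_draft_token_ids(self):
        self._record("propose_draft_token_ids")

    def clear_connector_metadata(self):
        # Unconditional here; the diff shows this guarded by has_kv_transfer_group().
        self._record("clear_connector_metadata")


def execute_before(runner):
    """Call order before the commit: wait for the KV save right after forward."""
    runner.forward()
    runner.maybe_wait_for_kv_save()
    runner.get_finished_kv_transfer()
    runner.propose_draft_token_ids()
    runner.clear_connector_metadata()
    return runner.calls


def execute_after(runner):
    """Call order after the commit: wait for the KV save after draft proposal."""
    runner.forward()
    runner.get_finished_kv_transfer()
    runner.propose_draft_token_ids()
    runner.maybe_wait_for_kv_save()
    runner.clear_connector_metadata()
    return runner.calls


if __name__ == "__main__":
    print(execute_before(StubRunner()))
    print(execute_after(StubRunner()))
```

Comparing the two printed lists shows only the position of `maybe_wait_for_kv_save` changing; every other call keeps its relative order.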
