JC-ut0 commented Aug 1, 2025

What this PR does / why we need it?

[BUGFIX][0.9.1] Fix the ring_mla input `query_lens` by moving it to CPU

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
ganyi1996ppo merged commit 741a8cf into vllm-project:v0.9.1-dev Aug 1, 2025
16 checks passed
liyu119 added a commit to rjg-lyh/vllm-ascend that referenced this pull request Aug 11, 2025
…nto qwen30-dev

* 'qwen30-dev' of https://github.com/rjg-lyh/vllm-ascend:
  [V0.9.1] Replace FA ops with FA_V2 to optimize perf
  [0.9.1]remove chunked_prefill_for_mla (vllm-project#2177)
  move with_prefill allreduce from cpu to npu (vllm-project#2230)
  [v0.9.1] Add release note for v0.9.1rc2 (vllm-project#2233)
  [Docs] Sync main doc to v0.9.1-dev (vllm-project#2227)
  [0.9.1] Enable external distributed dp deployments in vllm ascend(0.9.1 only) (vllm-project#2109)
  [V0.9.1][BugFix] Fix the bug in decoraotor patch (vllm-project#2199)
  [v0.9.1][Bugfix][PD] Auto-clear producer KV cache if no pull notification (vllm-project#2085)
  [BUGFIX][0.9.1] FIX ring_mla input ‘query_lens’ to cpu (vllm-project#2170)
  [0.9.1][Prefill Perf] add D2H & initRoutingQuantV2 (vllm-project#2038)
  [bugfix] add with_prefill cpu allreduce to handle D-node recomputatio… (vllm-project#2129)