Conversation

chenwaner (Contributor) commented Jun 3, 2025

What this PR does / why we need it?

Enable kvcache_nz for the decode process in graph mode, which reduces the time spent in FA (fused attention) on long sequences.
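
For context, "NZ" refers to Ascend's FRACTAL_NZ memory format. A minimal sketch of what enabling it means for the KV cache, assuming the standard torch_npu format-cast API (the format id constant and the tensor shape below are illustrative, not taken from this PR):

```python
import torch
import torch_npu  # Ascend adapter for PyTorch

# Ascend's fractal NZ memory format id (an ACL convention, not from this PR).
ACL_FORMAT_FRACTAL_NZ = 29

# Illustrative decode-time KV cache: (batch, heads, seq_len, head_dim).
kv_cache = torch.empty(64, 8, 4096, 128, dtype=torch.float16, device="npu")

# Casting to NZ lets the fused-attention op read the cache without an internal
# layout conversion, which is where the long-sequence savings would come from.
kv_cache_nz = torch_npu.npu_format_cast(kv_cache, ACL_FORMAT_FRACTAL_NZ)
```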

Does this PR introduce any user-facing change?

To enable kvcache_nz, set the environment variable VLLM_ENABLE_KV_NZ to "1".
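
For example (the reading side is a sketch of how the backend would plausibly parse the flag; only the variable name comes from this PR):

```python
import os

# Set before vLLM initializes its attention backend.
os.environ["VLLM_ENABLE_KV_NZ"] = "1"

# Plausible parsing on the backend side (illustrative, not the PR's code).
kv_nz_enabled = os.getenv("VLLM_ENABLE_KV_NZ", "0") == "1"
```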

How was this patch tested?

Tested on the DeepSeek model with batch size 64 and seq_len 1k+3k: total FA time across all 61 layers dropped from 20.80 ms to 19.76 ms (a ~5% reduction).

wangxiyuan mentioned this pull request Jun 4, 2025
wangxiyuan changed the title from "[Change Notes] kvcache nz" to "kvcache nz" Jun 4, 2025
chenwaner changed the title from "kvcache nz" to "[WIP]kvcache nz" Jun 4, 2025
github-actions bot commented Jun 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

chenwaner changed the title from "[WIP]kvcache nz" to "kvcache nz" Jun 5, 2025
realliujiaxu (Contributor) commented:

torch_npu.npu_fused_infer_attention_score only supports K/V in ND format (https://www.hiascend.com/document/detail/zh/Pytorch/700/apiref/apilist/ptaoplist_001232.html). Does this PR require a newer version of torch_npu and CANN?

Fall back to layout=BNSD in npu_fused_infer_attention_score when KV_NZ is disabled
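
A minimal sketch of that fallback, assuming the backend picks the layout string it passes to npu_fused_infer_attention_score; the NZ-branch tag is hypothetical, since this thread only confirms the BNSD fallback and the env-var gate:

```python
import os

def select_infer_attention_layout() -> str:
    """Pick the KV layout for npu_fused_infer_attention_score (sketch).

    "BNSD" is the ND-format layout the op is documented to accept; the
    NZ branch stands in for whatever format tag the PR actually uses.
    """
    if os.getenv("VLLM_ENABLE_KV_NZ", "0") == "1":
        return "BNSD_NZ"  # hypothetical tag for the NZ-format path
    return "BNSD"  # fallback from the commit above when KV_NZ is disabled
```
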
github-actions bot commented Jun 5, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions bot commented Jun 6, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

chenwaner closed this by deleting the head repository Jun 6, 2025