Skip to content

Conversation

qli88
Copy link
Contributor

@qli88 qli88 commented May 8, 2025

  1. Changes to adapt new AITER MoE API;
  2. Changes to adapt new AITER MLA API (MTP is not enabled)
  3. Some other bug fixes;
    This PR is for AITER API changes introduced by commit 939f741fc37f46694e48c32c7164f49eae2584c4 (merged on 04/20/2025). AITER versions after this require this patch to work.

Signed-off-by: Qiang Li <qiang.li2@amd.com>
Copy link

github-actions bot commented May 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@hongxiayang
Copy link
Collaborator

Hi, @qli88:
Regarding "latest AITER(05/07/2025":
Is there a change related to the base docker image for aiter that caused the incompatibility? should that aiter commit change be included in this PR as well?

@hongxiayang hongxiayang added the rocm Related to AMD ROCm label May 8, 2025
@hongxiayang
Copy link
Collaborator

Any tests in this PR?

Signed-off-by: Qiang Li <qiang.li2@amd.com>
@qli88
Copy link
Contributor Author

qli88 commented May 8, 2025

Hi, @qli88: Regarding "latest AITER(05/07/2025": Is there a change related to the base docker image for aiter that caused the incompatibility? should that aiter commit change be included in this PR as well?

The incompatibility is caused by mainly 2 things: 1. AITER MoE interface changes; 2. AITER MLA interface changes. So I think this is irrelevant to base docker image.

@qli88
Copy link
Contributor Author

qli88 commented May 8, 2025

Any tests in this PR?

This change is for ROCm only and I have tested on MI300X host with DeepSeek V3.

@hongxiayang
Copy link
Collaborator

hongxiayang commented May 8, 2025

Hi, @qli88: Regarding "latest AITER(05/07/2025": Is there a change related to the base docker image for aiter that caused the incompatibility? should that aiter commit change be included in this PR as well?

The incompatibility is caused by mainly 2 things: 1. AITER MoE interface changes; 2. AITER MLA interface changes. So I think this is irrelevant to base docker image.

The reason I am asking about from which commit of aiter introduced those changes. If the customer currently only uses upstream to build from source to serve Deepseek model, does the current aiter commit (https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base#L15) in Dockerfile.rocm_base works with these changes?

@qli88
Copy link
Contributor Author

qli88 commented May 8, 2025

Hi, @qli88: Regarding "latest AITER(05/07/2025": Is there a change related to the base docker image for aiter that caused the incompatibility? should that aiter commit change be included in this PR as well?

The incompatibility is caused by mainly 2 things: 1. AITER MoE interface changes; 2. AITER MLA interface changes. So I think this is irrelevant to base docker image.

The reason I am asking about from which commit of aiter introduced those changes. If the customer currently only uses upstream to build from source to serve Deepseek model, does the current aiter commit (https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base#L15) in Dockerfile.rocm_base works with these changes?

Thanks for the context! I think the new Docker image that will be released today (or tomorrow) will use the AITER commit I mentioned above (commit c1debd87ce0391aa27438d9e07e76e4fea7c4b70).

Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me but will wait for @SageMoore to sign off

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label May 8, 2025
@mgoin mgoin merged commit 9f64e93 into vllm-project:main May 9, 2025
77 checks passed
@hmellor
Copy link
Member

hmellor commented May 9, 2025

This PR introduces two undefined names qo_indptr and max_seqlen_qo to main:

mla_decode_fwd(q,
kv_buffer.view(-1, 1, 1, q.shape[-1]),
o,
qo_indptr,
kv_indptr,
kv_indices,
kv_last_page_lens,
max_seqlen_qo,
sm_scale=sm_scale,
logit_cap=logit_cap)

https://github.com/vllm-project/vllm/actions/runs/14931789556

@mgoin
Copy link
Member

mgoin commented May 9, 2025

Darn, why did the precommit not show up in the checks?

@hmellor
Copy link
Member

hmellor commented May 9, 2025

Yeah it's all green here, perhaps it was just a race condition with another PR that didn't trigger merge conflicts?

davidxia pushed a commit to davidxia/vllm that referenced this pull request May 9, 2025
…ject#17864)

Signed-off-by: Qiang Li <qiang.li2@amd.com>
Signed-off-by: David Xia <david@davidxia.com>
davidxia added a commit to davidxia/vllm that referenced this pull request May 9, 2025
Looks like vllm-project#17864 had an outdated
branch. So its [merge commit][1] caused `qo_indptr` and `max_seqlen_qo` to go
into the function signature of `aiter_mla_decode_fwd()` where they're not used
and into the body of `mla_decode_fwd_impl()` where they aren't defined.

This PR fixes the discrepancies and call-sites.

[1]: vllm-project@9f64e93#diff-88fd09f50e8cfc77678ade87483ab9a89ce58904203578f8816882763bd577c2

Signed-off-by: David Xia <david@davidxia.com>
princepride pushed a commit to princepride/vllm that referenced this pull request May 10, 2025
…ject#17864)

Signed-off-by: Qiang Li <qiang.li2@amd.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…ject#17864)

Signed-off-by: Qiang Li <qiang.li2@amd.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
…ject#17864)

Signed-off-by: Qiang Li <qiang.li2@amd.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants