@faaany faaany commented Sep 23, 2025

Purpose

  • Add dispatch and combine methods to XpuCommunicator to fix the MoE model accuracy issue on XPU.
  • Add an argument that lets users disable expert_parallel.
  • Make naive the default all2all_backend on XPU.
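
For readers unfamiliar with the dispatch/combine pattern, here is a minimal, single-process sketch of the idea. It is not the code from this PR: ToyNaiveAll2All and ToyXpuCommunicator are hypothetical stand-ins, plain Python lists replace tensors, and the "naive" behaviour shown (every rank sees all tokens on dispatch, then keeps only its own slice on combine) is an assumption used for illustration.

```python
from typing import List, Optional, Tuple

Tokens = List[float]


class ToyNaiveAll2All:
    """Hypothetical stand-in for an all2all manager (not vLLM's class).

    Simulates a 'naive' exchange in one process: dispatch concatenates the
    tokens of every rank, combine slices this rank's tokens back out.
    """

    def __init__(self, rank: int, peer_hidden: List[Tokens],
                 peer_logits: List[Tokens]) -> None:
        self.rank = rank
        self.peer_hidden = peer_hidden  # simulated data held by each rank
        self.peer_logits = peer_logits
        self._sizes: List[int] = []

    def dispatch(self, hidden_states: Tokens,
                 router_logits: Tokens) -> Tuple[Tokens, Tokens]:
        hidden = [list(h) for h in self.peer_hidden]
        logits = [list(l) for l in self.peer_logits]
        # This rank contributes its own, current inputs.
        hidden[self.rank] = list(hidden_states)
        logits[self.rank] = list(router_logits)
        # Remember per-rank token counts so combine can undo the concat.
        self._sizes = [len(h) for h in hidden]
        flat_hidden = [x for h in hidden for x in h]
        flat_logits = [x for l in logits for x in l]
        return flat_hidden, flat_logits

    def combine(self, hidden_states: Tokens) -> Tokens:
        start = sum(self._sizes[:self.rank])
        return hidden_states[start:start + self._sizes[self.rank]]


class ToyXpuCommunicator:
    """Hypothetical communicator that only delegates to its manager."""

    def __init__(self, all2all_manager: Optional[ToyNaiveAll2All] = None):
        self.all2all_manager = all2all_manager

    def dispatch(self, hidden_states: Tokens,
                 router_logits: Tokens) -> Tuple[Tokens, Tokens]:
        assert self.all2all_manager is not None
        return self.all2all_manager.dispatch(hidden_states, router_logits)

    def combine(self, hidden_states: Tokens) -> Tokens:
        assert self.all2all_manager is not None
        return self.all2all_manager.combine(hidden_states)


# Usage: rank 1 of 2; rank 0 holds two tokens, rank 1 holds one.
manager = ToyNaiveAll2All(1, [[1.0, 2.0], [0.0]], [[0.1, 0.2], [0.0]])
comm = ToyXpuCommunicator(manager)
all_hidden, all_logits = comm.dispatch([3.0], [0.3])
own_output = comm.combine([10.0, 20.0, 30.0])
```

In this toy setup, a communicator without such methods would leave each rank working only on its local tokens, which is one plausible way the MoE layer could produce wrong outputs under data parallelism.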

Test Plan

VLLM_WORKER_MULTIPROC_METHOD=spawn python examples/offline_inference/data_parallel.py --enforce-eager --model="ibm-research/PowerMoE-3b" --dp-size=2 --tp-size=2 --disable-expert-parallel
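
The --disable-expert-parallel flag used in the command above could be wired up with argparse roughly as follows. This is a sketch under assumptions, not the example script's actual parser: build_parser is a hypothetical helper, the defaults are invented for illustration, and only the flag names come from the command line shown in this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a few flags from the test command."""
    parser = argparse.ArgumentParser(description="Data parallel sketch")
    parser.add_argument("--dp-size", type=int, default=2)  # assumed default
    parser.add_argument("--tp-size", type=int, default=2)  # assumed default
    # store_false means the destination defaults to True and is flipped
    # to False only when the flag is present on the command line.
    parser.add_argument(
        "--disable-expert-parallel",
        dest="enable_expert_parallel",
        action="store_false",
        help="Turn off expert parallelism (it stays on if omitted).",
    )
    return parser


default_args = build_parser().parse_args([])
disabled_args = build_parser().parse_args(["--disable-expert-parallel"])
```

Using a negative flag with store_false keeps expert parallelism enabled for existing invocations while giving users an explicit opt-out.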

Before:

DP rank 1, Prompt: 'Hello, my name is', Generated text: ' a boatatatatatatatatatatatatatatatatatat'
DP rank 1, Prompt: 'The president of the United States is', Generated text: ' from two of his the fact that it is the gratatatatatatatatat'
DP rank 1, Prompt: 'The capital of France is', Generated text: ' not necessarily to to the capital capital of India. India and France have the same capital capital'
DP rank 1, Prompt: 'The future of AI is', Generated text: '. The now is. Future of AI is. Future of AI is. Future of AI is.'
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' my name,’’,’’,’’,’’,’’,’’'
DP rank 0, Prompt: 'Hello, my name is', Generated text: ' is my,, my name, my my my my my my my my my'
DP rank 0, Prompt: 'The president of the United States is', Generated text: ' is is is is is is a0 to the the the the the the the'
DP rank 0, Prompt: 'The capital of France is', Generated text: ' and France is is  58 and and and and and and\n\n'
DP rank 0, Prompt: 'The future of AI is', Generated text: '””””””””””””””’s'
DP rank 0, Prompt: 'Hello, my name is', Generated text: ' is is is’ my,, me, my\xa0, my way my way' 

After:

DP rank 0, Prompt: 'Hello, my name is', Generated text: ' [your name], and I would like to explore the possibility of a career change'
DP rank 0, Prompt: 'The president of the United States is', Generated text: ' the commander in chief of the armed forces of the United States,'
DP rank 0, Prompt: 'The capital of France is', Generated text: ' Paris. The head of state is a president, who is elected'
DP rank 0, Prompt: 'The future of AI is', Generated text: ' bright and exciting. We will use the lessons learned from our models to'
DP rank 0, Prompt: 'Hello, my name is', Generated text: ' Tristan.\nI am a backpacker who has been traveling for'
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' [your name], and I would like to explore the possibility of a career change into the field of'
DP rank 1, Prompt: 'The president of the United States is', Generated text: ' the commander in chief of the armed forces of the United States, including all branches of'
DP rank 1, Prompt: 'The capital of France is', Generated text: ' Paris. The head of state is a president, who is elected by universal suffr'
DP rank 1, Prompt: 'The future of AI is', Generated text: ' bright and exciting. We will use the lessons learned from our models to further enhance our capability'
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' Tristan.\nI am a backpacker who has been traveling for about three years now'

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
@mergify mergify bot added the documentation Improvements or additions to documentation label Sep 23, 2025
faaany commented Sep 23, 2025

cc @jikunshang @yma11 @chaojun-zhang

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an accuracy issue with Mixture-of-Experts (MoE) models on XPU devices when using data parallelism. The core of the fix implements the dispatch and combine communication primitives in XpuCommunicator, which are essential for expert parallelism; both operations are delegated to an all2all_manager. The PR also makes the all2all backend on XPU more robust by defaulting to the naive implementation, the only one currently supported, and by warning the user if a different backend is configured. Making enable_expert_parallel a configurable argument in the data parallelism example script is a further improvement in flexibility. The provided test results clearly demonstrate the effectiveness of the fix. The changes are well implemented and follow existing patterns in the codebase. Overall, this is a solid contribution that improves XPU support.
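
The fallback behaviour described in this review (use naive, warn if something else was requested) can be sketched as below. This is an illustrative assumption about the shape of such logic, not the merged code: resolve_xpu_all2all_backend, the logger name, and the SUPPORTED tuple are hypothetical, and "pplx" is used only as an example of a non-naive value.

```python
import logging
from typing import Optional

logger = logging.getLogger("xpu_all2all_sketch")  # hypothetical logger name

# Assumption for this sketch: only the naive backend works on XPU.
SUPPORTED_XPU_ALL2ALL_BACKENDS = ("naive",)


def resolve_xpu_all2all_backend(requested: Optional[str]) -> str:
    """Return the backend to use on XPU, warning on unsupported requests."""
    if requested is None:
        # Nothing configured: silently use the default.
        return "naive"
    if requested not in SUPPORTED_XPU_ALL2ALL_BACKENDS:
        logger.warning(
            "all2all backend %r is not supported on XPU; "
            "falling back to 'naive'.", requested)
        return "naive"
    return requested


chosen_default = resolve_xpu_all2all_backend(None)
chosen_fallback = resolve_xpu_all2all_backend("pplx")  # example value only
```

Falling back with a warning, rather than raising, lets a configuration written for another platform still run on XPU while making the substitution visible in the logs.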

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
@jikunshang jikunshang enabled auto-merge (squash) September 23, 2025 09:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025
@jikunshang jikunshang merged commit 4c966e4 into vllm-project:main Sep 23, 2025
59 checks passed