
Conversation

@ZhengWG ZhengWG commented Jun 14, 2025

What this PR does / why we need it?

  • Fixed issue #122: support expert mapping with redundant experts

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tests pass when passing an expert_map with redundant_experts == 16:

export VLLM_ENABLE_MC2=1
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

export ASCEND_LAUNCH_BLOCKING=0
export VLLM_VERSION=0.9.0
MODEL_PATH=DeepSeek-R1-W8A8-VLLM
python -m vllm.entrypoints.openai.api_server --model=$MODEL_PATH \
    --load-format=prefetch_auto \
    --quantization ascend \
    --served-model-name auto \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    --port 8006 \
    -tp=8 \
    -dp=2 \
    --enable-expert-parallel \
    --max-num-seqs 24 \
    --max-model-len 2048 \
    --max-num-batched-tokens 2048 \
    --block-size 128 \
    --no-enable-prefix-caching \
    --additional-config '{"torchair_graph_config":{"enabled":true,"use_cached_graph":true,"graph_batch_sizes":[24]},"ascend_scheduler_config":{"enabled":true}, "expert_tensor_parallel_size":1, "expert_map_path": "delta_gsm8k_temp0.0_16_16.json"}' \
    --gpu-memory-utilization 0.90

TODO:

  • [☑️] Support EPLB with no redundant experts
  • [☑️] Support EPLB with redundant experts
  • [☑️] Add e2e unit test
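For illustration, an expert map with redundant experts could look roughly like the sketch below. The field names (moe_layer_count, layer_list, device_expert, …) are assumptions for illustration only; the actual schema expected by expert_map_path is defined by vllm-ascend.

```python
# Hypothetical expert-map layout: field names are assumptions, not the
# authoritative vllm-ascend schema. One MoE layer, 4 logical experts,
# 2 redundant copies spread across 2 devices.
expert_map = {
    "moe_layer_count": 1,
    "layer_list": [
        {
            "layer_id": 0,
            "device_count": 2,
            "device_list": [
                {"device_id": 0, "device_expert": [0, 1, 2]},
                # experts 2 and 0 below are redundant copies
                {"device_id": 1, "device_expert": [2, 3, 0]},
            ],
        }
    ],
}

# Sanity checks: every logical expert is placed at least once, and the
# total physical slot count exceeds the logical count by the redundancy.
placed = [e for dev in expert_map["layer_list"][0]["device_list"]
          for e in dev["device_expert"]]
print(sorted(set(placed)))  # [0, 1, 2, 3]
print(len(placed))          # 6 physical slots for 4 logical experts
```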


ZhengWG commented Jun 14, 2025

@wangxiyuan, can you help review it?

@jianzs jianzs left a comment


LGTM

Comment on lines -1095 to +1219
-    local_num_experts = torch.sum(self.expert_map != -1) \
-        if self.expert_map is not None else num_experts
+    if self.log2phy is not None:
+        local_num_experts = self.local_num_experts
+    else:
+        local_num_experts = torch.sum(self.expert_map != -1) \
+            if self.expert_map is not None else num_experts
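To make the branch above concrete: when there is no log2phy table, the local expert count is just the number of non-(-1) entries in the per-rank expert map. A minimal plain-Python sketch of that computation (illustrative data, not the PR's actual code, which uses torch.sum):

```python
# expert_map: index = global expert id, value = local slot id, -1 if the
# expert is not hosted on this rank (mirrors torch.sum(expert_map != -1)).
expert_map = [-1, -1, 0, 1, -1, 2, -1, -1]
num_experts = len(expert_map)

local_num_experts = (sum(1 for v in expert_map if v != -1)
                     if expert_map is not None else num_experts)
print(local_num_experts)  # 3
```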
A Collaborator commented:

Why not just use the self.local_num_experts value when log2phy is None? It's already set by determine_expert_map.

self.local_num_experts, self.expert_map = determine_expert_map(
self.ep_size,
get_ep_group().rank_in_group, self.global_num_experts)
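For reference, a simplified stand-in for determine_expert_map that splits the global experts into even, contiguous slices per EP rank (an assumption about its behavior for illustration; the real helper lives in vLLM and handles more cases):

```python
def determine_expert_map(ep_size, ep_rank, global_num_experts):
    """Sketch: give each EP rank a contiguous, even slice of experts and
    return (local_num_experts, expert_map). Assumes an even split."""
    local_num_experts = global_num_experts // ep_size
    expert_map = [-1] * global_num_experts          # -1: not on this rank
    start = ep_rank * local_num_experts
    for local_id in range(local_num_experts):
        expert_map[start + local_id] = local_id     # global id -> local slot
    return local_num_experts, expert_map

n, emap = determine_expert_map(ep_size=4, ep_rank=1, global_num_experts=8)
print(n)     # 2
print(emap)  # [-1, -1, 0, 1, -1, -1, -1, -1]
```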

ZhengWG (Contributor Author) replied:

Yes, it should return the same value. The current implementation intentionally preserves the original logic.


wangxiyuan commented Jun 16, 2025

You should add an e2e test for the EPLB case. I noticed there is a PR for EPLB tests (#1186); can you combine them to make sure the feature works as expected?

@songshanhu07 (Contributor) commented:

You need to check whether there is a rank with duplicate expert numbers in the JSON file. This may be the reason for your runtime error. The code changes you merged don't seem to make much sense.


ZhengWG commented Jun 16, 2025

> You need to check whether there is a rank with duplicate expert numbers in the JSON file. This may be the reason for your runtime error. The code changes you merged don't seem to make much sense.

Because when num_redundant_experts > 0, multiple experts with the same logical number might be loaded onto a single rank.
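A minimal illustration of that point with hypothetical data: redundancy makes the logical-to-physical mapping one-to-many, so a rank can legitimately host two physical copies of the same logical expert.

```python
# Hypothetical log2phy table: logical expert id -> list of physical slot ids.
# With num_redundant_experts > 0 this mapping is one-to-many.
log2phy = {0: [0, 5], 1: [1], 2: [2], 3: [3, 4]}

# Physical slots assigned to one rank (hypothetical placement).
rank_slots = {0, 3, 4}

# Logical experts served by this rank: logical expert 3 appears twice,
# because both of its physical copies (slots 3 and 4) landed here.
hosted = [lg for lg, phys in log2phy.items() for p in phys if p in rank_slots]
print(hosted)  # [0, 3, 3]
```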


ZhengWG commented Jun 16, 2025

> You should add an e2e test for the EPLB case. I noticed there is a PR for EPLB tests (#1186); can you combine them to make sure the feature works as expected?

OK, I will add it soon.

@ZhengWG ZhengWG force-pushed the eplb-fix-redunt branch 2 times, most recently from 09051ef to 689f6ed Compare June 24, 2025 06:09

ZhengWG commented Jun 24, 2025

Hi @wangxiyuan,

I've added the E2E test and verified it locally. Could you please review the changes when you have time?

Let me know if you have any questions or suggestions.

Thanks in advance!

@ZhengWG ZhengWG force-pushed the eplb-fix-redunt branch 3 times, most recently from ff5ae37 to eb29669 Compare June 24, 2025 06:31
@ZhengWG ZhengWG changed the title from "[EPLB]: Correct local expert number calculation with redundant experts" to "[EPLB]: Correct local expert number calculation with redundant experts && add e2e test" Jun 24, 2025

codecov bot commented Jun 24, 2025

Codecov Report

❌ Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.73%. Comparing base (c30ddb8) to head (d74bd9f).
⚠️ Report is 304 commits behind head on main.

Files with missing lines        Patch %   Lines
vllm_ascend/ops/fused_moe.py    0.00%     3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1223       +/-   ##
===========================================
+ Coverage   27.39%   50.73%   +23.34%     
===========================================
  Files          56       77       +21     
  Lines        6191     9413     +3222     
===========================================
+ Hits         1696     4776     +3080     
- Misses       4495     4637      +142     
Flag Coverage Δ
unittests 50.73% <25.00%> (+23.34%) ⬆️



Yikun commented Jun 24, 2025

Is it ready to go? Please do a rebase.


ZhengWG commented Jun 25, 2025

> Is it ready to go? Please do a rebase.

It's ready now~ @Yikun @wangxiyuan

export VLLM_ENABLE_MC2=1
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
export VLLM_VERSION=0.9.1
A Collaborator commented:

Please remove this.

Suggested change: delete the line
export VLLM_VERSION=0.9.1

def build_expert_map(expert_map_path,
                     num_redundant_expert=0,
                     num_layer=58,
                     num_device=16,
A Collaborator commented:

Actually, there are only 4 cards on CI now; please reduce num_device to make it work.
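Shrinking the test to 4 cards just redistributes the same physical experts over fewer devices. A quick sanity check, assuming DeepSeek-R1's 256 routed experts per MoE layer and the 16 redundant experts used above (numbers are assumptions about the test setup, not CI requirements):

```python
def experts_per_device(num_logical, num_redundant, num_device):
    """Physical experts (logical + redundant) split evenly per device.
    Assumes the total divides evenly, as the EPLB layouts here do."""
    total_physical = num_logical + num_redundant
    assert total_physical % num_device == 0, "uneven expert split"
    return total_physical // num_device

print(experts_per_device(256, 16, 16))  # 17 physical experts per device
print(experts_per_device(256, 16, 4))   # 68 physical experts per device
```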


@ZhengWG ZhengWG force-pushed the eplb-fix-redunt branch 3 times, most recently from 11eb86d to 0946edf Compare July 3, 2025 02:35
@github-actions github-actions bot added documentation Improvements or additions to documentation ci/build module:quantization merge-conflicts labels Jul 3, 2025

github-actions bot commented Jul 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

ZhengWG added 6 commits July 3, 2025 22:13
Signed-off-by: ZhengWG <zwg0606@gmail.com>
@ZhengWG ZhengWG force-pushed the eplb-fix-redunt branch from d9f76dd to d74bd9f Compare July 3, 2025 14:13
@github-actions github-actions bot removed documentation Improvements or additions to documentation ci/build module:quantization labels Jul 3, 2025

ZhengWG commented Jul 4, 2025

The same e2e test passes in my local environment but fails on CI. The root cause appears to be a CANN version mismatch affecting EP parallel execution. @MengqingCao, can you help check it? Here is my local env info:
[screenshot of local environment info]

github-actions bot commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan (Collaborator) commented:

EPLB will be refactored; let's close this now.

@wangxiyuan wangxiyuan closed this Aug 18, 2025

6 participants