[refactor] Refactoring forward_context and model_runner_v1 #1422
Conversation
Force-pushed cfd63c5 to 48fd2a1: …raph_mode (Signed-off-by: zzzzwwjj <1183291235@qq.com>)
vllm_ascend/utils.py (Outdated)
Do we need to add another version here corresponding to 310P?
We can do it in the future.
Don't we maintain etp any more?
Given the absence of relevant scenarios, employing EP or full TP is sufficient for now. We may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the number of nodes exceeds the number of experts.
However, we do have customer scenarios that require such configurations. While DeepSeek models might not need this, there are use cases involving large-scale MoE (Mixture of Experts) models that require splitting across both Tensor Parallelism (TP) and Expert Parallelism (EP), or sometimes just TP alone. This is exactly the case with the current Jieyue Xingchen models.
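For readers less familiar with the distinction being debated here, the following is a purely illustrative sketch of how a stacked MoE weight would be partitioned under expert parallelism versus expert tensor parallelism; the shapes and variable names are made up for illustration and are not taken from vllm-ascend.

```python
# Illustrative only: partitioning of a stacked MoE weight
# of shape [num_experts, hidden_size, intermediate_size].
num_experts, hidden_size, intermediate_size = 8, 4096, 14336
world_size = 4

# Expert parallelism (EP): each rank keeps num_experts // world_size whole
# experts; an individual expert's weights are never split.
ep_shard = (num_experts // world_size, hidden_size, intermediate_size)   # (2, 4096, 14336)

# Expert tensor parallelism (ETP, the scheme being dropped here): every rank
# keeps all experts, but each expert's intermediate dimension is sliced.
etp_shard = (num_experts, hidden_size, intermediate_size // world_size)  # (8, 4096, 3584)

print("EP shard:", ep_shard)
print("ETP shard:", etp_shard)
```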
LGTM

1 similar comment

LGTM
This solution is not fully aligned with the current ETP solution. For example, EP and ETP cannot be supported at the same time.
### What this PR does / why we need it?
Remove ETP/EP maintained in branch main. We drop this as there are no relevant scenarios for ETP now, and we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where experts need to be sliced. This is a part of the #1422 backport. Fixes #1396 #1154
### Does this PR introduce _any_ user-facing change?
We'll not maintain etp/ep in vllm-ascend anymore, and use the tp/ep in vllm instead.
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@fe8a2c5
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
A refactoring of `forward_context` and `model_runner_v1`: add some context that is necessary for model inference into `forward_context`, and refactor the `dummy_run` logic to make it more reasonable (a sketch of the pattern is given after the list below). Some details for this PR:
- `examples` dir;
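The core of the refactor described above is carrying per-step inference state through a shared forward context rather than threading it through call arguments. Below is a minimal, hypothetical sketch of that pattern; the `ForwardContext` fields and helper names are illustrative stand-ins, not the actual vllm-ascend definitions.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ForwardContext:
    """Per-step state a model layer may need during a forward pass."""
    attn_metadata: Any          # attention metadata for the current batch
    with_prefill: bool = False  # whether this step contains prefill requests
    num_tokens: int = 0         # (padded) token count, e.g. for graph mode


_forward_context: ContextVar[Optional[ForwardContext]] = ContextVar(
    "forward_context", default=None)


@contextmanager
def set_forward_context(ctx: ForwardContext):
    """Expose `ctx` to every layer executed inside this block."""
    token = _forward_context.set(ctx)
    try:
        yield
    finally:
        _forward_context.reset(token)


def get_forward_context() -> ForwardContext:
    ctx = _forward_context.get()
    assert ctx is not None, "forward context is only available during execution"
    return ctx


# A dummy_run can enter the same context with placeholder inputs, so layers
# behave identically whether the step is a real batch or a warm-up run.
with set_forward_context(ForwardContext(attn_metadata=None, num_tokens=128)):
    _ = get_forward_context().num_tokens  # a layer would read state like this
```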
### Does this PR introduce any user-facing change?
This PR removes `expert_tensor_parallel_size` from `additional_config`; we will use `enable_expert_parallel` to control whether expert parallelism is enabled, which is consistent with vLLM.
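As a rough illustration of this user-facing change, configuration would move from the vllm-ascend `additional_config` entry to vLLM's own switch, roughly as sketched below. The model name is only an example, and the exact keyword plumbing (`LLM(...)` forwarding `enable_expert_parallel` to the engine arguments) is an assumption based on current vLLM conventions rather than a documented guarantee.

```python
from vllm import LLM

# Before (removed by this PR): expert tensor parallelism was configured through
# vllm-ascend's additional_config.
# llm = LLM(model="deepseek-ai/DeepSeek-V2-Lite",       # illustrative model
#           tensor_parallel_size=8,
#           additional_config={"expert_tensor_parallel_size": 8})

# After: rely on vLLM's own flag; whole experts are sharded via expert
# parallelism while the rest of the model follows tensor_parallel_size.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # illustrative model, assumption
    tensor_parallel_size=8,
    enable_expert_parallel=True,
)
```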
### How was this patch tested?