[Bugfix][Model] Fix fusedmoe and make modelrunner_v1 compatible with latest vllm #867
Conversation
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
```python
    max_model_len=self.max_model_len,
    max_num_batched_tokens=self.max_num_tokens,
    device=self.device,
    pin_memory=self.pin_memory,
```
Just pass `pin_memory=True` here directly.
```python
    cache_config.cache_dtype]

self.attn_metadata_builders: list[AscendAttentionMetadataBuilder] = []
self.attn_backends: list[type[AscendAttentionBackend]] = []
```
These two lines are unused.
```python
self.scheduler_config = vllm_config.scheduler_config
self.chunked_prefill_enabled = vllm_config.scheduler_config.chunked_prefill_enabled
self.device = device
self.pin_memory = True
```
This attribute is unused.
```python

self.is_multimodal_model = self.model_config.is_multimodal_model
self.block_size = vllm_config.cache_config.block_size
self.max_model_len = self.model_config.max_model_len
```
Unused; use `self.model_config.max_model_len` directly when building the `InputBatch`.
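A minimal sketch of how the two suggestions above (hard-coding `pin_memory=True` and reading `max_model_len` from `model_config`) would look at the `InputBatch` call site; only the arguments visible in the diff are shown, and the rest of the constructor is assumed unchanged:

```python
# Sketch only: apply both review suggestions where the InputBatch is built.
self.input_batch = InputBatch(
    max_model_len=self.model_config.max_model_len,  # no self.max_model_len alias
    max_num_batched_tokens=self.max_num_tokens,
    device=self.device,
    pin_memory=True,  # hard-coded instead of self.pin_memory
)
```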
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
LGTM. Let's merge this to unblock CI once the CI passes. Thanks for the fix.
Signed-off-by: MengqingCao <cmq0113@163.com>
Thanks, I made a small change in the latest commit; please help review it.
```python
self.local_num_experts = self.global_num_experts
self.expert_map = None

if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
```
I think this part of the code may not be needed; refer to how this part was modified in PR 863. However, the most urgent thing right now is to fix CI, so this can be revisited later.
Yes, let's make CI happy first and solve the bug later.
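The version check in the diff above is the pattern this quick fix leans on; a minimal sketch of that gating, assuming `vllm_version_is` is importable from `vllm_ascend.utils` (the helper name below is illustrative, not the PR's actual code):

```python
from vllm_ascend.utils import vllm_version_is  # assumed import path

def moe_config_required() -> bool:
    """True when the installed vLLM expects fused_moe to receive a moe_config."""
    # vLLM 0.8.5 / 0.8.5.post1 predate the moe_config argument, so the old code
    # path is kept there; newer versions take the new path (see PR 863 for the
    # longer-term cleanup).
    return not (vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"))
```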
…latest vllm (vllm-project#867)

### What this PR does / why we need it?
This PR fixes the CI failure broken by vLLM:
1. add `moe_config` for fused_moe
2. adjust the change for KV cache groups from vLLM; vllm-ascend doesn't support this feature yet, so this is just a quick fix for backward compatibility

fix: vllm-project#872

Signed-off-by: MengqingCao <cmq0113@163.com>
What this PR does / why we need it?
This PR fixes the CI failure broken by the latest vLLM:
1. add `moe_config` for fused_moe
2. adjust the change for KV cache groups from vLLM; vllm-ascend doesn't support this feature yet, so this is just a quick fix for backward compatibility

fix: #872
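For item 2, a rough sketch of the backward-compatibility idea, assuming the newer vLLM splits the KV cache into per-group specs; `KVCacheGroup` and the helper below are hypothetical stand-ins, not the actual vllm-ascend code:

```python
from dataclasses import dataclass


@dataclass
class KVCacheGroup:  # hypothetical stand-in for vLLM's per-group KV cache spec
    layer_names: list[str]


def single_kv_cache_group(groups: list[KVCacheGroup]) -> KVCacheGroup:
    """Quick-fix behaviour: vllm-ascend only handles one KV cache group,
    so anything else is rejected rather than silently mishandled."""
    if len(groups) != 1:
        raise NotImplementedError(
            "vllm-ascend does not support multiple KV cache groups yet")
    return groups[0]
```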