Load tuned fused_moe_lora shrink and expand kernel configs separately… #21
Purpose
- Load tuned `fused_moe_lora` `shrink` and `expand` kernel configs separately in the `fused_moe_lora` function (see the sketch below)
- Add `num_stages` and `num_warps` parameters to the configs

Note: Based on PR vllm-project#21229 and vllm-project#26319
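The intent is easiest to see with a small sketch. The snippet below is illustrative only and does not use vLLM's actual loader or file naming: it shows the general pattern of keeping one tuned Triton config table per kernel (`shrink` vs. `expand`), with each entry carrying block sizes plus `num_warps` and `num_stages`, and selecting the entry benchmarked at the batch size closest to the runtime one. All file names, helper names, and default values here are hypothetical.

```python
import json
import os
from functools import lru_cache

# Hypothetical illustration: vLLM's real config loader and file naming differ.
# The point is that "shrink" and "expand" each get their own tuned table
# instead of sharing a single config.

@lru_cache(maxsize=None)
def load_fused_moe_lora_config(op: str, num_experts: int, shard_size: int,
                               device_name: str, config_dir: str = "configs"):
    """Load the tuned Triton config table for one kernel ("shrink" or "expand")."""
    fname = (f"fused_moe_lora_{op}_E={num_experts},N={shard_size},"
             f"device={device_name}.json")
    path = os.path.join(config_dir, fname)
    if not os.path.exists(path):
        return None  # fall back to heuristic defaults below
    with open(path) as f:
        # Keys are the benchmarked batch sizes (M); values are launch parameters.
        return {int(m): cfg for m, cfg in json.load(f).items()}


def pick_config(table, m: int):
    """Pick the tuned entry whose benchmarked M is closest to the actual M."""
    if not table:
        # Conservative defaults; note num_warps/num_stages are now part of
        # the per-kernel configs rather than being hard-coded.
        return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 32,
                "GROUP_SIZE_M": 8, "num_warps": 4, "num_stages": 3}
    best_m = min(table, key=lambda k: abs(k - m))
    return table[best_m]


# Shrink and expand can now use different block sizes, num_warps, and num_stages.
shrink_cfg = pick_config(load_fused_moe_lora_config("shrink", 128, 2880, "H100"), m=8)
expand_cfg = pick_config(load_fused_moe_lora_config("expand", 128, 2880, "H100"), m=8)
```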
Test Plan
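As a rough illustration only (not the original test plan), one way to exercise the fused MoE LoRA path end to end is to serve a MoE base model with a LoRA adapter attached, as below. The model name and adapter path are placeholders, and the snippet assumes vLLM with LoRA support is installed.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholders: substitute a real MoE base model and a LoRA adapter trained for it.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1",
          enable_lora=True, max_loras=1)

sampling = SamplingParams(temperature=0.0, max_tokens=64)
lora = LoRARequest("my-adapter", 1, "/path/to/lora/adapter")

# When the adapter targets the expert weights of a MoE model, generation goes
# through the fused MoE LoRA kernels, so tuned shrink/expand configs are used.
outputs = llm.generate(["Explain mixture-of-experts routing in one sentence."],
                       sampling, lora_request=lora)
print(outputs[0].outputs[0].text)
```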
Test Result
Together with vllm-project#26319, this improves OTPS by 80%-90% on GPT-OSS-120B when concurrency is 1 or 2.
(Optional) Documentation Update