-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[Feature] use --eplb_config to set eplb param #20562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @lengrongfu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors how Expert Parallelism Load Balancing (EPLB) parameters are managed and passed within the vLLM system. The core change involves introducing a dedicated EPConfig dataclass to encapsulate all EPLB-related settings, moving away from disparate individual parameters. This not only improves the internal organization of the configuration but also streamlines the command-line interface by consolidating multiple arguments into a single, JSON-configurable --ep-config option. The necessary code adaptations have been made across the project to utilize this new, more structured approach.
Highlights
- Configuration Refactoring: All Expert Parallelism Load Balancing (EPLB) related parameters have been consolidated into a new
EPConfigdataclass. This centralizes the configuration for features likeenable_eplb,lb_window_size,lb_step_interval, andlb_log_balancedness. - Simplified CLI Interface: The multiple individual command-line arguments for EPLB parameters (e.g.,
--enable-eplb,--eplb-window-size) have been replaced with a single--ep-configargument. This new argument accepts a JSON string, allowing for more flexible and structured configuration directly from the CLI. - Codebase Adaptation: Various parts of the vLLM codebase, including
ParallelConfig,EngineArgs,eplb_state.py,deepseek_v2.py, andgpu_model_runner.py, have been updated to reflect the new nestedep_configstructure, ensuring seamless integration and continued functionality.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a great step towards better code organization by refactoring the EPLB parameters into a dedicated EPConfig dataclass. The implementation is clean and consistent across the codebase. I've identified one critical issue with the programmatic initialization of ep_config that would cause a runtime failure. The fix is straightforward and included in my comment. Once that's addressed, this PR will be in excellent shape.
a4b836b to
ee1bca0
Compare
|
@abmfy please review when you get the chance, thanks! |
e059426 to
c775cd2
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not left a full review but I have a few questions/suggestions.
The main question is why not add more fields to ParallelConfig?
This is a good suggestion, and it would be better to implement it in ParallelConfig. |
c775cd2 to
b525e3f
Compare
|
@hmellor please take a look, will done modify based on suggest. |
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>
### What this PR does / why we need it? 1. use action/checkout@v5 instead of v4 2. remove dbo test case because there is issue with it and will be refactored later 3. make vllm-ascend compatible with vllm v0.10.1.1 and add CI for it 4. fix sampler api changes introduced by vllm-project/vllm#22387 6. fix qwen3 moe config changes intruoduced by vllm-project/vllm#20562 7. fix kvcache block changes introduced by vllm-project/vllm#23262 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing test. - vLLM version: v0.10.0 - vLLM main: vllm-project/vllm@0c6e40b --------- Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
Hi @lengrongfu, thank you again for your contribution! Just wanted to check in on the update for this doc:
|
I will commit a pr in today. thanks tips ~ |
### What this PR does / why we need it? 1. use action/checkout@v5 instead of v4 2. remove dbo test case because there is issue with it and will be refactored later 3. make vllm-ascend compatible with vllm v0.10.1.1 and add CI for it 4. fix sampler api changes introduced by vllm-project/vllm#22387 6. fix qwen3 moe config changes intruoduced by vllm-project/vllm#20562 7. fix kvcache block changes introduced by vllm-project/vllm#23262 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing test. - vLLM version: v0.10.0 - vLLM main: vllm-project/vllm@0c6e40b --------- Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
### What this PR does / why we need it? 1. use action/checkout@v5 instead of v4 2. remove dbo test case because there is issue with it and will be refactored later 3. make vllm-ascend compatible with vllm v0.10.1.1 and add CI for it 4. fix sampler api changes introduced by vllm-project/vllm#22387 6. fix qwen3 moe config changes intruoduced by vllm-project/vllm#20562 7. fix kvcache block changes introduced by vllm-project/vllm#23262 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing test. - vLLM version: v0.10.0 - vLLM main: vllm-project/vllm@0c6e40b --------- Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it? 1. use action/checkout@v5 instead of v4 2. remove dbo test case because there is issue with it and will be refactored later 3. make vllm-ascend compatible with vllm v0.10.1.1 and add CI for it 4. fix sampler api changes introduced by vllm-project/vllm#22387 6. fix qwen3 moe config changes intruoduced by vllm-project/vllm#20562 7. fix kvcache block changes introduced by vllm-project/vllm#23262 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing test. - vLLM version: v0.10.0 - vLLM main: vllm-project/vllm@0c6e40b --------- Signed-off-by: MengqingCao <cmq0113@163.com>
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.Purpose
Test Plan
Test Result
(Optional) Documentation Update