[Perf] Use small max_num_batched_tokens for A100 #17885
Conversation
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
It might make sense to move all these heuristics into the respective platform file? As it is right now we have side effects where non-GPU systems (except TPU) have their defaults set based on memory heuristics meant for GPUs, i.e. if you have a CPU platform with >70 GB of RAM you'd get the large-GPU defaults. Edit: I've just noticed this is V1-only (no CPU support yet), but this will probably cause weird behaviour once we do support CPU in V1.
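For context, here is a minimal sketch of the kind of memory-based defaulting described above and where it goes wrong on CPU hosts; the function name and thresholds are hypothetical, not vLLM's actual code:

```python
# Illustrative sketch only: names and thresholds are hypothetical,
# not vLLM's actual defaulting code.
import torch

def default_max_num_batched_tokens() -> int:
    """Pick a scheduling budget from total device memory."""
    if torch.cuda.is_available():
        total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    else:
        # The side effect described above: without a platform check, a CPU
        # host with lots of RAM falls into the "big GPU" branch below.
        import psutil
        total_gib = psutil.virtual_memory().total / 2**30
    return 16384 if total_gib > 70 else 8192
```

Moving the branch behind a per-platform hook, as the comment suggests, would keep GPU-oriented thresholds from leaking into other backends.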
Can you merge from main to fix pre-commit?
Let me fix the pre-commit.
The CI seems to stall; rebuilding the CI to try to make it green.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
@KuntaiDu Hello Kuntai, why do we need to use a smaller max_num_batched_tokens on A100? What is the reason for this? Looking forward to your guidance 🙏
This PR fixes a performance regression on A100. Related PR: #17073
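For intuition on why a smaller budget can help, here is a back-of-envelope with invented numbers (not measurements from this PR): under chunked prefill, a decoding request can be scheduled behind a prefill chunk of up to max_num_batched_tokens tokens, so the budget bounds the worst-case per-step stall.

```python
# Back-of-envelope only: the throughput figure below is invented for
# illustration, not a measurement from this PR.
A100_PREFILL_TOK_PER_S = 20_000  # hypothetical sustained prefill rate

for budget in (2048, 8192, 16384):  # candidate max_num_batched_tokens values
    stall_ms = budget / A100_PREFILL_TOK_PER_S * 1000
    print(f"budget={budget:5d}: worst-case per-step decode stall ~ {stall_ms:.0f} ms")
```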
vLLM launching command:
TP=2:
TP=4:
Benchmarking command:
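The exact launch and benchmark commands are not preserved above. As a rough stand-in, a Python-API equivalent of the TP=2 / TP=4 runs might look like the sketch below; the model name, token budgets, and prompts are placeholders, not the configuration benchmarked in this PR:

```python
# Reproduction sketch with placeholder model/budget/prompts; not the exact
# setup benchmarked in this PR.
import sys
import time
from vllm import LLM, SamplingParams

tp = int(sys.argv[1])        # tensor parallel size: 2 or 4, as above
budget = int(sys.argv[2])    # max_num_batched_tokens value under test

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=tp,
    max_num_batched_tokens=budget,
)
prompts = ["Summarize the history of GPU computing."] * 256
start = time.perf_counter()
llm.generate(prompts, SamplingParams(max_tokens=128))
print(f"TP={tp}, budget={budget}: {time.perf_counter() - start:.1f} s")
```

For the serving path, the equivalent would typically be vllm serve with --tensor-parallel-size and --max-num-batched-tokens, benchmarked with the repository's benchmarks/benchmark_serving.py.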