[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER #17153
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed 49f09d6 to c93da8a
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Force-pushed c93da8a to f89176e
Can you make a test for this, or at least confirm you've manually tested it? Specifically, a kernel test would be great.
Hi @mgoin,
I’ll plan to add kernel-level or architecture-specific tests as a follow-up.

Logs from model tests on POWER (collapsed)
The output of `python collect_env.py` (collapsed)
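For reference, a minimal sketch of what such a kernel-level check could look like, written in plain PyTorch rather than against vLLM's internal APIs. The helper names and the per-tensor-activation / per-channel-weight scheme are illustrative assumptions; a real test would call the oneDNN-backed W8A8 kernel in place of `emulated_int8_linear`.

```python
import torch


def quantize_per_tensor(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization (e.g. for activations)."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale


def quantize_per_channel(w: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of weights, w: [n, k]."""
    scales = w.abs().amax(dim=1).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scales[:, None]), -128, 127).to(torch.int8)
    return q, scales


def emulated_int8_linear(a_q, w_q, a_scale, w_scales):
    """Placeholder for the kernel under test: integer matmul with wide
    accumulation, then dequantization. float64 accumulation is exact at these
    sizes, so it stands in for the int32 accumulation of a real INT8 GEMM."""
    acc = a_q.to(torch.float64) @ w_q.to(torch.float64).t()
    return (acc * a_scale * w_scales[None, :]).to(torch.float32)


def test_w8a8_int8_linear_matches_dequant_reference():
    torch.manual_seed(0)
    m, k, n = 16, 256, 128
    a = torch.randn(m, k)
    w = torch.randn(n, k)

    a_q, a_scale = quantize_per_tensor(a)
    w_q, w_scales = quantize_per_channel(w)

    # Reference path: dequantize first, then matmul in float32.
    ref = (a_q.float() * a_scale) @ (w_q.float() * w_scales[:, None]).t()

    # Kernel path (emulated here): integer accumulation, then rescale.
    out = emulated_int8_linear(a_q, w_q, a_scale, w_scales)

    torch.testing.assert_close(out, ref, rtol=1e-3, atol=1e-3)
```

The identity being checked (dequantize-then-matmul agrees with accumulate-then-rescale) is the usual reference comparison for INT8 GEMM tests.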
Hi @mgoin @DarkLight1337,
Thank you
Hi @mgoin, thanks for approving the changes.
… POWER (vllm-project#17153) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: mgoin <mgoin64@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
… POWER (vllm-project#17153) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: mgoin <mgoin64@gmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
… POWER (vllm-project#17153) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: mgoin <mgoin64@gmail.com>
… POWER (vllm-project#17153) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: mgoin <mgoin64@gmail.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
This PR adds support for compressed tensor W8A8 INT8 quantization on the POWER architecture using oneDNN.
Key changes include:
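As a usage illustration, here is a minimal sketch of exercising this path through vLLM's offline API on a CPU/POWER host. The model ID and the `VLLM_CPU_KVCACHE_SPACE` value below are placeholder assumptions, not something this PR prescribes; any compressed-tensors W8A8 INT8 checkpoint should do.

```python
import os

# Optional CPU-backend knob: KV-cache space in GiB (value is an assumption).
os.environ.setdefault("VLLM_CPU_KVCACHE_SPACE", "8")

from vllm import LLM, SamplingParams

# Placeholder model ID; the compressed-tensors quantization config is read
# from the checkpoint, so no explicit quantization flag is needed here.
llm = LLM(
    model="neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a8",
    dtype="bfloat16",
)

outputs = llm.generate(
    ["The POWER architecture is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```

The same code path is exercised when serving the model with `vllm serve`.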