[torch.compile] initial integration #8949
Simple test on H100 (throughput):

$ # main branch
$ python benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B
Throughput: 28.99 requests/s, 14843.59 tokens/s

$ # this branch
$ python benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B
Throughput: 28.89 requests/s, 14792.03 tokens/s

$ # this branch, with compilation enabled
$ VLLM_TORCH_COMPILE_LEVEL=2 python benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B
Throughput: 29.90 requests/s, 15309.14 tokens/s

About 3.5% throughput improvement. Single-request serving (output token throughput, tok/s):
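As a quick sanity check on the reported numbers, the relative improvement can be computed directly (plain arithmetic on the throughputs above; no vLLM code involved):

```python
# Throughputs reported above (requests/s).
main_branch = 28.99
this_branch = 28.89           # this branch, compilation disabled
this_branch_compiled = 29.90  # this branch, VLLM_TORCH_COMPILE_LEVEL=2

# Improvement of the compiled run over the same branch without compilation.
improvement = this_branch_compiled / this_branch - 1
print(f"{improvement:.1%}")  # about 3.5%, matching the claim above
```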
Pipeline parallel: when I enable pipeline parallel, there is a dynamo error (cc @anijain2305). It turns out to be caused by Line 1152 in f13a07b; the error goes away when I change it to a normal tensor.

Tensor parallel: when I enable tensor parallel, it runs but the output is wrong. I'm still investigating.
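For context, the VLLM_TORCH_COMPILE_LEVEL variable used in the benchmark above gates whether the model's forward pass gets compiled. A minimal sketch of that pattern follows; the level semantics and the `maybe_compile` helper are illustrative assumptions, not vLLM's actual API, and a plain attribute stands in for `torch.compile` so the sketch runs without torch installed:

```python
import os

# Illustrative mapping (assumption): 0 = off, higher levels enable
# progressively more aggressive compilation.
COMPILE_LEVEL = int(os.environ.get("VLLM_TORCH_COMPILE_LEVEL", "0"))

def maybe_compile(fn):
    """Hypothetical helper: compile fn only when the env var asks for it.

    In vLLM this would wrap fn with torch.compile; here we only tag the
    function so the sketch stays self-contained.
    """
    fn.compiled = COMPILE_LEVEL >= 2
    return fn

@maybe_compile
def forward(x):
    # Stand-in for a model forward pass.
    return x * 2

print(forward(3), forward.compiled)
```

The point of the env-var gate is that the same binary can run with or without compilation, which is exactly how the three benchmark runs above differ.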
Referenced by pytorch#137044 ("Seen in vllm-project/vllm#8949"), resolved and approved by https://github.com/jansel.
Closing, as this has been moved to #9058.
TODOs (can be future PRs):