Integrate quick allreduce and select the best allreduce implementation #18473
| 👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run  Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add  🚀 | 
5a6a5a8 to e272658
| What's the relationship between this and #16804? |
| Aha, maybe we are in competition. We're from AMD. We recently spent some time integrating quick reduce (QR) into vLLM, because QR is well suited to ROCm. Since both PRs integrate QR, they have many similarities, but the PR you mentioned (#16804) seems to support only Q8 and Q4. It has no obvious boundary conditions, its quantization seems to have some problems, and it lacks experimental data. Maybe we can work together to finish the work. |
08caa03 to 0989304
| This pull request has merge conflicts that must be resolved before it can be merged. |
0989304 to 84b2ca1
| This pull request has merge conflicts that must be resolved before it can be merged. |
d280d21 to f194cac
| This pull request has merge conflicts that must be resolved before it can be merged. |
f194cac to 50bd787
| This pull request has merge conflicts that must be resolved before it can be merged. |
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
50bd787 to 3458abc
    | @youkaichao hi, kaichao, | 
| This pull request has merge conflicts that must be resolved before it can be merged. |
| I wish QR itself contained the logic for selecting between QR and custom allreduce, since their interfaces are nearly the same. My request is that we don't touch the CUDA code path, so that people reading the code do not need to think about quick reduce. Graph-mode allreduce is necessary for some low-latency workloads where the batch size is small. |
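
One way to read the request above is a wrapper in which quick reduce decides for itself whether to run or to defer to custom allreduce, and never activates off ROCm. The following is only a sketch under stated assumptions, not the code from this PR or from vLLM: `QuickReduceSketch`, `should_use_quick_reduce`, the `is_rocm` flag, and the size bounds are all hypothetical names and values, and plain Python lists stand in for GPU tensors.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical type: a backend takes one buffer per rank and returns the
# elementwise sum. Real backends would operate on GPU tensors instead.
Backend = Callable[[List[List[float]]], List[float]]


def elementwise_sum(per_rank: List[List[float]]) -> List[float]:
    """Reference allreduce(sum) over per-rank buffers of equal length."""
    return [sum(column) for column in zip(*per_rank)]


@dataclass
class QuickReduceSketch:
    """Hypothetical wrapper: quick reduce picks itself or the fallback."""

    fallback: Backend                  # stands in for custom allreduce
    is_rocm: bool                      # assumed platform flag
    min_elems: int = 4                 # assumed lower bound, illustration only
    max_elems: int = 1024              # assumed upper bound, illustration only
    last_choice: Optional[str] = None  # records which path ran

    def should_use_quick_reduce(self, num_elems: int) -> bool:
        # Off ROCm this is always False, so the CUDA path always takes
        # the fallback and never sees quick reduce.
        return self.is_rocm and self.min_elems <= num_elems <= self.max_elems

    def all_reduce(self, per_rank: List[List[float]]) -> List[float]:
        if self.should_use_quick_reduce(len(per_rank[0])):
            self.last_choice = "quick_reduce"
            # Placeholder for the quick reduce kernel.
            return elementwise_sum(per_rank)
        self.last_choice = "custom_allreduce"
        return self.fallback(per_rank)
```

With this shape, callers hold a single object and the selection rule lives in one place; a buffer outside the assumed size window, or any call on a non-ROCm platform, goes straight to the fallback.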
| 
 Hi, @youkaichao | 
| closing in favor of #19744 | 