Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support #16900
base: master
GPUs listed: AMD Radeon Pro VII, AMD Radeon RX 6800 XT, Intel A770, RTX 3090
I only did a quick read through. I'll do some perf testing soon.
As usual, I appear to have caused an llvmpipe issue. I'll look into it.

Some initial perf results: I reran some of the models with the biggest deltas. Most seem to be noise, except the improvement for gpt-oss MXFP4 is real:

Add k-quant mul_mat_vec support, and enable MUL_MAT_ID integer dot vector path.
Tuning this is quite difficult. I've included an attempt, but I'm not done. I'll add performance numbers later.
Q3_K and Q6_K currently don't work well at all; I'm still trying to figure out why.
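
For readers unfamiliar with the integer dot approach, here is a minimal Python sketch of the general idea, not the shader code in this PR. It assumes a simplified symmetric int8 block format with one float scale per block, which is much simpler than the real k-quant layouts. The block size, all function names, and the `mul_mat_vec_id` helper (selecting one of several matrices by index, in the spirit of MUL_MAT_ID) are hypothetical and chosen only for illustration.

```python
BLOCK = 32  # hypothetical block size for this sketch, not a k-quant super-block


def quantize_block(values):
    """Symmetric int8 quantization of one block: returns (scale, ints)."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def quantize_vector(vec):
    """Split a vector into BLOCK-sized chunks and quantize each one."""
    assert len(vec) % BLOCK == 0
    return [quantize_block(vec[i:i + BLOCK]) for i in range(0, len(vec), BLOCK)]


def int_dot(qa, qb):
    """Dot product of two block-quantized vectors.

    The inner sum uses only integers; the float scales are applied
    once per block rather than once per element.
    """
    total = 0.0
    for (sa, a), (sb, b) in zip(qa, qb):
        acc = sum(x * y for x, y in zip(a, b))  # integer accumulation
        total += sa * sb * acc
    return total


def mul_mat_vec(q_rows, q_x):
    """Matrix-vector product: one integer-dot result per quantized row."""
    return [int_dot(row, q_x) for row in q_rows]


def mul_mat_vec_id(q_matrices, matrix_id, q_x):
    """Pick one quantized matrix by index, then do the mat-vec product."""
    return mul_mat_vec(q_matrices[matrix_id], q_x)
```

The point of the design is that the per-element work is integer multiply-accumulate, which is what hardware integer dot instructions speed up, while the floating-point scaling happens only once per block.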