Add `MultiQueryAttention` & `GroupedQueryAttention` #18402

awsaf49 · 2023-09-18T03:25:09Z

MultiQueryAttention (MQA) [Used in Falcon LLM] and GroupedQueryAttention (GQA) [Used in Llama 2 LLM] are alternatives to MultiHeadAttention (MHA) but they are a lot faster. Here's the speed comparison in my naive implementation,

===================================
          TensorFlow - GPU
===================================
Attention                 : 0.004 sec
Multi Head Attention      : 0.035 sec
Multi Query Attention     : 0.018 sec ( 50.17% faster than MHA )
Grouped Query Attention   : 0.030 sec ( 15.02% faster than MHA )

I think it would be nice to have these layers in keras-core.

Reference Papers:

MQA: Paper
GQA: Paper

The text was updated successfully, but these errors were encountered:

mattdangerw · 2023-09-19T02:30:27Z

Probably easiest to just write GroupedQueryAttention, and consider MultiQueryAttention a special case of it. We can expose MultiQueryAttention, as subclass of GroupedQueryAttention that sets a single init value num_key_value_heads=1 on the base class. Somewhat similar to our AdamW class with weight_decay.

This is also some discussion in #18423 for more context. I definitely think adding support here makes sense. And probably clearer to have this standalone from MultiHeadAttention instead of just throwing more parameters at that (already quite complex) class.

Thanks for filing!

awsaf49 · 2023-09-19T03:09:30Z

We can expose MultiQueryAttention, as subclass of GroupedQueryAttention that sets a single init value num_key_value_heads=1 on the base class.

I was thinking the same thing.

awsaf49 · 2023-09-19T03:10:22Z

Should I open a PR for this??

mattdangerw · 2023-09-19T21:20:53Z

Sounds good! Thank you!

google-ml-butler · 2023-10-22T13:20:23Z

Are you satisfied with the resolution of your issue?
Yes
No

SuryanarayanaY added the type:performance label Sep 19, 2023

mattdangerw assigned awsaf49 Sep 19, 2023

fchollet transferred this issue from keras-team/keras-core Sep 22, 2023

awsaf49 mentioned this issue Sep 25, 2023

Add GroupedQueryAttention layer #18488

Merged

awsaf49 closed this as completed Oct 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `MultiQueryAttention` & `GroupedQueryAttention` #18402

Add `MultiQueryAttention` & `GroupedQueryAttention` #18402

awsaf49 commented Sep 18, 2023 •

edited

Loading

mattdangerw commented Sep 19, 2023

awsaf49 commented Sep 19, 2023

awsaf49 commented Sep 19, 2023

mattdangerw commented Sep 19, 2023

google-ml-butler bot commented Oct 22, 2023

Add MultiQueryAttention & GroupedQueryAttention #18402

Add MultiQueryAttention & GroupedQueryAttention #18402

Comments

awsaf49 commented Sep 18, 2023 • edited Loading

mattdangerw commented Sep 19, 2023

awsaf49 commented Sep 19, 2023

awsaf49 commented Sep 19, 2023

mattdangerw commented Sep 19, 2023

google-ml-butler bot commented Oct 22, 2023

Add `MultiQueryAttention` & `GroupedQueryAttention` #18402

Add `MultiQueryAttention` & `GroupedQueryAttention` #18402

awsaf49 commented Sep 18, 2023 •

edited

Loading