
adds softmax_scale to flash attention #209

Merged: 1 commit merged into mosaicml:main on Mar 2, 2023

Conversation

codestar12 (Contributor):

Extends the softmax-scaling option to flash attention, in addition to triton flash attention.
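
For context, a minimal sketch of what the change enables, assuming the flash-attn v1 import path and an optional softmax_scale field on the model config; cfg here is illustrative, not the exact PR code:

    # Hedged sketch, not the exact PR code.
    from flash_attn.flash_attention import FlashAttention  # flash-attn v1 module path

    # None lets the library fall back to its default 1/sqrt(head_dim) scaling.
    softmax_scale = getattr(cfg, 'softmax_scale', None)
    inner_attn = FlashAttention(softmax_scale=softmax_scale,
                                attention_dropout=cfg.attn_pdrop)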


@vchiley (Contributor) left a comment:


LGTM.

Hopefully you can resolve the GPU count issue discussed offline.


@bcui19 (Contributor) left a comment:


LGTM! Thanks for adding this!

@codestar12 merged commit 0a800ab into mosaicml:main on Mar 2, 2023.
      self.d_model = cfg.d_model
      self.n_heads = cfg.n_heads

-     if self.attn_qk_ln or self.clip_qkv:
+     if self.attn_qk_ln or self.clip_qkv or self.softmax_scale:
          self.W_qkv = nn.Linear(self.d_model,
                                 3 * self.d_model,
                                 bias=True,
                                 device=device)
          self.inner_attn = FlashAttention(attention_dropout=cfg.attn_pdrop,
A contributor commented on this diff:

I see we can now scale the attention by any custom value. What is the best value, and how should we keep track of it?

A contributor replied:

1/sqrt(d/n_heads) is standard.
1/(d/n_heads) is the recommended muP modification.
Since it's part of the model config, it'll get dumped into wandb if we ever need to check what we used.
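
A small sketch of those two conventions, assuming head_dim = d_model / n_heads (the function name and signature are illustrative, not from the PR):

    import math

    def compute_softmax_scale(d_model: int, n_heads: int, mup: bool = False) -> float:
        # Scale applied to the q @ k^T logits before the softmax.
        head_dim = d_model // n_heads
        if mup:
            return 1.0 / head_dim           # muP recommendation: 1/(d/n_heads)
        return 1.0 / math.sqrt(head_dim)    # standard scaling: 1/sqrt(d/n_heads)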
