Deprecate attention patching for llama #1047
Merged
Now that flash attention 2 is integrated into transformers, we don't need to monkeypatch llama.
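Roughly, the native path amounts to requesting flash attention 2 at load time instead of patching the llama attention forward. A minimal sketch, assuming a transformers release that accepts the `attn_implementation` kwarg (the model id and dtype are illustrative, not what axolotl hardcodes):

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative model id; flash attention 2 requires fp16/bf16 weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```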
Also changes our versioned deprecation warning to a UserWarning so that it shows up in the logs.
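For context, DeprecationWarning is hidden by Python's default warning filters outside `__main__`, while UserWarning is always surfaced. A hedged sketch of the idea (helper name and message text are illustrative, not the exact code in this PR):

```python
import warnings

def warn_deprecated(message: str) -> None:
    # UserWarning is not suppressed by the default warning filters,
    # unlike DeprecationWarning, so it reliably shows up in training logs.
    warnings.warn(message, UserWarning, stacklevel=2)

warn_deprecated(
    "attention patching for llama is deprecated; "
    "use transformers' built-in flash attention 2 support instead"
)
```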
See llama-patch-triton-after-6-MYjOo9 for a run that sets the attention patch type, is switched over to flash attention 2, and emits the warning.