
Gemma capping #34282

Merged 58 commits into main on Nov 19, 2024

Conversation

@ArthurZucker (Collaborator) commented on Oct 21, 2024

What does this PR do?

Adds attention logit soft-capping for Gemma2; fixes #32877.

@ArthurZucker marked this pull request as ready for review on October 21, 2024 at 15:41
@Cyrilvallez (Member) left a comment

There are a lot of edge cases in imports that are very hard to handle with the proposed approach. I think a simpler and more general approach is to do it the other way around:

  • dump all imports from modular_xxx.py as-is
  • dump all imports from the dependency files as-is (this is currently the case)
  • then, in the PostModularConverterCleaner, clean up the imports (we may even clean only the protected imports and let ruff remove the other unused, non-protected ones)

This approach is much simpler and more versatile because in the Cleaner we have access to the final source code, which is not the case when visiting the modular_xxx.py file (there we only see the modular file plus its dependencies, and it is hard to check imports against only the part of the dependency files that we copy into the final file). It would thus ensure that all needed imports are present (i.e. we would never hit a weird edge case when trying to match imports as we do currently), and we could correctly remove imports that were wrongly pulled in from the dependency files (e.g. the duplicate import in Glm caused by the Phi3 dependency).
This would also greatly reduce code complexity, in my opinion.
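A rough sketch of the "dump everything, clean afterwards" flow described above. This is a hypothetical illustration, not the actual PostModularConverterCleaner in utils/modular_model_converter.py; it assumes ruff is installed, and the helper names are made up for the example (F401 is ruff's unused-import rule).

```python
# Hypothetical sketch only: helper names and the ruff invocation are assumptions,
# not the real modular converter code.
import ast
import subprocess
from pathlib import Path


def collect_imports(source: str) -> list[str]:
    """Return every top-level import statement verbatim, in source order."""
    tree = ast.parse(source)
    lines = source.splitlines()
    imports = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            imports.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return imports


def dump_and_clean(modular_path: Path, dependency_paths: list[Path], body: str, out_path: Path) -> None:
    """Dump all imports as-is from the modular and dependency files, then prune what is unused."""
    header = []
    for path in [modular_path, *dependency_paths]:
        header.extend(collect_imports(path.read_text()))
    # dict.fromkeys deduplicates while keeping order (e.g. the duplicate import in Glm from the Phi3 dependency).
    out_path.write_text("\n".join(dict.fromkeys(header)) + "\n\n" + body)
    # Let ruff drop whatever ended up unused (F401). Protected imports living inside
    # `if is_torch_available():` blocks are not top-level and would need a dedicated pass.
    subprocess.run(["ruff", "check", "--fix", "--select", "F401", str(out_path)], check=False)
```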

[4 resolved review threads on utils/modular_model_converter.py]

- attn_output = torch.nn.functional.scaled_dot_product_attention(
+ attn_output = flex_attention(
A contributor left a comment

Isn't it a bit misleading to use flex attention when we have attn_implementation="sdpa"? My concerns would be:

  • People who previously used SDPA (forced or not) will suddenly have different torch requirements.
  • SDPA != FlexAttention IMO; it's a different API, a different name, and potentially slightly different behaviour.
  • Are the slow tests still passing? We should ensure it still behaves roughly the same as eager.

WDYT about adding a separate attn implementation option specifically for flex attention? Not sure if this goes beyond the goal here, but control over the specific implementation is always appreciated.

Overall excited to see this, great work!
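For context on what FlexAttention enables here that plain SDPA cannot: a minimal, hedged sketch of tanh logit soft-capping expressed as a score_mod. It assumes torch >= 2.5; the shapes and the softcap value are illustrative, and this is not the exact code merged in this PR.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

softcap = 50.0  # illustrative stand-in for Gemma2's attn_logit_softcapping


def softcap_mod(score, batch, head, q_idx, kv_idx):
    # Squash each attention logit into (-softcap, softcap) before the softmax;
    # SDPA exposes no hook for this, which is why flex attention is used instead.
    return softcap * torch.tanh(score / softcap)


q = torch.randn(1, 8, 128, 64)  # (batch, num_heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Runs uncompiled for quick testing; wrap with torch.compile(flex_attention) for speed.
attn_output = flex_attention(q, k, v, score_mod=softcap_mod)
```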

@ArthurZucker (Collaborator, Author) replied

The SDPA version of Gemma never really "worked", TBH!
I'll probably add a new class for flex attention; this was simpler for testing.

@ArthurZucker (Collaborator, Author) commented

Okay @Cyrilvallez, good point regarding the cleaning! It makes more sense indeed; will update to fix 😉

@Cyrilvallez (Member) left a comment

Very nice approach! Much simpler IMO 🤗 I just added some nits for clarity.

[8 resolved review threads on utils/modular_model_converter.py]

@Cyrilvallez (Member) left a comment

LGTM, I actually love it. I think it's much better to use different attention functions instead of different attention classes: it's clearer, there is less duplicated code, and we can easily switch between implementations even after the model has been instantiated.

[3 resolved review threads on src/transformers/models/gemma2/modular_gemma2.py]
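A minimal illustration of the "attention functions instead of attention classes" pattern praised above; the registry name and the simplified implementations are assumptions made for the sketch, not transformers' actual code.

```python
from typing import Callable, Dict

import torch


def eager_attention(q, k, v, scale, softcap=None):
    # Plain matmul attention; the logits can be soft-capped before the softmax.
    scores = (q @ k.transpose(-1, -2)) * scale
    if softcap is not None:
        scores = softcap * torch.tanh(scores / softcap)
    return torch.softmax(scores, dim=-1) @ v


def sdpa_attention(q, k, v, scale, softcap=None):
    # SDPA has no hook for logit soft-capping, so `softcap` is ignored here.
    return torch.nn.functional.scaled_dot_product_attention(q, k, v, scale=scale)


# Hypothetical registry: the implementation can be swapped by key even after
# the model has been instantiated, as the reviewer notes.
ATTENTION_FUNCTIONS: Dict[str, Callable] = {
    "eager": eager_attention,
    "sdpa": sdpa_attention,
}

q = k = v = torch.randn(1, 8, 16, 64)
out = ATTENTION_FUNCTIONS["eager"](q, k, v, scale=64**-0.5, softcap=50.0)
```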
@Cyrilvallez mentioned this pull request on Nov 19, 2024
@ArthurZucker merged commit 4bff54f into main on Nov 19, 2024 (27 checks passed)
@ArthurZucker deleted the gemma-capping branch on Nov 19, 2024 at 12:52
@ArthurZucker mentioned this pull request on Nov 19, 2024
@vasqu mentioned this pull request on Nov 23, 2024
BernardZach pushed a commit to BernardZach/transformers referencing this pull request on Dec 5, 2024:

* softcapping
* soft cap before the mask
* style
* ...
* super nit
* update
* fixes
* update
* small issue with modular
* fix modular imports
* update
* fixup
* simplify a hell lot
* simplify cleaning imports
* finish fixing
* update our design
* nits
* use a deprecation cycle
* updates
* Fix modular (recursive deps need to always be computed after merges!)
* push
* fix
* update
* fix modular order
* make fix-copies
* updates
* update
* ?
* don't compile for now
* ?
* fix some stuff
* donc!
* fix copies
* update
* fixup
* ?
* fix two tests
* fix?
* for now, don't use head info
* eager when output attention and sdpa or flash as it's the simplest behaviour (for our tests as well :))
* fix-copies
* revert sdpa check
* Apply suggestions from code review

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* rebase, fix-copies and push
* add a slow integration test
* update the test
* fix left padding issue
* fix test
* remove duplicate scaling
* quality
* add a small test and make sure it works
* 2b

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

Successfully merging this pull request may close these issues.

Add logit scaling sdpa using FlexAttention for Gemma2