🚨All attention refactor🚨 #35235

ArthurZucker · 2024-12-12T13:39:35Z

What does this PR do?

Todo in this PR:

ArthurZucker · 2024-12-13T18:26:41Z

src/transformers/modeling_utils.py

+)
+
+
+class GradientCheckpointLayer(torch.nn.Module):


This should help with kwargs as well

poedator · 2025-01-14T17:16:05Z

My friends use a GPT2Model in production and want to compile it with StaticCache. With the maintainers' blessing, I would like to open a PR adding DynamicCache / StaticCache support to GPT2Model.
I am quite familiar with the Cache class: I have already written some of the code and have DynamicCache working.

Are there any hidden obstacles to implementing Cache for GPT2? Which tests should I run or add?
@ArthurZucker

Rocketknight1 · 2025-01-15T13:20:05Z

cc @gante to that question!

gante · 2025-01-15T13:52:58Z

I've chatted with @poedator offline -- I couldn't think of any obstacle in particular, and suggested a) leaving a deprecation warning for the old cache format and b) using `RUN_SLOW=1 py.test tests/models/gpt2/test_modeling_gpt2.py` as a correctness check (gpt2 is fairly well tested, especially wrt text generation).
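For illustration only, here is a minimal sketch of what point a) could look like: accept the old tuple-of-tuples cache format, emit a deprecation warning, and convert it to a cache object. `SimpleCache` and `normalize_past_key_values` are hypothetical stand-ins invented for this example, not the transformers `Cache` classes or the code of any actual PR, and plain Python objects stand in for tensors.

```python
import warnings
from typing import List, Optional, Sequence, Tuple, Union

# Stand-in for one layer's (key, value) pair; real code would hold tensors.
KV = Tuple[object, object]


class SimpleCache:
    """Hypothetical minimal cache object (NOT the transformers Cache class)."""

    def __init__(self) -> None:
        self.layers: List[KV] = []

    @classmethod
    def from_legacy(cls, legacy: Sequence[KV]) -> "SimpleCache":
        # Copy each per-layer (key, value) pair into the cache object.
        cache = cls()
        cache.layers = [(key, value) for key, value in legacy]
        return cache

    def to_legacy(self) -> Tuple[KV, ...]:
        # Convert back so callers that passed tuples can get tuples returned.
        return tuple(self.layers)


def normalize_past_key_values(
    past_key_values: Optional[Union[SimpleCache, Sequence[KV]]],
) -> Tuple[SimpleCache, bool]:
    """Return (cache, was_legacy), warning when the old tuple format is used."""
    if past_key_values is None:
        return SimpleCache(), False
    if isinstance(past_key_values, SimpleCache):
        return past_key_values, False
    warnings.warn(
        "Passing past_key_values as a tuple of tuples is deprecated; "
        "pass a cache object instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return SimpleCache.from_legacy(past_key_values), True
```

The returned `was_legacy` flag lets a forward pass hand the cache back in the same format the caller supplied, which keeps old call sites working during the deprecation period.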

poedator · 2025-01-18T02:07:43Z

It looks like `test_flash_attn_2_from_config` is broken: it expects the attention layer to have "FlashAttention" in its class name,
`if "FlashAttention" in module.__class__.__name__:`
but after this refactoring, the attention classes are named differently.

ref

transformers/tests/test_modeling_common.py

Line 4641 in 5fa3534

if "FlashAttention" in module.__class__.__name__:

please fix or suspend the test.
@ArthurZucker
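A small sketch of why the name-based check stops matching once a single attention class serves every backend, and one possible alternative that inspects the config instead. The classes below are hypothetical stand-ins; the `_attn_implementation` field and the `"flash_attention_2"` value are assumed to mirror the transformers config, and this is not necessarily how the test was eventually fixed.

```python
from dataclasses import dataclass


@dataclass
class DummyConfig:
    # Assumption: mirrors the private `_attn_implementation` config attribute.
    _attn_implementation: str = "eager"


class OldFlashAttention2:
    """Hypothetical pre-refactor style: one class per attention backend."""

    def __init__(self, config: DummyConfig) -> None:
        self.config = config


class UnifiedAttention:
    """Hypothetical post-refactor style: one class, backend chosen via config."""

    def __init__(self, config: DummyConfig) -> None:
        self.config = config


def uses_flash_attention_by_name(module: object) -> bool:
    # The check quoted above: relies on the backend appearing in the class name.
    return "FlashAttention" in module.__class__.__name__


def uses_flash_attention_by_config(module: object) -> bool:
    # Alternative: ask the module's config which implementation is selected.
    config = getattr(module, "config", None)
    return getattr(config, "_attn_implementation", None) == "flash_attention_2"


old = OldFlashAttention2(DummyConfig("flash_attention_2"))
new = UnifiedAttention(DummyConfig("flash_attention_2"))
print(uses_flash_attention_by_name(old), uses_flash_attention_by_name(new))
print(uses_flash_attention_by_config(old), uses_flash_attention_by_config(new))
```

With a unified class the name check returns `False` even when flash attention is selected, so the loop in the test never finds a matching module.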

ArthurZucker · 2025-01-21T14:07:53Z

indeed gimme a min!

@Isotr0py

# Adds support for `transformers` as a backend

Following huggingface/transformers#35235, a bunch of models should already be supported, we are ramping up support for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!

This includes:
- `trust_remote_code=True` support: any model on the hub, if it implements attention the correct way can be natively supported!!
- tensor parallel support

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* update * update * update * dev-ci * more changes * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Breaking change in transformers is huggingface/transformers#35235. Need to make changes to unpin nv-a6000 workflow. Signed-off-by: siqi <siqi@tecorigin.com>


ArthurZucker force-pushed the all-attention-refactor branch from 0dc9253 to d1aa9ce Compare December 12, 2024 13:49


ArthurZucker mentioned this pull request Dec 16, 2024

Add ModernBERT to Transformers #35158

Merged

ArthurZucker and others added 17 commits December 16, 2024 10:14

refactor LlamaAttention

79cb53c

minimal changes

4bb485b

fix llama

f370907

update

d3ef539

modular gemmas

45eac58

modular nits

e52af49

modular updates

5ed37ae

nits

38cafc1

simplify

a862eac

gpt2

5639b81

more modualr and fixes

452d8ed

granite

81a0b66

modular modular modular

bc72c3f

nits

48caa89

update

df68dd0

qwen2 + starcoder2

0325dc4

mostly gemma2

ecd814b

Cyrilvallez force-pushed the all-attention-refactor branch from 8b56823 to ecd814b Compare December 16, 2024 11:28

Cyrilvallez and others added 9 commits December 16, 2024 12:39

Update image_processing_auto.py

f5fc638

fix

5e56d9c

Update modular_starcoder2.py

598b7bb

fix

0f565fb

remove all copied from attentions

c9ac84d

remove gcv

d189fe7

make fix-copies

9c83d96

oups

138368e

oups2.0

7225a4f

poedator mentioned this pull request Jan 18, 2025

GPT2Model StaticCache support #35761

Open

imangohari1 mentioned this pull request Jan 31, 2025

fea(): Applied changes in HF #35235 huggingface/optimum-habana#1738

Merged

3 tasks

Miking98 mentioned this pull request Feb 2, 2025

Model loading error with Llama som-shahlab/long_context_clues#14

Closed

ydshieh mentioned this pull request Feb 4, 2025

Update tests regarding attention types after #35235 #36024

Merged

ydshieh added a commit that referenced this pull request Feb 4, 2025

Update tests regarding attention types after #35235 (#36024)

fe52679

* update * update * update * dev-ci * more changes * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Rocketknight1 mentioned this pull request Feb 4, 2025

Transformers are untraceable with FX after 4.38 #36022

Closed

4 tasks

wizeng23 mentioned this pull request Feb 4, 2025

Update transformers version to 4.48 oumi-ai/oumi#1372

Merged

4 tasks

This was referenced Feb 5, 2025

[Bug] Llama-3.2-11B-Vision-Instruct (mllama) FSDP fails if grad checkpointing is enabled oumi-ai/oumi#1376

Closed

Llama-3.2-11B-Vision-Instruct (mllama) FSDP fails if grad checkpointing is enabled #36040

Open

ydshieh mentioned this pull request Feb 5, 2025

Remove type hint Unpack[FlashAttentionKwargs] #36049

Open

traincheck-team pushed a commit to traincheck-team/DeepSpeed that referenced this pull request Feb 9, 2025

Pin nv-a6000 workflow (deepspeedai#6938)

6f2bbf7

Breaking change in transformers is huggingface/transformers#35235. Need to make changes to unpin nv-a6000 workflow.

This was referenced Feb 12, 2025

4.48.1 breaks sliding window in eager attention for qwen2 #35924

Closed

flash-attention-3 #33522

Draft

hlky mentioned this pull request Feb 14, 2025

Flash Attention v3 #36190

Draft

kevinstephano mentioned this pull request Feb 20, 2025

Support packing multiple sequences with Flash Attention without cross-contamination Lightning-AI/lightning-thunder#1758

Open
