Merged
Changes from all commits
124 commits
7433c44
just update 2 files
ArthurZucker Jun 30, 2025
37b4ef0
update other models as well just making fix-copies
ArthurZucker Jun 30, 2025
7f113b4
also add the changes needed to modeling utils
ArthurZucker Jun 30, 2025
abf9d39
put this on the pretrained model instead
ArthurZucker Jun 30, 2025
eb6747b
nits and fixes
ArthurZucker Jun 30, 2025
0f1d7e0
update generic, fix to use config value
ArthurZucker Jun 30, 2025
e437edd
update other modelings
ArthurZucker Jun 30, 2025
96aabd7
use transformers kwargs instead
ArthurZucker Jun 30, 2025
63df15b
update
ArthurZucker Jun 30, 2025
98f402c
update
ArthurZucker Jun 30, 2025
a7e0ce2
update other models
ArthurZucker Jun 30, 2025
c9bb39e
update
ArthurZucker Jun 30, 2025
cb5da53
updates
ArthurZucker Jun 30, 2025
0dc0826
update
ArthurZucker Jun 30, 2025
fca73ad
update
ArthurZucker Jun 30, 2025
98739ba
update
ArthurZucker Jun 30, 2025
124cd82
fix
ArthurZucker Jun 30, 2025
4a14287
finally
ArthurZucker Jun 30, 2025
ea87eb7
very small nits
ArthurZucker Jun 30, 2025
8c66f4d
this fixes more tests
ArthurZucker Jun 30, 2025
3caf7d7
fix other models as well!
ArthurZucker Jun 30, 2025
113219b
update modularqwen2
ArthurZucker Jun 30, 2025
e7705c9
update models based on qwen2
ArthurZucker Jun 30, 2025
a74974d
update
ArthurZucker Jun 30, 2025
3fb6b71
update
ArthurZucker Jun 30, 2025
7266aaf
remove the **flash stuff in favor of noraml kwargs
ArthurZucker Jun 30, 2025
c7d195f
update
ArthurZucker Jun 30, 2025
e63ef64
propagate gemma?
ArthurZucker Jun 30, 2025
1303470
remove output attentions
ArthurZucker Jun 30, 2025
063e510
propagate
ArthurZucker Jun 30, 2025
8c96926
Merge branch 'main' of github.com:huggingface/transformers into clean…
ArthurZucker Jul 1, 2025
01d4da8
support cross attention edge case
ArthurZucker Jul 1, 2025
780141c
same
ArthurZucker Jul 1, 2025
3c0c56b
test this
ArthurZucker Jul 1, 2025
7a0512a
fixes
ArthurZucker Jul 1, 2025
a13a98c
more fix
ArthurZucker Jul 1, 2025
15a8ff4
update
ArthurZucker Jul 1, 2025
2242373
update
ArthurZucker Jul 1, 2025
2748b99
update
ArthurZucker Jul 1, 2025
da50ccc
fix conflicts
ArthurZucker Jul 1, 2025
209d502
update
ArthurZucker Jul 1, 2025
10fb88a
fix emu3
ArthurZucker Jul 1, 2025
00afce9
fix emu3
ArthurZucker Jul 1, 2025
3ac6c52
move the fix a bit
ArthurZucker Jul 1, 2025
0b119ff
quel enfer
ArthurZucker Jul 1, 2025
f7a1f0d
some fixes, loss_kwargs should never had been
ArthurZucker Jul 1, 2025
6a132a0
finish fixing gemma3n
ArthurZucker Jul 1, 2025
9fa5f26
fix small lm3
ArthurZucker Jul 1, 2025
aaae861
fix another one
ArthurZucker Jul 1, 2025
5e5ae84
fix csm now
ArthurZucker Jul 1, 2025
075bd0c
fux csm and mistral
ArthurZucker Jul 1, 2025
d04c2b1
fix mistral now
ArthurZucker Jul 1, 2025
5065b9a
small fixes
ArthurZucker Jul 1, 2025
6a5f410
fix janusss
ArthurZucker Jul 1, 2025
4834aec
only for some models
ArthurZucker Jul 1, 2025
d8ee27e
fixup
ArthurZucker Jul 1, 2025
e297344
phix phi3
ArthurZucker Jul 1, 2025
0c9f6de
more fixes?
ArthurZucker Jul 1, 2025
501aead
dose this fix it?
ArthurZucker Jul 1, 2025
253307a
update
ArthurZucker Jul 1, 2025
a267d8d
holy shit it was just graph breaks
ArthurZucker Jul 2, 2025
17cf542
protect torch
ArthurZucker Jul 2, 2025
c4d43c5
updates
ArthurZucker Jul 3, 2025
4fc83fa
fix samhq?
ArthurZucker Jul 3, 2025
499ae87
fix moonshine
ArthurZucker Jul 3, 2025
b3c8641
more moonshine fixes, 3 failures left!
ArthurZucker Jul 3, 2025
b81df9b
nits
ArthurZucker Jul 3, 2025
cfe62b6
generic needs to support more
ArthurZucker Jul 3, 2025
6eb5e53
more fixes to moonshine!
ArthurZucker Jul 3, 2025
a9690f4
fix cross attention outputs!
ArthurZucker Jul 3, 2025
d462a8e
fix csm!
ArthurZucker Jul 3, 2025
0f3c368
nits
ArthurZucker Jul 3, 2025
3cba8ac
fix stupid kosmos2
ArthurZucker Jul 3, 2025
5af5bcc
current updates
ArthurZucker Jul 3, 2025
9968c85
fixes
ArthurZucker Jul 3, 2025
fbfaf04
use output recorder?
ArthurZucker Jul 3, 2025
1f559c6
nicer!
ArthurZucker Jul 3, 2025
cd63172
a little bit of magic
ArthurZucker Jul 3, 2025
cf2e98c
update
ArthurZucker Jul 3, 2025
c278e1c
fix protect
ArthurZucker Jul 3, 2025
e3c82cb
fix
ArthurZucker Jul 3, 2025
c5592be
small fixes
ArthurZucker Jul 3, 2025
f6190cb
protect import
ArthurZucker Jul 3, 2025
d0be331
fix a bunch of more models
ArthurZucker Jul 3, 2025
22f0eae
fix fixups
ArthurZucker Jul 3, 2025
422122d
fix some of the last ones
ArthurZucker Jul 3, 2025
feba9a0
nit
ArthurZucker Jul 3, 2025
9a3708a
partly fix phi
ArthurZucker Jul 3, 2025
7a0f14a
update
ArthurZucker Jul 3, 2025
c4f314b
fix import path
ArthurZucker Jul 3, 2025
c6c5efb
Merge branch 'main' of github.com:huggingface/transformers into clean…
ArthurZucker Jul 3, 2025
5f3722c
make something that is fullgraph compatible just to be sure
ArthurZucker Jul 4, 2025
7781368
typing was wrong on llama so the rest was wrong as well
ArthurZucker Jul 4, 2025
c949308
fucking ugly but at least it is still exportable
ArthurZucker Jul 4, 2025
eaa7392
syle
ArthurZucker Jul 4, 2025
4b6a535
supposed to fix moonshine, it still breaks
ArthurZucker Jul 4, 2025
9976ed8
fix some default
ArthurZucker Jul 4, 2025
6d72398
fix the last bits of sam
ArthurZucker Jul 4, 2025
ddea683
update samhq
ArthurZucker Jul 4, 2025
f021967
Merge branch 'main' of github.com:huggingface/transformers into clean…
ArthurZucker Jul 4, 2025
2e296b5
more fixes to am hq
ArthurZucker Jul 4, 2025
8aaa10e
nit
ArthurZucker Jul 4, 2025
b8d6666
fix all output+hidden states and output_attentions!
ArthurZucker Jul 4, 2025
cb16ef8
fix?
ArthurZucker Jul 4, 2025
faf2a42
fix diffllama
ArthurZucker Jul 4, 2025
6c83dcc
updates to fix initialization on the sam pips
ArthurZucker Jul 4, 2025
bd56729
ups there was a bug
ArthurZucker Jul 4, 2025
4213b18
fix the last sam hq test
ArthurZucker Jul 4, 2025
df76604
fix gotocr
ArthurZucker Jul 4, 2025
a50382b
fix gotocr2!
ArthurZucker Jul 4, 2025
73d7450
fixes
ArthurZucker Jul 4, 2025
59ba6fa
skip stupid tests
ArthurZucker Jul 4, 2025
e9a3e47
there was one left :)
ArthurZucker Jul 4, 2025
141a01f
fixup
ArthurZucker Jul 4, 2025
cb7a881
fix fix copies issues with this test file
ArthurZucker Jul 4, 2025
90e36aa
fix copies for sam_hq
ArthurZucker Jul 4, 2025
459062f
rm some comments
ArthurZucker Jul 4, 2025
2614116
skip 2 more failing tests
ArthurZucker Jul 4, 2025
f5695c0
fix
ArthurZucker Jul 4, 2025
da4875a
fix everything
ArthurZucker Jul 4, 2025
44da848
Apply suggestions from code review
ArthurZucker Jul 4, 2025
c209f7e
add more doc!
ArthurZucker Jul 4, 2025
4ae2049
fix public init
ArthurZucker Jul 4, 2025
7548ec2
fix modular qwen3
ArthurZucker Jul 4, 2025
446 changes: 0 additions & 446 deletions examples/modular-transformers/modeling_dummy.py

This file was deleted.

446 changes: 0 additions & 446 deletions examples/modular-transformers/modeling_multimodal1.py

This file was deleted.

15 changes: 0 additions & 15 deletions examples/modular-transformers/modular_dummy.py

This file was deleted.

6 changes: 0 additions & 6 deletions examples/modular-transformers/modular_multimodal1.py

This file was deleted.

49 changes: 47 additions & 2 deletions src/transformers/modeling_utils.py
Expand Up @@ -123,7 +123,7 @@
logging,
strtobool,
)
from .utils.generic import GeneralInterface
from .utils.generic import _CAN_RECORD_REGISTRY, GeneralInterface, OutputRecorder
from .utils.hub import create_and_tag_model_card, get_checkpoint_shard_files
from .utils.import_utils import (
ENV_VARS_TRUE_VALUES,
Expand Down Expand Up @@ -1925,7 +1925,7 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, PushToHubMixin, PeftAdapterMi
- **is_parallelizable** (`bool`) -- A flag indicating whether this model supports model parallelization.
- **main_input_name** (`str`) -- The name of the principal input to the model (often `input_ids` for NLP
models, `pixel_values` for vision models and `input_values` for speech models).
"""
- **can_record_outputs** (`dict`) -- Maps output names (e.g., "attentions", "hidden_states") to the
module class or `OutputRecorder` used to record them."""

config_class = None
base_model_prefix = ""
Expand Down Expand Up @@ -2006,6 +2006,50 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, PushToHubMixin, PeftAdapterMi
# In practice, it means that they support attention interface functions, fully pass the kwargs
# through all modules up to the Attention layer, can slice logits with Tensor, and have a default TP plan
_supports_attention_backend = False
_can_record_outputs = None

@property
@torch._dynamo.allow_in_graph
def can_record_outputs(self) -> dict[str, OutputRecorder]:
"""
Maps output names (e.g., "attentions", "hidden_states")
to either:
- A module class (e.g., `LlamaDecoderLayer`), using default index conventions:
* index=0 for "hidden_states"
* index=1 for "attentions"
- Or an `OutputRecorder(...)` with `target_class`, optional `index`, and `layer_name`.

Examples:
These two are equivalent:

```python
_can_record_outputs = {
"attentions": LlamaAttention,
"hidden_states": LlamaDecoderLayer
}

_can_record_outputs = {
"attentions": OutputRecorder(LlamaAttention, index=1),
"hidden_states": OutputRecorder(LlamaDecoderLayer, index=0)
}
```

You can also record outputs of the same class under different names by specifying a layer name;
before collecting an output, we check that it comes from that layer.

For example, if cross attention and self attention both come from `LlamaAttention`, but the self
attention lives in a submodule named `self_attn`, you can do this:

```python
class LlamaModel(PreTrainedModel):
_can_record_outputs = {
"attentions": OutputRecorder(LlamaAttention, index=1, layer-name="self_attn"),
"cross_attentions": OutputRecorder(LlamaAttention, index=1, layer_name="cross_attn")
}

```
"""
return self._can_record_outputs or {}

@property
def dummy_inputs(self) -> dict[str, torch.Tensor]:
Expand Down Expand Up @@ -2056,6 +2100,7 @@ def __init__(self, config: PretrainedConfig, *inputs, **kwargs):
self._keep_in_fp32_modules_strict = copy.copy(self.__class__._keep_in_fp32_modules_strict)

self._no_split_modules = self._no_split_modules or []
_CAN_RECORD_REGISTRY[self] = self._can_record_outputs # added for executorch support only

def post_init(self):
"""
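The `can_record_outputs` convention introduced in this diff can be illustrated with a self-contained sketch. This is a hypothetical hook-based recorder with toy module names (`ToyAttention`, `ToyModel`, `run_and_record`), not the actual transformers implementation; it only demonstrates the mapping semantics — `target_class` selects which modules to record from, `index` picks the element of the module's output tuple, and `layer_name` filters same-class modules by submodule path:

```python
# Illustrative sketch of the _can_record_outputs convention, assuming a
# hook-based recorder (hypothetical; not the actual transformers internals).
from dataclasses import dataclass
from typing import Optional

import torch
import torch.nn as nn


@dataclass
class OutputRecorder:
    target_class: type                # module class whose outputs we want
    index: int = 0                    # position in the module's output tuple
    layer_name: Optional[str] = None  # restrict recording to matching submodules


class ToyAttention(nn.Module):
    def forward(self, x):
        weights = torch.softmax(x @ x.transpose(-1, -2), dim=-1)
        return weights @ x, weights  # (hidden_states, attentions)


class ToyModel(nn.Module):
    # Only attentions produced by the "self_attn" submodule are recorded.
    _can_record_outputs = {
        "attentions": OutputRecorder(ToyAttention, index=1, layer_name="self_attn"),
    }

    def __init__(self):
        super().__init__()
        self.self_attn = ToyAttention()
        self.cross_attn = ToyAttention()  # same class, filtered out by layer_name

    def forward(self, x):
        h, _ = self.self_attn(x)
        h, _ = self.cross_attn(h)
        return h


def run_and_record(model, x):
    """Run a forward pass, collecting outputs declared in _can_record_outputs."""
    collected = {name: [] for name in model._can_record_outputs}
    handles = []
    for out_name, rec in model._can_record_outputs.items():
        for mod_path, module in model.named_modules():
            if isinstance(module, rec.target_class) and (
                rec.layer_name is None or rec.layer_name in mod_path
            ):
                # Default args bind out_name/index at definition time.
                def hook(mod, args, output, name=out_name, idx=rec.index):
                    collected[name].append(output[idx])

                handles.append(module.register_forward_hook(hook))
    out = model(x)
    for h in handles:
        h.remove()
    return out, collected
```

Calling `run_and_record(ToyModel(), torch.randn(1, 4, 8))` collects exactly one attention map (from `self_attn`), even though `cross_attn` is an instance of the same class.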