Hard error when ignoring tensors. (#27484) #29906
Conversation
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint.
  - Find all names we should normally drop (those are in the transformers config).
  - Find all disjoint tensors (for those we can safely trigger a copy to get rid of the sharing before saving).
  - Clone those disjoint tensors, getting rid of the issue.
  - Find all identical names (those should be declared in the config, but we try to find them all anyway).
  - For all identical names:
    - If they are in the config, just ignore them; everything is fine.
    - If they are not, warn about them.
  - For all remaining tensors which are shared yet neither identical nor disjoint, raise a hard error.
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
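To make the disjoint/identical distinction concrete, here is a minimal sketch (not the actual transformers implementation; `classify_shared` and the example names are hypothetical, and it assumes a recent PyTorch with `untyped_storage()`) of how shared tensors can be grouped by storage and classified before saving:

```python
import torch

def classify_shared(tensors: dict):
    # Group tensors by the underlying storage they live on.
    groups = {}
    for name, t in tensors.items():
        key = (t.device, t.untyped_storage().data_ptr())
        groups.setdefault(key, []).append(name)

    for names in groups.values():
        if len(names) < 2:
            continue  # not shared, nothing to do
        first = tensors[names[0]]
        if all(
            tensors[n].data_ptr() == first.data_ptr() and tensors[n].shape == first.shape
            for n in names
        ):
            # Identical views: classic tied weights, expected in the config.
            print("identical (tied):", names)
        else:
            # Same storage but different views; disjoint slices can be
            # cloned apart safely, overlapping ones are the hard-error case.
            print("same storage, different views:", names)

emb = torch.nn.Embedding(10, 4)
lm_head = torch.nn.Linear(4, 10, bias=False)
lm_head.weight = emb.weight  # tie the weights
classify_shared({"embed.weight": emb.weight, "lm_head.weight": lm_head.weight})
```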
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
@@ -1128,7 +1127,7 @@ def forward(
     """Bert Model with a `language modeling` head on top for CLM fine-tuning.""", BERT_START_DOCSTRING
 )
 class BertLMHeadModel(BertPreTrainedModel):
-    _tied_weights_keys = ["predictions.decoder.bias", "cls.predictions.decoder.weight"]
+    _tied_weights_keys = ["cls.predictions.decoder.bias", "cls.predictions.decoder.weight"]
```
Seems like this was a bug: `predictions` does not exist on this model, only `cls.predictions`.
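As a side note, a quick sanity check along these lines (hypothetical helper, not the PR's actual test) would have caught the typo, since every tied-weights key should resolve to an entry in the model's `state_dict`:

```python
import torch.nn as nn

def check_tied_weight_keys(model: nn.Module, keys):
    # Every key in _tied_weights_keys should name a real state_dict entry;
    # the old "predictions.decoder.bias" would show up as MISSING.
    state = model.state_dict()
    for key in keys:
        print(f"{key}: {'ok' if key in state else 'MISSING'}")

# Usage sketch: check_tied_weight_keys(model, model._tied_weights_keys or [])
```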
```diff
@@ -1667,15 +1742,19 @@ def tie_encoder_to_decoder_recursively(
     module_name: str,
     uninitialized_encoder_weights: List[str],
     depth=0,
+    total_decoder_name="",
```
This is important since `module_name` is a generic name, and `encoder_name` and `decoder_name` can differ (when there's an ignored `cross_attn` layer in the tying).
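For illustration, a minimal sketch of the accumulator pattern (hypothetical function, not the PR's code): each recursion level only sees its local child name, so the full dotted path has to be threaded through explicitly:

```python
import torch.nn as nn

def collect_full_names(module: nn.Module, total_name: str = "", out=None):
    # total_name plays the role of total_decoder_name: it accumulates the
    # dotted path that no single recursion level can reconstruct on its own.
    if out is None:
        out = []
    for child_name, child in module.named_children():
        full = f"{total_name}.{child_name}" if total_name else child_name
        out.append(full)
        collect_full_names(child, total_name=full, out=out)
    return out

model = nn.Sequential(nn.Linear(4, 4), nn.Sequential(nn.Linear(4, 2)))
print(collect_full_names(model))  # ['0', '1', '1.0']
```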
As discussed offline, now that the name of the encoder is passed, LGTM.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Very nice handling of something quite tricky - thanks for adding tests! ❤️
Only concern is that we are still vulnerable to `_tied_weights` being modified after instance creation, but I don't really see an easy way to prevent this other than giving warnings here.
```python
total_encoder_name=f"{total_encoder_name}.{encoder_name}",
total_decoder_name=f"{total_decoder_name}.{decoder_name}",
```
Here - do we want to account for when the string is empty?
Suggested change:

```diff
-total_encoder_name=f"{total_encoder_name}.{encoder_name}",
-total_decoder_name=f"{total_decoder_name}.{decoder_name}",
+total_encoder_name=f"{total_encoder_name}.{encoder_name}" if total_encoder_name else encoder_name,
+total_decoder_name=f"{total_decoder_name}.{decoder_name}" if total_decoder_name else decoder_name,
```
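For context, a quick standalone demonstration (example values, not the PR's code) of what the guard changes when the accumulator starts out empty:

```python
total_encoder_name, encoder_name = "", "encoder"

# Without the guard, the very first level picks up a leading dot:
print(f"{total_encoder_name}.{encoder_name}")  # -> ".encoder"

# With the suggested guard, the name stays clean:
print(f"{total_encoder_name}.{encoder_name}" if total_encoder_name else encoder_name)  # -> "encoder"
```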
```python
):
    assert isinstance(decoder_pointer, nn.Module) and isinstance(
        encoder_pointer, nn.Module
    ), f"{decoder_pointer} and {encoder_pointer} have to be of type nn.Module"
    if hasattr(decoder_pointer, "weight"):
        assert hasattr(encoder_pointer, "weight")
        encoder_pointer.weight = decoder_pointer.weight
        tied_weights.append(f"{base_encoder_name}{total_encoder_name}.weight")
```
(not sure at all) but should there be a dot here between the names?
Suggested change:

```diff
-tied_weights.append(f"{base_encoder_name}{total_encoder_name}.weight")
+tied_weights.append(f"{base_encoder_name}.{total_encoder_name}.weight")
```
No, the encoder name already has the leading dot from the way the recursive calls are made. Forcing it here would mean adding extra logic in the recursive descent. I can do it to make the code more readable (but in general, in such complex code, I don't like adding too many ifs, especially on dependent variables in recursive calls).
Agreed - I'd rather have no if statements.
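To make the convention concrete, a small standalone illustration (hypothetical values) of why the extra dot would double up:

```python
base_encoder_name = "encoder"
total_encoder_name = ".block.0.layer.1"  # leading dot comes from the recursion

# Current code: concatenation without an extra dot.
print(f"{base_encoder_name}{total_encoder_name}.weight")
# -> encoder.block.0.layer.1.weight

# With the suggested extra dot, the name would be malformed:
print(f"{base_encoder_name}.{total_encoder_name}.weight")
# -> encoder..block.0.layer.1.weight
```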
```python
if hasattr(decoder_pointer, "bias"):
    assert hasattr(encoder_pointer, "bias")
    tied_weights.append(f"{base_encoder_name}{total_encoder_name}.bias")
```
and possibly here?
Suggested change:

```diff
-tied_weights.append(f"{base_encoder_name}{total_encoder_name}.bias")
+tied_weights.append(f"{base_encoder_name}.{total_encoder_name}.bias")
```
Those are private, therefore it should be OK. You could make them immutable through `@property`, but that seems a bit too much at this point.
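For reference, a minimal sketch of the `@property` idea (hypothetical class, not the transformers base class), which would make reassignment after instance creation fail loudly:

```python
class ModelSketch:
    _tied_keys = ("cls.predictions.decoder.bias", "cls.predictions.decoder.weight")

    @property
    def _tied_weights_keys(self):
        # Read-only view: a property with no setter blocks reassignment,
        # and returning a copy prevents in-place mutation of the list.
        return list(self._tied_keys)

m = ModelSketch()
print(m._tied_weights_keys)
try:
    m._tied_weights_keys = []  # a property without a setter rejects this
except AttributeError:
    print("reassignment blocked")
```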
* Hard error when ignoring tensors. (#27484)
  * [WIP] Hard error when ignoring tensors.
  * Better selection/error when saving a checkpoint.
    - Find all names we should normally drop (those are in the transformers config).
    - Find all disjoint tensors (for those we can safely trigger a copy to get rid of the sharing before saving).
    - Clone those disjoint tensors, getting rid of the issue.
    - Find all identical names (those should be declared in the config, but we try to find them all anyway).
    - For all identical names:
      - If they are in the config, just ignore them; everything is fine.
      - If they are not, warn about them.
    - For all remaining tensors which are shared yet neither identical nor disjoint, raise a hard error.
  * Adding a failing test on `main` that passes here.
  * We don't need to keep the subfolder logic in this test.
  * Apply suggestions from code review
* Add small tests.
* Dead variable.
* Fixup.
* Fixing tied_weights_keys on generic models.
* Fixup + T5 encoder/decoder tying (with different layers)
* Code quality.
* Dynamic member.
* trigger
* Fixing encoder name for other types of encoder/decoder combos.
* Fix scoping.
* Update .github/workflows/self-scheduled.yml
* Fixing the tied_weights after the call.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Should fix #29903, fixes #28293