Fix weight tying in TF-ESM #22839
Conversation
Also cc @gante in case he hates how I handled weight tying here, I don't want to break TF convention too much!
The documentation is not available anymore as the PR was closed or merged.
@Rocketknight1 I'm cool with this :D
LGTM but @amyeroberts might have more insight in the TF subtleties of this code.
All looks good to me! Thanks for adding this :)
I just have a small question about the tests.
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()

for model_class in self.all_model_classes:
    model = model_class(config)
The modification of the modeling code is controlled by self.config.tie_word_embeddings, but all the models here are using the same config. Is the toggling of tying or not tying the weights tested elsewhere? I'm quite likely missing something just judging from the diff here, though.
The reason for overriding this test is that the common test expects model.get_output_embeddings() to return a tf.keras.layers.Layer, but in this case I'm using a simple shared matrix created with add_weight, so I had to tweak the test a little. I'm actually not testing the effect of the tie_word_embeddings parameter anywhere, but maybe I should, unless it's already covered somewhere in the common tests!
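For illustration, a rough sketch of what such a toggle test could look like. This is not the test added in this PR: it assumes get_output_embeddings() returns the shared matrix as a tf.Variable when tying is enabled, and the attribute names on the embedding layer are guesses.

```python
import copy

import tensorflow as tf

from transformers import EsmConfig, TFEsmForMaskedLM


def check_tie_word_embeddings(base_config: EsmConfig):
    for tie in (True, False):
        config = copy.deepcopy(base_config)
        config.tie_word_embeddings = tie

        model = TFEsmForMaskedLM(config)
        # A dummy forward pass builds every weight, including the LM head decoder.
        model(tf.constant([[0, 1, 2]], dtype=tf.int32))

        input_embeddings = model.get_input_embeddings()
        output_embeddings = model.get_output_embeddings()

        # The attribute holding the input matrix differs between a Keras Embedding
        # layer (`.embeddings`) and custom layers (`.weight`), so this lookup is
        # deliberately defensive.
        input_matrix = getattr(input_embeddings, "weight", getattr(input_embeddings, "embeddings", None))

        # When tied, the output projection should be the very same variable as the
        # input embedding matrix, not a value-equal copy.
        assert (output_embeddings is input_matrix) == tie
```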
@@ -1102,6 +1103,11 @@ def __init__(self, config):
        self.esm = TFEsmMainLayer(config, add_pooling_layer=False, name="esm")
        self.lm_head = TFEsmLMHead(config, name="lm_head")
        if config.tie_word_embeddings:
            # Ensure word embeddings are built so that we actually have something to tie
            with tf.name_scope(os.path.join(self._name_scope(), "esm", "embeddings", "word_embeddings")):
Cries in TensorFlow 😭
I know, but it's the only option when we want to create a weight for a distant sublayer! TF only 'walks' the name scope hierarchy via the call stack when weights are built during call(); every other time you have to explicitly enter the tf.name_scope() you want.
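As a toy illustration of that behaviour (not the ESM code; the layer and scope names here are made up): when a nested layer is built during call(), Keras enters each parent's name scope for you, but a build forced from __init__ only gets the nested prefix if you enter the scope yourself.

```python
import tensorflow as tf


class Outer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.inner = tf.keras.layers.Dense(4, name="inner")
        # Building eagerly, outside call(): reproduce the scope by hand so the
        # variable names match what a call()-time build would have produced.
        with tf.name_scope(self._name_scope() + "/inner"):
            self.inner.build((None, 8))


layer = Outer(name="outer")
for variable in layer.weights:
    print(variable.name)  # expected to include the "outer/inner/..." prefix
```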
Fix weight tying in ESM
TF ESM cloned weights instead of tying them, which worked when loading from PyTorch but broke when loading from safetensors. This PR resolves the issue by correctly tying the weights when tying is enabled in the config. Fixes an ongoing CI error raised by @ydshieh.
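For readers unfamiliar with the distinction, here is a minimal sketch of cloning versus tying; the variable names are illustrative, not the actual ESM modules.

```python
import tensorflow as tf

vocab_size, hidden_size = 10, 4
word_embeddings = tf.Variable(tf.random.normal((vocab_size, hidden_size)), name="word_embeddings")

# Cloned: a separate variable initialised from the embedding matrix. Updates to
# one never reach the other, and a checkpoint that stores the shared tensor only
# once (as safetensors does) leaves this copy unpopulated.
cloned_decoder = tf.Variable(word_embeddings.read_value(), name="decoder")

# Tied: the output projection *is* the input embedding matrix.
tied_decoder = word_embeddings

hidden_states = tf.random.normal((2, hidden_size))
logits = tf.matmul(hidden_states, tied_decoder, transpose_b=True)
print(logits.shape)  # (2, 10)
```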