TF: TFMarianMTModel final logits bias as a layer #18833

gante · 2022-08-31T12:44:33Z

What does this PR do?

As stated in the issue above, final_logits_bias in TFMarianMTModel is not being loaded at from_pretrained(...) time. The PT model has this variable defined, so the outputs of the model in the two frameworks differ substantially (>1e-1).

In fact, these weights are also not being stored when the TF version is saved, for the same reason -- the functions we are using (.save_weights and .load_weights) only store/load weights that belong to layers, and this bias weight is not inside a layer.

As a solution, this PR moves the bias to a layer and creates an alias for it, resulting in no interface changes. After this change, the models from Helsinki-NLP can be converted with the pt-to-tf CLI, passing all the quality checks.
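For readers who want to see the shape of the fix, here is a minimal sketch of the pattern described above. It is not the actual diff: `ToyLMHead` and its `vocab_size` argument are hypothetical stand-ins for the real model, the `BiasLayer` constructor arguments are assumptions, and the sketch assumes TensorFlow 2.x with `tf.keras`.

```python
import tensorflow as tf


class BiasLayer(tf.keras.layers.Layer):
    """Holds a single bias variable so that the variable is owned by a layer."""

    def __init__(self, shape, initializer, trainable, name, **kwargs):
        super().__init__(name=name, **kwargs)
        # The weight is created here, in __init__, rather than in build().
        self.bias = self.add_weight(
            name=name, shape=shape, initializer=initializer, trainable=trainable
        )

    def call(self, x):
        # Adding the bias through call() means the layer is actually used.
        return x + self.bias


class ToyLMHead(tf.keras.Model):
    """Hypothetical stand-in for a model that ends with a final logits bias."""

    def __init__(self, vocab_size, **kwargs):
        super().__init__(**kwargs)
        self.bias_layer = BiasLayer(
            name="final_logits_bias",
            shape=[1, vocab_size],
            initializer="zeros",
            trainable=False,
        )
        # Alias: code that reads `model.final_logits_bias` keeps working.
        self.final_logits_bias = self.bias_layer.bias

    def call(self, logits):
        return self.bias_layer(logits)


model = ToyLMHead(vocab_size=8)
out = model(tf.ones([2, 8]))  # zero bias, so the logits pass through unchanged
print([w.name for w in model.non_trainable_weights])
```

Because the variable is now owned by `bias_layer`, it is among the layer-owned weights the model tracks, while `model.final_logits_bias` still points at the same variable.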

⚠️ Other models have this pattern, so I will apply the change to them in a separate PR if this one gets approved.

HuggingFaceDocBuilderDev · 2022-08-31T13:05:50Z

The documentation is not available anymore as the PR was closed or merged.

ydshieh · 2022-08-31T14:02:19Z

@gante Thanks a lot. It looks like it works well!

However, there is one thing I don't understand quite well.

(Pdb) [x.name for x in model.non_trainable_weights]
['final_logits_bias:0']

and this is good, as it makes loading work correctly. But I expected to see ['final_logits_bias.final_logits_bias:0'], since you pass the name to the layer as well as to add_weight.

Is it true that when we use add_weight inside a layer, that layer name won't appear in the variable name for that weight?

(I set a breakpoint in src/transformers/modeling_tf_utils.py at line 847)

ydshieh

Looks good to me, as it works. But I left a question.
Thanks a lot @gante

gante · 2022-08-31T15:28:22Z

@ydshieh hah, I had the same question but I tried, it worked, and I forgot to dig deeper to understand why :D

After some digging, I found that it is poorly documented -- variables created with .add_weight are set without any name scope, i.e. their name is exactly the string passed as the name argument. This is in contrast to the weights from layers such as tf.keras.layers.Dense, which automatically get a scoped name according to the names of the layers (e.g. foo/bar/weights:0).

This implies that initializing BiasLayer with a name has no effect whatsoever regarding weight storing/loading. If we wanted the weights to have a scoped name (we don't here), we could either hardcode it in name (example) or use tf.name_scope (example).
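The difference between an unscoped weight and the tf.name_scope option can be checked with a small sketch like the one below. `PlainBias`, `ScopedBias`, and the layer names are hypothetical, and TensorFlow 2.x with `tf.keras` is assumed. The exact name strings depend on the TensorFlow/Keras version (newer Keras releases may report variable names differently), so the sketch prints the names rather than asserting them.

```python
import tensorflow as tf


class PlainBias(tf.keras.layers.Layer):
    """Creates its weight directly in __init__, with no explicit name scope."""

    def __init__(self, name, **kwargs):
        super().__init__(name=name, **kwargs)
        self.bias = self.add_weight(
            name="bias", shape=[1, 4], initializer="zeros", trainable=False
        )


class ScopedBias(tf.keras.layers.Layer):
    """Creates the same weight, but inside an explicit tf.name_scope."""

    def __init__(self, name, **kwargs):
        super().__init__(name=name, **kwargs)
        with tf.name_scope(name):
            self.bias = self.add_weight(
                name="bias", shape=[1, 4], initializer="zeros", trainable=False
            )


plain = PlainBias(name="plain_layer")
scoped = ScopedBias(name="scoped_layer")

# Compare the two variable names; only the second is created under a scope.
print("no scope:  ", plain.bias.name)
print("name_scope:", scoped.bias.name)
```

On the setup discussed in this thread, the first name should come out without the layer name as a prefix, matching the 'final_logits_bias:0' observation above.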

I'm adding a link to this comment in the code, for future reference.

ydshieh · 2022-08-31T15:34:34Z

Thanks a lot @gante, you are the best!

patrickvonplaten

Looks good to me!

* bias as a layer * alias the bias (hah, it rhymes) * add comment with info

gante added 4 commits August 31, 2022 11:50

bias as a layer

996a6c5

alias the bias (hah, it rhymes)

8a3057a

no need for a call()

472c62b

closer to the original code

d5cb65d

gante requested review from patrickvonplaten and ydshieh August 31, 2022 12:54

ydshieh approved these changes Aug 31, 2022

call() is needed to register the weights in the used params

9117302

add comment with info

5b1dcbe

patrickvonplaten approved these changes Sep 2, 2022

gante merged commit 7f27e00 into huggingface:main Sep 5, 2022

gante deleted the tf_fix_marian branch September 5, 2022 08:20

gante mentioned this pull request Sep 6, 2022

TF: final bias as a layer in seq2seq models (replicate TFMarian fix) #18903

Merged

oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022

TF: TFMarianMTModel final logits bias as a layer (huggingface#18833)

29711e0

* bias as a layer * alias the bias (hah, it rhymes) * add comment with info
