
TF: TFMarianMTModel final logits bias as a layer #18833

Merged · 6 commits · Sep 5, 2022

Conversation

@gante (Member) commented Aug 31, 2022

What does this PR do?

Fixes #18802

As stated in the issue above, final_logits_bias in TFMarianMTModel is not being loaded at from_pretrained(...) time. The PT model does define this variable, so the outputs of the model in the two frameworks diverge significantly (by more than 1e-1).

In fact, this weight is also not being stored when the TF version is saved, for the same reason: the functions we use (.save_weights and .load_weights) only store/load weights that live inside layers, and this bias weight is not inside a layer.

As a solution, this PR moves the bias to a layer and creates an alias for it, resulting in no interface changes. After this change, the models from Helsinki-NLP can be converted with the pt-to-tf CLI, passing all the quality checks.

⚠️ Other models have this pattern, so I will apply the change to them in a separate PR if this one gets approved.
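For reference, the pattern boils down to wrapping the bias in a minimal Keras layer and applying it to the logits. Below is a rough sketch of the idea (my own illustration, not the exact code merged in this PR; the constructor arguments and usage lines are assumptions):

```python
import tensorflow as tf


class BiasLayer(tf.keras.layers.Layer):
    """Bias as a layer. Required for serialization: `save_weights` and
    `load_weights` operate on a per-layer basis, so every weight must be
    registered inside a layer."""

    def __init__(self, shape, initializer, trainable, name, **kwargs):
        super().__init__(name=name, **kwargs)
        self.bias = self.add_weight(
            name=name, shape=shape, initializer=initializer, trainable=trainable
        )

    def call(self, x):
        return x + self.bias


# Inside the model, the raw weight is replaced by the layer, and the old
# attribute can be aliased to the layer's variable so the public interface
# stays unchanged (hypothetical sketch):
# self.bias_layer = BiasLayer(
#     name="final_logits_bias", shape=[1, config.vocab_size],
#     initializer="zeros", trainable=False,
# )
# lm_logits = self.bias_layer(lm_logits)
```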

@HuggingFaceDocBuilderDev commented Aug 31, 2022

The documentation is not available anymore as the PR was closed or merged.

@ydshieh (Collaborator) commented Aug 31, 2022

@gante Thanks a lot. It looks like it works well!

However, there is one thing I don't quite understand.

(Pdb) [x.name for x in model.non_trainable_weights]
['final_logits_bias:0']

This is good, since it makes loading work correctly. But I was expecting to see ['final_logits_bias.final_logits_bias:0'], since you pass the name to the layer as well as in add_weight.

Is it true that when we use add_weight inside a layer, the layer's name won't appear in the variable name for that weight?

(I set a breakpoint in src/transformers/modeling_tf_utils.py at line 847.)

@ydshieh (Collaborator) left a review comment


Looks good to me, as it works. But I left a question.
Thanks a lot @gante

@gante (Member, Author) commented Aug 31, 2022

@ydshieh hah, I had the same question, but I tried it, it worked, and I forgot to dig deeper to understand why :D

After some digging, I found that this behavior is poorly documented -- variables created with .add_weight are created without any name scope, i.e. their name is exactly the string passed as name. This is in contrast to the weights of layers such as tf.keras.layers.Dense, which automatically get a name scoped by the names of the enclosing layers (e.g. foo/bar/weights:0).

This implies that initializing BiasLayer with a name has no effect whatsoever on weight storing/loading. If we wanted the weights to have a scoped name (we don't here), we could either hardcode it in name (example) or use tf.name_scope (example).

I'm adding a link to this comment in the code, for future reference.
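To make this concrete, here is a tiny standalone check (my own illustration; the layer names and shapes are arbitrary):

```python
import tensorflow as tf


class BiasLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Created in __init__, outside any name scope: the variable name is
        # exactly the string passed to `name`, regardless of the layer's name.
        self.bias = self.add_weight(
            name="final_logits_bias", shape=(1, 8), initializer="zeros", trainable=False
        )


layer = BiasLayer(name="layer_name_ignored_for_the_weight")
print([w.name for w in layer.weights])  # ['final_logits_bias:0']

# Standard layers build their weights under the layer's name scope instead:
dense = tf.keras.layers.Dense(4, name="foo")
dense(tf.zeros((2, 8)))  # triggers build() inside the "foo" scope
print([w.name for w in dense.weights])  # ['foo/kernel:0', 'foo/bias:0']

# And tf.name_scope is one way to get a scoped name explicitly:
with tf.name_scope("bar"):
    v = tf.Variable(0.0, name="baz")
print(v.name)  # 'bar/baz:0'
```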

@ydshieh (Collaborator) commented Aug 31, 2022

Thanks a lot @gante , you are the best!

@patrickvonplaten (Contributor) left a review comment


Looks good to me!

@gante gante merged commit 7f27e00 into huggingface:main Sep 5, 2022
@gante gante deleted the tf_fix_marian branch September 5, 2022 08:20
oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022
* bias as a layer

* alias the bias (hah, it rhymes)

* add comment with info
Development

Merging this pull request closed the following issue: load_tf_weights doesn't handle the weights added to the TF models at the top level (#18802)