
DPR pooler weights not loading correctly #19111

Closed
3 of 4 tasks
maxmatical opened this issue Sep 19, 2022 · 6 comments · Fixed by #20210
@maxmatical
maxmatical commented Sep 19, 2022

System Info

Tested on multiple versions:

  • transformers version: 4.12.3
  • Platform: Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.11.0+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: no

another environment

  • transformers version: 4.16.2
  • Platform: macOS-12.6-x86_64-i386-64bit
  • Python version: 3.9.7
  • PyTorch version (GPU?): 1.11.0+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: no

Who can help?

@patrickvonplaten @lhoestq

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
from transformers import DPRContextEncoder, DPRQuestionEncoder

question_encoder_path = "facebook/dpr-question_encoder-single-nq-base"  # can also be a custom checkpoint
answer_encoder_path = "facebook/dpr-ctx_encoder-single-nq-base"

DPRQuestionEncoder.from_pretrained(question_encoder_path)
DPRContextEncoder.from_pretrained(answer_encoder_path)
```

results in the following message

```
Some weights of the model checkpoint at facebook/dpr-question_encoder-single-nq-base were not used when initializing DPRQuestionEncoder: ['question_encoder.bert_model.pooler.dense.weight', 'question_encoder.bert_model.pooler.dense.bias']
- This IS expected if you are initializing DPRQuestionEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRQuestionEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of the model checkpoint at facebook/dpr-ctx_encoder-single-nq-base were not used when initializing DPRContextEncoder: ['ctx_encoder.bert_model.pooler.dense.weight', 'ctx_encoder.bert_model.pooler.dense.bias']
- This IS expected if you are initializing DPRContextEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRContextEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
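For context, a minimal sketch (not the actual transformers implementation; key names abbreviated from the warnings above) of why exactly the pooler tensors are reported: `from_pretrained` drops any checkpoint key that has no matching parameter in the freshly constructed model, and since the model is built without a pooler, the two pooler tensors have nowhere to go.

```python
# Checkpoint keys as they appear in the warning above (abbreviated set).
checkpoint_keys = {
    "question_encoder.bert_model.encoder.layer.0.attention.self.query.weight",
    "question_encoder.bert_model.pooler.dense.weight",
    "question_encoder.bert_model.pooler.dense.bias",
}
# The DPR encoder is constructed with add_pooling_layer=False, so its
# state dict never contains pooler parameters.
model_keys = {
    "question_encoder.bert_model.encoder.layer.0.attention.self.query.weight",
}

# Any checkpoint tensor with no counterpart in the model is "not used".
unused = sorted(checkpoint_keys - model_keys)
print(unused)
# ['question_encoder.bert_model.pooler.dense.bias',
#  'question_encoder.bert_model.pooler.dense.weight']
```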

Expected behavior

Model loads successfully without re-initializing weights

@maxmatical maxmatical added the bug label Sep 19, 2022
@patrickvonplaten
Contributor

@ArthurZucker could you take a look here (happy to answer questions about the model)

@ArthurZucker
Collaborator

on it!

@sammys377

Hi @ArthurZucker , do you have any updates on this by chance? I'm getting the same issue, but the results I get when benchmarking do not suggest random initialization of the weights.

@ArthurZucker
Collaborator

Hey, it seems the issue comes from the following line, where add_pooling_layer=False is hardcoded. Setting it to True fixes the issue. Now I am not very familiar with the model, but a few tests seem to be linked to that checkpoint. Let me open a PR with a fix!
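For readers following along, here is a toy sketch (plain-Python stand-ins, not the actual transformers classes) of how a hardcoded add_pooling_layer=False leaves the checkpoint's pooler weights with no parameters to load into:

```python
class FakeBertModel:
    """Stand-in for transformers.BertModel: the flag decides whether
    pooler parameters exist at all in the model's state dict."""
    def __init__(self, config, add_pooling_layer=True):
        self.param_names = {"encoder.layer.0.weight"}
        if add_pooling_layer:
            self.param_names |= {"pooler.dense.weight", "pooler.dense.bias"}

class FakeDPREncoder:
    """Mirrors the shape of the DPR encoder wrapper described above."""
    def __init__(self, config):
        # This hardcoded False is the line in question; flipping it to
        # True makes the pooler parameters reappear, so the checkpoint's
        # pooler tensors would load instead of being discarded.
        self.bert_model = FakeBertModel(config, add_pooling_layer=False)

enc = FakeDPREncoder(config=None)
print("pooler.dense.weight" in enc.bert_model.param_names)  # False
```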

@ArthurZucker
Collaborator

Hey! So as mentioned here in a previous PR, the optional pooling layer was removed as no checkpoints use it.

My first question would then be: do you need the BertPoolerLayer? It is indeed a bit confusing that the pooled output does not come from the BertPoolerLayer. Have a look at #14486; I think it explains pretty well what's going on here.

We have two ways to go about this:

  1. We add an argument to the DPR config, taking care to update the online config so there are no breaking changes.
  2. If you don't need it, we just add a warning and update the online weights (from_pretrained followed by push_to_hub), after which the checkpoints will no longer include the pooler weights 😄
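As a rough illustration of option 2 (hypothetical placeholder values; the actual re-upload would use save_pretrained/push_to_hub on the real model), stripping the pooler tensors out of a checkpoint's state dict before re-publishing would look like:

```python
# Hypothetical sketch: filter the pooler tensors out of a checkpoint's
# state dict so future from_pretrained calls find nothing unexpected.
# String values stand in for real tensors.
state_dict = {
    "ctx_encoder.bert_model.encoder.layer.0.weight": "tensor",
    "ctx_encoder.bert_model.pooler.dense.weight": "tensor",
    "ctx_encoder.bert_model.pooler.dense.bias": "tensor",
}

# Drop every key belonging to the pooler sub-module.
slimmed = {k: v for k, v in state_dict.items() if ".pooler." not in k}
print(sorted(slimmed))
# ['ctx_encoder.bert_model.encoder.layer.0.weight']
```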

@sammys377

sammys377 commented Oct 20, 2022 via email
