
DPR pooler weights not loading correctly #19111

Closed
3 of 4 tasks
maxmatical opened this issue Sep 19, 2022 · 6 comments · Fixed by #20210
@maxmatical
maxmatical commented Sep 19, 2022

System Info

Tested on multiple versions:

  • transformers version: 4.12.3
  • Platform: Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.11.0+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: no

another environment

  • transformers version: 4.16.2
  • Platform: macOS-12.6-x86_64-i386-64bit
  • Python version: 3.9.7
  • PyTorch version (GPU?): 1.11.0+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: no

Who can help?

@patrickvonplaten @lhoestq

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
from transformers import DPRContextEncoder, DPRQuestionEncoder

question_encoder_path = "facebook/dpr-question_encoder-single-nq-base"  # can also be a custom checkpoint
answer_encoder_path = "facebook/dpr-ctx_encoder-single-nq-base"

DPRQuestionEncoder.from_pretrained(question_encoder_path)
DPRContextEncoder.from_pretrained(answer_encoder_path)
```

results in the following message

```
Some weights of the model checkpoint at facebook/dpr-question_encoder-single-nq-base were not used when initializing DPRQuestionEncoder: ['question_encoder.bert_model.pooler.dense.weight', 'question_encoder.bert_model.pooler.dense.bias']
- This IS expected if you are initializing DPRQuestionEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRQuestionEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of the model checkpoint at facebook/dpr-ctx_encoder-single-nq-base were not used when initializing DPRContextEncoder: ['ctx_encoder.bert_model.pooler.dense.weight', 'ctx_encoder.bert_model.pooler.dense.bias']
- This IS expected if you are initializing DPRContextEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRContextEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
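For context, a minimal sketch (not the actual transformers implementation; key names abbreviated from the warnings above) of why exactly the pooler tensors are reported: `from_pretrained` drops any checkpoint key that has no matching parameter in the freshly constructed model, and since the model is built without a pooler, the two pooler tensors have nowhere to go.

```python
# Checkpoint keys as they appear in the warning above (abbreviated set).
checkpoint_keys = {
    "question_encoder.bert_model.encoder.layer.0.attention.self.query.weight",
    "question_encoder.bert_model.pooler.dense.weight",
    "question_encoder.bert_model.pooler.dense.bias",
}
# The DPR encoder is constructed with add_pooling_layer=False, so its
# state dict never contains pooler parameters.
model_keys = {
    "question_encoder.bert_model.encoder.layer.0.attention.self.query.weight",
}

# Any checkpoint tensor with no counterpart in the model is "not used".
unused = sorted(checkpoint_keys - model_keys)
print(unused)
# ['question_encoder.bert_model.pooler.dense.bias',
#  'question_encoder.bert_model.pooler.dense.weight']
```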

Expected behavior

Model loads successfully without re-initializing weights

@maxmatical maxmatical added the bug label Sep 19, 2022
@patrickvonplaten
Contributor

@ArthurZucker could you take a look here (happy to answer questions about the model)

@ArthurZucker
Collaborator

on it!

@sammys377

Hi @ArthurZucker , do you have any updates on this by chance? I'm getting the same issue, but the results I get when benchmarking do not suggest random initialization of the weights.

@ArthurZucker
Collaborator

Hey, it seems the issue comes from the following line, where add_pooling_layer=False is hardcoded. Setting it to True fixes the issue. Now I am not very familiar with the model, but a few tests seem to be linked to that checkpoint. Let me open a PR with a fix!
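For readers following along, here is a toy sketch (plain-Python stand-ins, not the actual transformers classes) of how a hardcoded add_pooling_layer=False leaves the checkpoint's pooler weights with no parameters to load into:

```python
class FakeBertModel:
    """Stand-in for transformers.BertModel: the flag decides whether
    pooler parameters exist at all in the model's state dict."""
    def __init__(self, config, add_pooling_layer=True):
        self.param_names = {"encoder.layer.0.weight"}
        if add_pooling_layer:
            self.param_names |= {"pooler.dense.weight", "pooler.dense.bias"}

class FakeDPREncoder:
    """Mirrors the shape of the DPR encoder wrapper described above."""
    def __init__(self, config):
        # This hardcoded False is the line in question; flipping it to
        # True makes the pooler parameters reappear, so the checkpoint's
        # pooler tensors would load instead of being discarded.
        self.bert_model = FakeBertModel(config, add_pooling_layer=False)

enc = FakeDPREncoder(config=None)
print("pooler.dense.weight" in enc.bert_model.param_names)  # False
```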

@ArthurZucker
Collaborator

Hey! So as mentioned here in a previous PR, the optional pooling layer was removed as no checkpoints use it.

My first question would then be: do you need the BertPoolerLayer? It is indeed a bit confusing that the pooled output does not come from the BertPoolerLayer. Have a look at #14486; I think it explains pretty well what's going on here.

We have two ways to go about this:

  1. We add an argument to the DPR config, taking care to update the online config so there are no breaking changes.
  2. If you don't need it, we just add a warning and update the online weights (from_pretrained followed by push_to_hub), after which the checkpoints will no longer include the pooler weights 😄
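As a rough illustration of option 2 (hypothetical placeholder values; the actual re-upload would use save_pretrained/push_to_hub on the real model), stripping the pooler tensors out of a checkpoint's state dict before re-publishing would look like:

```python
# Hypothetical sketch: filter the pooler tensors out of a checkpoint's
# state dict so future from_pretrained calls find nothing unexpected.
# String values stand in for real tensors.
state_dict = {
    "ctx_encoder.bert_model.encoder.layer.0.weight": "tensor",
    "ctx_encoder.bert_model.pooler.dense.weight": "tensor",
    "ctx_encoder.bert_model.pooler.dense.bias": "tensor",
}

# Drop every key belonging to the pooler sub-module.
slimmed = {k: v for k, v in state_dict.items() if ".pooler." not in k}
print(sorted(slimmed))
# ['ctx_encoder.bert_model.encoder.layer.0.weight']
```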

@sammys377

sammys377 commented Oct 20, 2022 via email
