What to do about this warning message: "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification" #5421
Comments
Not sure what's happening with the multiple duplicate opened issues, @ohmeow? Is GitHub flaky again? :) |
I am also encountering the same warning. When loading the model:
When attempting to fine-tune it:
Is the model correctly fine-tuning? Are the pre-trained model weights also getting updated (fine-tuned), or are only the layers outside (above) the pre-trained model changing their weights during training? |
I noticed the same thing. Not sure what is going on ... but I swear I only opened this one :) |
@ohmeow you're loading the bert-base-uncased checkpoint in a BertForSequenceClassification model. This means that:
This is expected, and tells you that you won't have good performance with your sequence classification model until you fine-tune it. @fliptrail this warning means that during your training, you're not using the pooler output. |
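For readers landing here, a minimal sketch of the situation this warning describes (num_labels=2 is just an illustrative choice, not something specified in the thread):

from transformers import BertForSequenceClassification

# The encoder weights come from the checkpoint; the classification head does not exist there,
# so it is freshly (randomly) initialized and the warning is printed at this point.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
# Until this head is fine-tuned on labelled data, the classification outputs are meaningless.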
@LysandreJik Thank you for your response.
I am only using the pre-trained model. Warning while loading the model:
While attempting to train:
This warning only started appearing yesterday, in all of my code and in other sample code as well. |
Hello everyone,
I have been using this same code for more than 2 weeks with no problems until yesterday. |
Thanks @LysandreJik
Makes sense. Now, how do we know what checkpoints are available that were fine-tuned on sequence classification? |
@fliptrail in your code you have the following:

embedding = encoder(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)[0]

which means you're only getting the first output of the model and using that to compute the loss. The first output of the model is the hidden states:
You're ignoring the second value, which is the pooler output. The warnings are normal in your case. |
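As an illustration of the two outputs being discussed, here is a small sketch using the current PyTorch API (older versions returned a plain tuple instead of a named output object):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden): the hidden states, i.e. the "first output"
print(outputs.pooler_output.shape)      # (batch, hidden): the pooler output being ignored above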
@VaibhavBhatnagar17, these are warnings, not errors. What exact warning are you not understanding? |
@ohmeow that really depends on what you want to do! Sequence classification is a large subject, with many different tasks. Here's a list of all available checkpoints fine-tuned on sequence classification (not all are for BERT, though!) Please be aware that if you have a specific task in mind, you should fine-tune your model to that task. |
@LysandreJik Hey, what I am not able to understand is that I was using this code for more than 2 weeks and no warning came up until yesterday. I haven't changed anything, so it's confusing that this warning suddenly appeared. |
The warning came up yesterday because version 3.0.0 was released yesterday. It's weird that you saw an output dimension change since yesterday, though. What's the error you get? |
I see this same warning when initializing bert-large-uncased-whole-word-masking.
Note that my module imports/initializations essentially duplicate the snippet demonstrating cloze task usage at https://huggingface.co/bert-large-uncased-whole-word-masking?text=Paris+is+the+%5BMASK%5D+of+France.
Am I correct in assuming that nothing has changed in the behavior of the relevant model, but that perhaps this warning should have been printed all along? |
You're right, this has always been the behavior of the models. It wasn't clear enough before, so we've clarified it with this warning. |
Thanks, @LysandreJik . |
Does anyone know how to suppress this warning? I am aware that the model needs fine-tuning, and I am fine-tuning it, so it becomes annoying to see this over and over again. |
You can manage the warnings with the logging utility:

from transformers import logging
logging.set_verbosity_warning() |
@LysandreJik Thanks for the rapid response, I set it with set_verbosity_error() |
@LysandreJik - So, by default, bert-base-uncased is loading from ... but when loading from ...
So that means these 5 variables are randomly initialized, right? Or can we take the output token embeddings (before passing them to mlm___cls)? |
@LysandreJik I'm having a slightly different issue here - I'm loading a sequence classification checkpoint in a
Output:
I believe it's NOT expected, because I'm initializing from a model that I expect to be exactly identical. I only started getting this warning after upgrading to transformers v3 as well; I'm using 3.3.1 currently. Could you please help? Thanks! |
@s4sarath I'm not sure I understand your question. @veronica320, the pooler layer is not used when doing sequence classification, so there's nothing to be worried about. The pooler is the second output of the BertModel, but only the first output is used in the sequence classification model: |
Thanks a lot! |
@LysandreJik - Sorry for the confusion.
The above 4 variables are randomly initialized, right? That means they were not a part of the official BERT checkpoint. |
Thank you for your explanation. Actually these four variables shouldn't be initialized randomly, as they're part of BERT. The official BERT checkpoints contain two heads: the MLM head and the NSP head. You can see it here:

>>> from transformers import TFBertForMaskedLM
>>> model = TFBertForMaskedLM.from_pretrained("bert-base-cased")

Among the logging, you should find this:

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertForMaskedLM: ['nsp___cls']
- This IS expected if you are initializing TFBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing TFBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForMaskedLM were initialized from the model checkpoint at bert-base-cased.

This tells you two things:
- Some layers of the checkpoint are not used. These are ['nsp___cls'], corresponding to the CLS head. Since we're using a ***ForMaskedLM, it makes sense not to use the CLS head.
- All the layers of the model were initialized from the model checkpoint, as both the transformer layers and the MLM head were present in the checkpoint.

If you're getting those variables randomly initialized:

tf_bert_for_masked_lm_1/mlm___cls/predictions/bias:0
tf_bert_for_masked_lm_1/mlm___cls/predictions/transform/dense/kernel:0
tf_bert_for_masked_lm_1/mlm___cls/predictions/transform/dense/bias:0
tf_bert_for_masked_lm_1/mlm___cls/predictions/transform/LayerNorm/gamma:0
tf_bert_for_masked_lm_1/mlm___cls/predictions/transform/LayerNorm/beta:0

then it means you're using a checkpoint that does not contain these variables. These are the MLM layers, so you're probably loading a checkpoint that was saved using an architecture that does not contain these layers. This can happen if you do the following:

>>> from transformers import TFBertModel, TFBertForMaskedLM
>>> model = TFBertModel.from_pretrained("bert-base-cased")
>>> model.save_pretrained(directory)
>>> mlm_model = TFBertForMaskedLM.from_pretrained(directory)

I hope this answers your question! |
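For completeness, a small sketch of the round-trip that keeps the MLM head intact (the directory name is just an example): save and re-load with the same head-bearing class instead of the bare TFBertModel.

from transformers import TFBertForMaskedLM

mlm_model = TFBertForMaskedLM.from_pretrained("bert-base-cased")
mlm_model.save_pretrained("saved_mlm")                      # the MLM head weights are saved too
reloaded = TFBertForMaskedLM.from_pretrained("saved_mlm")   # no randomly initialized MLM layers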
Oh okay. Thank you so much for the clarification. When I looked at BERT models from TF-Hub, these 4 variables were not present. That was the reason for the confusion. |
Hi, is there any solution? I have the same problem (see also #339, #18, #132). The warning is as below:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMultiLabelSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForMultiLabelSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForMultiLabelSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultiLabelSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

The learner can still fit and predict, but the predictions are not consistent every time. |
I don't know, brother. I really can't understand those warnings, because they don't make sense to me. Check github.com/legacyai/tf-transformers. A new and improved version is on the way. |
All of the BertForXXX models consist of a BERT model (https://huggingface.co/transformers/model_doc/bert.html#bertmodel) followed by some head which is task-specific. For sequence classification tasks, the head is just a linear layer which maps the BERT transformer hidden state vector to a vector of length num_labels, where num_labels is the number of classes for your classification task (for example, positive/negative sentiment analysis has 2 labels). If you're familiar with logits, this final vector contains the logits.

In the transformers source code, you can see this linear layer (assigned to self.classifier) initialized in the constructor for BertForSequenceClassification (https://huggingface.co/transformers/_modules/transformers/models/bert/modeling_bert.html#BertForSequenceClassification):

class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        self.init_weights()

Since self.classifier is not part of the pre-trained BERT model, its parameters must be initialized randomly (done automatically by the nn.Linear constructor).

@s4sarath Anytime you use code like model = BertForSequenceClassification.from_pretrained("bert-base-cased"), the self.classifier linear layer will have to be initialized randomly.

@TingNLP You are getting different predictions each time because each time you instantiate the model using .from_pretrained(), the self.classifier parameters will be different. |
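A quick sketch that demonstrates the point above: two separate from_pretrained() calls share the encoder weights but get different, randomly initialized classifier heads (the checkpoint name here is just an example).

import torch
from transformers import BertForSequenceClassification

m1 = BertForSequenceClassification.from_pretrained("bert-base-cased")
m2 = BertForSequenceClassification.from_pretrained("bert-base-cased")

# Encoder weights come from the checkpoint, so they match exactly.
print(torch.equal(m1.bert.embeddings.word_embeddings.weight,
                  m2.bert.embeddings.word_embeddings.weight))   # True
# The classifier head is not in the checkpoint, so each load gets fresh random weights.
print(torch.equal(m1.classifier.weight, m2.classifier.weight))  # False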
Absolutely agree.
Task-specific heads have to be randomly initialized, because they are not part of the official BERT model. I agree with that. |
OK... So... the problem is the parameters. Is it possible for us to fix the value? I think if it can be fixed, the prediction will not be inconsistent every time. |
There is no point doing that, right? Because once the model is trained, we will have a fixed set of parameters. :) |
@s4sarath Thanks for your immediate reply. I am still a little confused. If the prediction is different each time, is that still a reasonable result? |
I will explain, bro.
Assume classification. The last classification layer is initialized randomly right now. That's okay, because you haven't trained it yet. But once you train the model and save the checkpoint, at inference time you load that checkpoint, so the predictions remain consistent. |
It is said that BERT is a pre-trained model. Why, then, does it need to be trained again? |
It does not need to be trained again to be used for a task that it was trained on: e.g., masked language modeling over a very large, general corpus of books and web text, in the case of BERT. However, to perform more specific tasks like classification and question answering, such a model must be re-trained on task-specific data, which is called fine-tuning. Since many popular tasks fall into this latter category, it is assumed that most developers will be fine-tuning the models, and hence the Hugging Face developers included this warning message to ensure developers are aware when the model does not appear to have been fine-tuned. See "Advantages of Fine-Tuning" in this tutorial: https://mccormickml.com/2019/07/22/BERT-fine-tuning/#12-installing-the-hugging-face-library Or check out this page from the documentation: https://huggingface.co/transformers/training.html |
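To make the fine-tuning step concrete, here is a hedged sketch using the Trainer API; the dataset, subset size, and hyperparameters are illustrative only, not something recommended in this thread.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

# A small subset keeps the example fast; real fine-tuning would use the full training split.
train_ds = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-imdb", num_train_epochs=1, per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()  # after training, the classifier head is no longer randomly initialized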
Thank you. Now it is a bit more clear. |
I am facing a similar error while creating an entity extraction model using bert-base-uncased. Here is the code for my model:
Error:
How do I resolve this? |
Note that this warning is sensitive to the Transformers version used for model training vs. the version used for inference. |
An interesting edge case -- when I created and fine-tuned my custom classification model |
I've been suppressing the warning with this helper:

from transformers import CLIPTextModel, logging

class log_level:
    orig_log_level: int
    log_level: int
    def __init__(self, log_level: int):
        self.log_level = log_level
        self.orig_log_level = logging.get_verbosity()
    def __enter__(self):
        # lower transformers' verbosity while inside the `with` block
        logging.set_verbosity(self.log_level)
    def __exit__(self, exc_type, exc_value, traceback):
        # restore the original verbosity on exit
        logging.set_verbosity(self.orig_log_level)

with log_level(logging.ERROR):
    text_encoder: CLIPTextModel = CLIPTextModel.from_pretrained('openai/clip-vit-large-patch14') |
Coming here from Google, this was happening when I called I figured out that you can get the correct model type using the pipeline API instead: In this case, this means I could also use |
For those who want to suppress the warning with the latest transformers version, try this; hope this helps :D |
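Since the exact snippet referred to above isn't shown, here is a minimal guess at what was meant, using the logging helpers available in current transformers releases:

from transformers import logging

# Silences the "Some weights ... were not used" messages (and other warnings) globally;
# call this before from_pretrained.
logging.set_verbosity_error()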
I guess the simple solution is to use:

from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained('deps/distilbert-base-uncased') |
This means you are not using the pooler; you need to set add_pooling_layer=True. |
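For context, add_pooling_layer is a constructor argument of BertModel and can be forwarded through from_pretrained; whether you actually want it depends on whether your head uses the pooled output. A hedged sketch:

from transformers import BertModel

with_pooler = BertModel.from_pretrained("bert-base-uncased", add_pooling_layer=True)      # default: keeps the pooler
without_pooler = BertModel.from_pretrained("bert-base-uncased", add_pooling_layer=False)  # drops the pooler weights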
I am confused after reading the whole issue. Is it possible to load a custom transformer architecture using HF or not? If not, is it possible to load a custom HF architecture (and weights) from a trainer checkpoint using torch? |
Hey @Maximo-Rulli, you can of course load a custom architecture using the |
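Since the answer above is cut off, here is one hedged sketch of a common pattern for a custom architecture (the class name and head size are made up for illustration): subclass a *PreTrainedModel, add your own head, and let from_pretrained load whatever backbone weights match the checkpoint; anything missing is reported in the same warning.

import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel

class BertWithCustomHead(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.custom_head = nn.Linear(config.hidden_size, 3)  # arbitrary head size for the sketch
        self.init_weights()  # randomly initializes anything not found in the checkpoint

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        return self.custom_head(outputs.last_hidden_state[:, 0])  # use the [CLS] token

model = BertWithCustomHead.from_pretrained("bert-base-uncased")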
When I was loading pretrained weights, this appeared:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertLMHeadModel: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'bert.embeddings.token_type_embeddings.weight']
This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). |
Hey 🤗 thanks for the follow up, could you ask your question on the forum instead? I'm sure the community will be of help and we can't debug your custom code for you. |
Hello,

import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import TFAutoModel, AutoTokenizer
train_df = pd.read_csv('train.csv', encoding='latin-1')
test_df = pd.read_csv('test.csv', encoding='latin-1')
model = TFAutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
def preprocess_text(text):
return tokenizer(text, padding=True, truncation=True, return_tensors='tf')
train_inputs = preprocess_text(train_df['STATUS'].tolist())
test_inputs = preprocess_text(test_df['STATUS'].tolist())
label_mapping = {'n': 0, 'y': 1}
train_labels = train_df[['cEXT', 'cNEU', 'cAGR', 'cCON', 'cOPN']].replace(label_mapping).values.astype(np.float32)
test_labels = test_df[['cEXT', 'cNEU', 'cAGR', 'cCON', 'cOPN']].replace(label_mapping).values.astype(np.float32)
train_dataset = tf.data.Dataset.from_tensor_slices(({
'input_ids': train_inputs['input_ids'],
'token_type_ids': train_inputs['token_type_ids'],
'attention_mask': train_inputs['attention_mask']
}, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices(({
'input_ids': test_inputs['input_ids'],
'token_type_ids': test_inputs['token_type_ids'],
'attention_mask': test_inputs['attention_mask']
}, test_labels))
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(len(train_df)).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
class BERTForClassification(tf.keras.Model):
def __init__(self, bert_model, num_classes):
super().__init__()
self.bert = bert_model
self.dropout = tf.keras.layers.Dropout(0.1)
self.fc = tf.keras.layers.Dense(num_classes, activation='softmax')
def call(self, inputs, training=False):
outputs = self.bert(inputs, training=training)[0] # Use all hidden states
pooled_output = outputs[:, 0, :] # Use the [CLS] token representation
pooled_output = self.dropout(pooled_output, training=training)
return self.fc(pooled_output)
num_classes = len(train_df[['cEXT', 'cNEU', 'cAGR', 'cCON', 'cOPN']].columns)
classifier = BERTForClassification(model, num_classes=num_classes)
loss = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
classifier.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
history = classifier.fit(
train_dataset,
epochs=3
)
classifier.evaluate(test_dataset)
Error:
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
Epoch 1/3
WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model/bert/pooler/dense/kernel:0', 'tf_bert_model/bert/pooler/dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model/bert/pooler/dense/kernel:0', 'tf_bert_model/bert/pooler/dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model/bert/pooler/dense/kernel:0', 'tf_bert_model/bert/pooler/dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model/bert/pooler/dense/kernel:0', 'tf_bert_model/bert/pooler/dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument? |
tokenizer_config.json: 100%
1.27k/1.27k [00:00<00:00, 32.0kB/s]
vocab.txt: 100%
232k/232k [00:00<00:00, 3.49MB/s]
tokenizer.json: 100%
711k/711k [00:00<00:00, 3.66MB/s]
special_tokens_map.json: 100%
125/125 [00:00<00:00, 1.57kB/s]
config.json: 100%
913/913 [00:00<00:00, 18.6kB/s]
model.safetensors: 100%
436M/436M [00:08<00:00, 52.1MB/s]
Some weights of BertForMaskedLM were not initialized from the model checkpoint at judithrosell/BC5CDR_BlueBERT_NER and are newly initialized: ['cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
[<ipython-input-10-6a4cfd3451b6>](https://localhost:8080/#) in <cell line: 78>()
76
77 # Create datasets
---> 78 train_dataset = create_dataset_from_file("/content/drive/My Drive/Model_Training/fold2/train.tsv", tokenizer)
79 eval_dataset = create_dataset_from_file("/content/drive/My Drive/Model_Training/fold2/dev.tsv", tokenizer)
IndexError: list index out of range

What does this mean? How do I correct it? |
Sorry but I think you should ask your question on the forum, you are trying to use custom code, and it does not seem related to this thread 😉 |
This works; just note that __exit__ has to accept the exception parameters (exc_type, exc_value, traceback), and that the logging utilities now live in transformers.utils.logging. |
returns this warning message:
This just started popping up with v3, so I'm not sure what the recommended action to take here is. Please advise if you can. Basically, any of my code using AutoModelFor<X> is throwing up this warning now. Thanks.