
BertForMultipleChoice QuickTour issue with weights? #1789

@ChrisPalmerNZ

Description


🐛 Bug

Model I am using: BertForMultipleChoice

Language I am using the model on: English

The problem arises when using: the Quick Tour sample code (slightly modified to save each model to its own directory, see below)

The task I am working on is: none in particular, just running the Quick Tour sample code (not an official GLUE/SQuAD task or my own dataset)

To Reproduce

Steps to reproduce the behavior:

  1. Run the sample code from the last loop of the Quick Tour
  2. You need to supply directory names; see my code below for how I did that
  3. When BertForMultipleChoice reaches the line reshaped_logits = logits.view(-1, num_choices) in modeling_bert.py, it fails with RuntimeError: shape '[-1, 16]' is invalid for input of size 1
import os
import torch
from transformers import (BertModel, BertForPreTraining, BertForMaskedLM,
                          BertForNextSentencePrediction, BertForSequenceClassification,
                          BertForMultipleChoice, BertForTokenClassification,
                          BertForQuestionAnswering, BertTokenizer)

# Each architecture is provided with several classes for fine-tuning on down-stream tasks, e.g.
BERT_MODEL_CLASSES = [BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction,
                      BertForSequenceClassification, BertForMultipleChoice, BertForTokenClassification,
                      BertForQuestionAnswering]
# All the classes for an architecture can be initiated from pretrained weights for this architecture
# Note that additional weights added for fine-tuning are only initialized
# and need to be trained on the down-stream task
pretrained_weights = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_weights)
for model_class in BERT_MODEL_CLASSES:
    
    print("Processing", model_class.__name__, "...")
    
    # Store class name as target directory
    model_dir_name = model_class.__name__+"/"
    
    # Load pretrained model/tokenizer
    model = model_class.from_pretrained(pretrained_weights)

    # Models can return full list of hidden-states & attentions weights at each layer
    model = model_class.from_pretrained(pretrained_weights,
                                        output_hidden_states=True,
                                        output_attentions=True)
    input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
    all_hidden_states, all_attentions = model(input_ids)[-2:]

    # Models are compatible with Torchscript
    model = model_class.from_pretrained(pretrained_weights, torchscript=True)
    traced_model = torch.jit.trace(model, (input_ids,))
    
    save_directory = 'BERT_test/' + model_dir_name
    if not os.path.isdir(save_directory):
        os.makedirs(save_directory)  # create BERT_test/ and the per-class subdirectory
        
    # Simple serialization for models and tokenizers
    model.save_pretrained(save_directory)  # save
    model = model_class.from_pretrained(save_directory)  # re-load
    tokenizer.save_pretrained(save_directory)  # save
    tokenizer = BertTokenizer.from_pretrained(save_directory)  # re-load

    # SOTA examples for GLUE, SQUAD, text generation...

The error:

I1111 20:47:30.383128 21676 modeling_utils.py:383] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at C:\Users\User\.cache\torch\transformers\aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
I1111 20:47:33.363116 21676 modeling_utils.py:453] Weights of BertForMultipleChoice not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
I1111 20:47:33.365122 21676 modeling_utils.py:456] Weights from pretrained model not used in BertForMultipleChoice: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-da6597bd484d> in <module>()
     19                                         output_attentions=True)
     20     input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
---> 21     all_hidden_states, all_attentions = model(input_ids)[-2:]
     22 
     23     # Models are compatible with Torchscript

G:\Anaconda3\envs\pytorch1\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

g:\deeplearning\huggingface\transformers\transformers\modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels)
   1096         pooled_output = self.dropout(pooled_output)
   1097         logits = self.classifier(pooled_output)
-> 1098         reshaped_logits = logits.view(-1, num_choices)
   1099 
   1100         outputs = (reshaped_logits,) + outputs[2:]  # add hidden states and attention if they are here

RuntimeError: shape '[-1, 16]' is invalid for input of size 1
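
If I read modeling_bert.py correctly, the failure comes from the shape of the input rather than from the uninitialized classifier weights: the Quick Tour feeds every class the same 2D input_ids of shape (batch_size, sequence_length), but BertForMultipleChoice treats the second dimension as num_choices. A rough sketch of the shape arithmetic as I understand it (hypothetical tensors, only to illustrate why the view fails):

import torch

input_ids = torch.zeros(1, 16, dtype=torch.long)  # (batch_size=1, seq_len=16), like the encoded Quick Tour sentence
num_choices = input_ids.shape[1]                  # 16, but this is really the sequence length
logits = torch.zeros(1, 1)                        # the multiple-choice classifier emits one logit per sequence
reshaped_logits = logits.view(-1, num_choices)    # RuntimeError: shape '[-1, 16]' is invalid for input of size 1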

Expected behavior

The loop should complete without error for every model class in BERT_MODEL_CLASSES, including BertForMultipleChoice.

Environment

  • OS: Windows 10
  • Python version: 3.6.6
  • PyTorch version: 1.3.0
  • PyTorch Transformers version (or branch): 2.1.1
  • Using GPU? Yes
  • Distributed or parallel setup? No
  • Any other relevant information:

Additional context
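
My understanding from the model docstrings is that BertForMultipleChoice expects input_ids of shape (batch_size, num_choices, sequence_length), one encoded sequence per choice, rather than the 2D tensor the Quick Tour loop passes. A minimal sketch of a call I would expect to work; the prompt and choice strings here are made up for illustration:

import torch
from transformers import BertTokenizer, BertForMultipleChoice

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMultipleChoice.from_pretrained('bert-base-uncased')

prompt = "The cat sat on the"
choices = ["mat.", "sky."]

# Encode each (prompt, choice) pair and pad to a common length (0 is [PAD] for bert-base-uncased)
encoded = [tokenizer.encode(prompt, choice) for choice in choices]
max_len = max(len(ids) for ids in encoded)
encoded = [ids + [0] * (max_len - len(ids)) for ids in encoded]

input_ids = torch.tensor(encoded).unsqueeze(0)  # (batch_size=1, num_choices=2, seq_len)
reshaped_logits = model(input_ids)[0]           # (1, num_choices)

If that is the intended input format, the Quick Tour loop would need to special-case BertForMultipleChoice (for example by adding a num_choices dimension to input_ids) instead of feeding it the same 2D tensor as the other classes.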
