run_mlm.py : Missing key(s) in state_dict & Unexpected key(s) in state_dict #11485

Closed
TingNLP opened this issue Apr 28, 2021 · 3 comments · Fixed by #11492
Comments

@TingNLP

TingNLP commented Apr 28, 2021

Environment info

  • transformers version: 4.6.0.dev0
  • Platform: Ubuntu 16.04.3 LTS
  • Python version: Python 3.6.13 :: Anaconda, Inc.
  • PyTorch version (GPU?): 1.8.1+cu102
  • Tensorflow version (GPU?):
  • Using GPU in script?: YES
  • Using distributed or parallel set-up in script?: YES

Who can help

@sgugger

Information

Model I am using: roberta

The problem arises when using:

  • the official example scripts: run_mlm.py

The tasks I am working on are:

To reproduce

Steps to reproduce the behavior:

I followed the example at
https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling

When I run

python run_mlm.py \
    --output_dir tmp/test-mlm \
    --model_name_or_path roberta-base \
    --do_train \
    --train_file wikitext-2-raw-txt/wiki.train.txt \
    --do_eval \
    --validation_file wikitext-2-raw-txt/wiki.valid.txt \
    --line_by_line

and the following error occurs:

2021-04-28 16:18:24.068938: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
04/28/2021 16:18:25 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 4distributed training: False, 16-bits training: False
04/28/2021 16:18:25 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=tmp/test-mlm, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/Apr28_16-18-25_Devbox4, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=tmp/test-mlm, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name=length, report_to=['tensorboard', 'wandb'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, use_legacy_prediction_loop=False, push_to_hub=False, _n_gpu=4, mp_parameters=)
04/28/2021 16:18:26 - WARNING - datasets.builder -   Using custom data configuration default-b1467a68ec9fe52f
04/28/2021 16:18:27 - WARNING - datasets.builder -   Reusing dataset text (/home/A50442/.cache/huggingface/datasets/text/default-b1467a68ec9fe52f/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5)
[INFO|configuration_utils.py:498] 2021-04-28 16:18:27,029 >> loading configuration file roberta-base/config.json
[INFO|configuration_utils.py:536] 2021-04-28 16:18:27,029 >> Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|configuration_utils.py:498] 2021-04-28 16:18:27,030 >> loading configuration file roberta-base/config.json
[INFO|configuration_utils.py:536] 2021-04-28 16:18:27,030 >> Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/tokenizer_config.json. We won't load it.
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,030 >> loading file roberta-base/vocab.json
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,030 >> loading file roberta-base/merges.txt
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file roberta-base/tokenizer.json
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|modeling_utils.py:1111] 2021-04-28 16:18:27,103 >> loading weights file roberta-base/pytorch_model.bin
[INFO|modeling_utils.py:1257] 2021-04-28 16:18:30,300 >> All model checkpoint weights were used when initializing RobertaForMaskedLM.

[INFO|modeling_utils.py:1266] 2021-04-28 16:18:30,300 >> All the weights of RobertaForMaskedLM were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use RobertaForMaskedLM for predictions without further training.
100%|██████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 18.82ba/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 20.73ba/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO|trainer.py:1027] 2021-04-28 16:18:34,809 >> Loading model from roberta-base).
Traceback (most recent call last):
  File "run_mlm.py", line 496, in <module>
    main()
  File "run_mlm.py", line 459, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/A50442/anaconda3/envs/transformer/lib/python3.6/site-packages/transformers/trainer.py", line 1046, in train
    self.model.load_state_dict(state_dict)
  File "/home/A50442/anaconda3/envs/transformer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RobertaForMaskedLM:
	Missing key(s) in state_dict: "roberta.embeddings.position_ids", "lm_head.decoder.bias". 
	Unexpected key(s) in state_dict: "roberta.pooler.dense.weight", "roberta.pooler.dense.bias".
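
For illustration, here is a minimal sketch of how the key mismatch can be inspected, assuming a local roberta-base/ folder containing the raw Hub files as in this report (the paths are hypothetical). A strict load_state_dict fails on exactly this kind of difference, while from_pretrained tolerates it:

    # Minimal sketch (hypothetical local paths): compare the raw checkpoint's keys
    # with the model's own state_dict to see what is missing or unexpected.
    import torch
    from transformers import RobertaForMaskedLM

    model = RobertaForMaskedLM.from_pretrained("roberta-base")  # tolerant load
    state_dict = torch.load("roberta-base/pytorch_model.bin", map_location="cpu")

    model_keys = set(model.state_dict().keys())
    checkpoint_keys = set(state_dict.keys())
    print("Missing from checkpoint:", sorted(model_keys - checkpoint_keys))
    print("Unexpected in checkpoint:", sorted(checkpoint_keys - model_keys))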

Expected behavior

The expected behavior is that I get a new pretrained language model based on my dataset.

@sgugger
Collaborator

sgugger commented Apr 28, 2021

The command runs for me, and according to your logs, the Trainer is loading a local checkpoint named roberta-base. Do you have a local folder named roberta-base? It looks like it contains a checkpoint that differs from the actual roberta-base model, which causes the error. Could you move that folder and try again?
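
A rough way to verify this is a quick check outside run_mlm.py (hypothetical snippet; the folder name is just the one from this report). The "Loading model from roberta-base" line in the log above suggests the local folder is also being picked up as a checkpoint to resume from:

    # Hypothetical sanity check, not part of run_mlm.py: a local folder with the
    # same name as the Hub model id shadows it.
    import os

    model_name_or_path = "roberta-base"
    if os.path.isdir(model_name_or_path):
        print(f"'{model_name_or_path}' is a local folder; run_mlm.py will load it "
              "from disk (and may try to resume from it) instead of fetching "
              "roberta-base from the Hub. Rename or move it to rule this out.")
    else:
        print(f"No local '{model_name_or_path}' folder; the Hub model will be used.")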

@TingNLP
Author

TingNLP commented Apr 29, 2021

@sgugger
Yes, I created a local folder named roberta-base, but its contents were downloaded from the Hugging Face Hub (https://huggingface.co/roberta-base/tree/main).

The language-modeling folder screenshot is shown below:
[screenshot: contents of the language-modeling folder]

The roberta-base folder screenshot is shown below:
[screenshot: contents of the roberta-base folder]

So I am confused...

@sgugger
Collaborator

sgugger commented Apr 29, 2021

I think it's linked to the bug that #11492 is fixing. It should be merged today, and then you can try it on a source install!
