run_mlm.py : Missing key(s) in state_dict & Unexpected key(s) in state_dict #11485

Closed
TingNLP opened this issue Apr 28, 2021 · 3 comments · Fixed by #11492
Comments

@TingNLP

TingNLP commented Apr 28, 2021

Environment info

  • transformers version: 4.6.0.dev0
  • Platform: Ubuntu 16.04.3 LTS
  • Python version: Python 3.6.13 :: Anaconda, Inc.
  • PyTorch version (GPU?): 1.8.1+cu102
  • Tensorflow version (GPU?):
  • Using GPU in script?: YES
  • Using distributed or parallel set-up in script?: YES

Who can help

@sgugger

Information

Model I am using: roberta

The problem arises when using:

  • the official example scripts: run_mlm.py

The tasks I am working on are:

To reproduce

Steps to reproduce the behavior:

I followed the example at
https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling

When I run

python run_mlm.py \
    --output_dir tmp/test-mlm \
    --model_name_or_path roberta-base \
    --do_train \
    --train_file wikitext-2-raw-txt/wiki.train.txt \
    --do_eval \
    --validation_file wikitext-2-raw-txt/wiki.valid.txt \
    --line_by_line

and the following error occurs:

2021-04-28 16:18:24.068938: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
04/28/2021 16:18:25 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 4distributed training: False, 16-bits training: False
04/28/2021 16:18:25 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=tmp/test-mlm, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/Apr28_16-18-25_Devbox4, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=tmp/test-mlm, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name=length, report_to=['tensorboard', 'wandb'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, use_legacy_prediction_loop=False, push_to_hub=False, _n_gpu=4, mp_parameters=)
04/28/2021 16:18:26 - WARNING - datasets.builder -   Using custom data configuration default-b1467a68ec9fe52f
04/28/2021 16:18:27 - WARNING - datasets.builder -   Reusing dataset text (/home/A50442/.cache/huggingface/datasets/text/default-b1467a68ec9fe52f/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5)
[INFO|configuration_utils.py:498] 2021-04-28 16:18:27,029 >> loading configuration file roberta-base/config.json
[INFO|configuration_utils.py:536] 2021-04-28 16:18:27,029 >> Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|configuration_utils.py:498] 2021-04-28 16:18:27,030 >> loading configuration file roberta-base/config.json
[INFO|configuration_utils.py:536] 2021-04-28 16:18:27,030 >> Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/tokenizer_config.json. We won't load it.
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,030 >> loading file roberta-base/vocab.json
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,030 >> loading file roberta-base/merges.txt
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file roberta-base/tokenizer.json
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|modeling_utils.py:1111] 2021-04-28 16:18:27,103 >> loading weights file roberta-base/pytorch_model.bin
[INFO|modeling_utils.py:1257] 2021-04-28 16:18:30,300 >> All model checkpoint weights were used when initializing RobertaForMaskedLM.

[INFO|modeling_utils.py:1266] 2021-04-28 16:18:30,300 >> All the weights of RobertaForMaskedLM were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use RobertaForMaskedLM for predictions without further training.
100%|██████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 18.82ba/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 20.73ba/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO|trainer.py:1027] 2021-04-28 16:18:34,809 >> Loading model from roberta-base).
Traceback (most recent call last):
  File "run_mlm.py", line 496, in <module>
    main()
  File "run_mlm.py", line 459, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/A50442/anaconda3/envs/transformer/lib/python3.6/site-packages/transformers/trainer.py", line 1046, in train
    self.model.load_state_dict(state_dict)
  File "/home/A50442/anaconda3/envs/transformer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RobertaForMaskedLM:
	Missing key(s) in state_dict: "roberta.embeddings.position_ids", "lm_head.decoder.bias". 
	Unexpected key(s) in state_dict: "roberta.pooler.dense.weight", "roberta.pooler.dense.bias".
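
For illustration, here is a minimal sketch of how the key mismatch can be inspected, assuming a local roberta-base/ folder containing the raw Hub files as in this report (the paths are hypothetical). A strict load_state_dict fails on exactly this kind of difference, while from_pretrained tolerates it:

    # Minimal sketch (hypothetical local paths): compare the raw checkpoint's keys
    # with the model's own state_dict to see what is missing or unexpected.
    import torch
    from transformers import RobertaForMaskedLM

    model = RobertaForMaskedLM.from_pretrained("roberta-base")  # tolerant load
    state_dict = torch.load("roberta-base/pytorch_model.bin", map_location="cpu")

    model_keys = set(model.state_dict().keys())
    checkpoint_keys = set(state_dict.keys())
    print("Missing from checkpoint:", sorted(model_keys - checkpoint_keys))
    print("Unexpected in checkpoint:", sorted(checkpoint_keys - model_keys))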

Expected behavior

The expected behavior is that I get a new pretrained language model based on my dataset.

@sgugger
Collaborator

sgugger commented Apr 28, 2021

The command runs for me, and according to your logs, the Trainer is loading a local checkpoint named roberta-base. Do you have a local folder named roberta-base? It looks like it contains a checkpoint that differs from the actual roberta-base model, which causes the error. Could you move that folder and try again?
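
A rough way to verify this is a quick check outside run_mlm.py (hypothetical snippet; the folder name is just the one from this report). The "Loading model from roberta-base" line in the log above suggests the local folder is also being picked up as a checkpoint to resume from:

    # Hypothetical sanity check, not part of run_mlm.py: a local folder with the
    # same name as the Hub model id shadows it.
    import os

    model_name_or_path = "roberta-base"
    if os.path.isdir(model_name_or_path):
        print(f"'{model_name_or_path}' is a local folder; run_mlm.py will load it "
              "from disk (and may try to resume from it) instead of fetching "
              "roberta-base from the Hub. Rename or move it to rule this out.")
    else:
        print(f"No local '{model_name_or_path}' folder; the Hub model will be used.")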

@TingNLP
Author

TingNLP commented Apr 29, 2021

@sgugger
Yes, I created a local folder named roberta-base, but its contents were downloaded from the Hugging Face Hub (https://huggingface.co/roberta-base/tree/main).

The language-modeling folder screenshot is shown below:
[screenshot: contents of the language-modeling folder]

The roberta-base folder screenshot is shown below:
[screenshot: contents of the roberta-base folder]

So I am confused...

@sgugger
Collaborator

sgugger commented Apr 29, 2021

I think it's linked to the bug that #11492 is fixing. It should be merged today, and then you can try it on a source install!
