2021-04-28 16:18:24.068938: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
04/28/2021 16:18:25 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 4distributed training: False, 16-bits training: False
04/28/2021 16:18:25 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir=tmp/test-mlm, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/Apr28_16-18-25_Devbox4, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=tmp/test-mlm, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name=length, report_to=['tensorboard', 'wandb'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, use_legacy_prediction_loop=False, push_to_hub=False, _n_gpu=4, mp_parameters=)
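For reference, the dump above is just the `TrainingArguments` object used by the example script. A minimal sketch reproducing the non-default values from this log (in the actual run they come from `run_mlm.py`'s command-line flags, not hand-written code):

```python
from transformers import TrainingArguments

# Minimal sketch of the non-default values shown in the log above; the real run
# builds this object from run_mlm.py's command-line flags.
training_args = TrainingArguments(
    output_dir="tmp/test-mlm",
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-05,
    num_train_epochs=3.0,
    seed=42,
)
```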
04/28/2021 16:18:26 - WARNING - datasets.builder - Using custom data configuration default-b1467a68ec9fe52f
04/28/2021 16:18:27 - WARNING - datasets.builder - Reusing dataset text (/home/A50442/.cache/huggingface/datasets/text/default-b1467a68ec9fe52f/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5)
[INFO|configuration_utils.py:498] 2021-04-28 16:18:27,029 >> loading configuration file roberta-base/config.json
[INFO|configuration_utils.py:536] 2021-04-28 16:18:27,029 >> Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 1,
"position_embedding_type": "absolute",
"transformers_version": "4.6.0.dev0",
"type_vocab_size": 1,
"use_cache": true,
"vocab_size": 50265
}
[INFO|configuration_utils.py:498] 2021-04-28 16:18:27,030 >> loading configuration file roberta-base/config.json
[INFO|configuration_utils.py:536] 2021-04-28 16:18:27,030 >> Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 1,
"position_embedding_type": "absolute",
"transformers_version": "4.6.0.dev0",
"type_vocab_size": 1,
"use_cache": true,
"vocab_size": 50265
}
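The configuration printed twice above is the standard `roberta-base` config. It can be loaded and inspected directly if needed (a small sketch, assuming the same local `roberta-base` folder or the Hub model id):

```python
from transformers import RobertaConfig

# Load the config.json shown in the log and check a few of its fields.
config = RobertaConfig.from_pretrained("roberta-base")
print(config.vocab_size)               # 50265
print(config.num_hidden_layers)        # 12
print(config.max_position_embeddings)  # 514
```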
[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1649] 2021-04-28 16:18:27,030 >> Didn't find file roberta-base/tokenizer_config.json. We won't load it.
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,030 >> loading file roberta-base/vocab.json
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,030 >> loading file roberta-base/merges.txt
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file roberta-base/tokenizer.json
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|tokenization_utils_base.py:1713] 2021-04-28 16:18:27,031 >> loading file None
[INFO|modeling_utils.py:1111] 2021-04-28 16:18:27,103 >> loading weights file roberta-base/pytorch_model.bin
[INFO|modeling_utils.py:1257] 2021-04-28 16:18:30,300 >> All model checkpoint weights were used when initializing RobertaForMaskedLM.
[INFO|modeling_utils.py:1266] 2021-04-28 16:18:30,300 >> All the weights of RobertaForMaskedLM were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use RobertaForMaskedLM for predictions without further training.
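These messages correspond to the usual `from_pretrained` loading path; a minimal sketch of what the script is doing at this point (model name taken from the log):

```python
from transformers import AutoTokenizer, RobertaForMaskedLM

# Load the tokenizer files (vocab.json, merges.txt, tokenizer.json) and the
# masked-LM weights from the roberta-base checkpoint, as in the log above.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
```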
100%|██████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 18.82ba/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 20.73ba/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
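The repeated warning is harmless here; as it says, it can be silenced by setting the environment variable before the DataLoader workers fork, for example:

```python
import os

# Set this before any tokenizer is used (or export it in the shell) so forked
# worker processes do not try to re-enable Rust-side parallelism.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```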
[INFO|trainer.py:1027] 2021-04-28 16:18:34,809 >> Loading model from roberta-base).
Traceback (most recent call last):
File "run_mlm.py", line 496, in <module>
main()
File "run_mlm.py", line 459, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/A50442/anaconda3/envs/transformer/lib/python3.6/site-packages/transformers/trainer.py", line 1046, in train
self.model.load_state_dict(state_dict)
File "/home/A50442/anaconda3/envs/transformer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RobertaForMaskedLM:
Missing key(s) in state_dict: "roberta.embeddings.position_ids", "lm_head.decoder.bias".
Unexpected key(s) in state_dict: "roberta.pooler.dense.weight", "roberta.pooler.dense.bias".
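The error comes from a strict `load_state_dict` whose keys do not line up exactly with `RobertaForMaskedLM` (a pooler is present, parts of the MLM head are missing). A quick diagnostic sketch to see what is actually stored in the file the Trainer tried to resume from (path taken from the log):

```python
import torch

# Inspect the keys stored in the checkpoint the Trainer tried to resume from.
state_dict = torch.load("roberta-base/pytorch_model.bin", map_location="cpu")
print([k for k in state_dict if "pooler" in k])   # unexpected for RobertaForMaskedLM
print([k for k in state_dict if "lm_head" in k])  # expected by the MLM head
```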
Expected behavior
The expected behavior is that I get a new pretrained language model based on my dataset.
The command runs for me, and according to your logs the Trainer is loading a local checkpoint named roberta-base. Do you have a local folder named roberta-base? It looks like it contains a checkpoint that differs from the actual roberta-base model, which is what causes the error. Could you move that folder and try again?
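A quick way to check for the shadowing folder described in the comment above (a small sketch, run from the directory where `run_mlm.py` is launched):

```python
import os

# If a directory literally named "roberta-base" exists here, the name resolves
# to a local path and the Trainer treats it as a checkpoint to resume from,
# instead of the roberta-base model from the Hub.
if os.path.isdir("roberta-base"):
    print("Local 'roberta-base' folder found:", sorted(os.listdir("roberta-base")))
else:
    print("No local 'roberta-base' folder; the Hub model would be used.")
```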
Environment info
transformers version: 4.6.0.dev0
Who can help
@sgugger
Information
Model I am using: RoBERTa (roberta-base)
The problem arises when using: the official example scripts
The task I am working on is: masked language modeling on the WikiText dataset
(https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/)
To reproduce
Steps to reproduce the behavior:
1. Follow the language-modeling example: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling
2. Run the example's run_mlm.py command on my dataset.
3. The error shown above occurs.