
What type of file should this path lead to: '-pm_p': '--pretrained_model_path' #2

Open
wanxinxie opened this issue Feb 24, 2021 · 2 comments


@wanxinxie

Hi,
Thank you so much for this library! I tried it with a BERT model downloaded from google-research/bert, but I don't know which file I should use for the '-pm_p' argument.

There are 4 files in the model folder:
bert_config.json
bert_model.ckpt.data-00000-of-00001
bert_model.ckpt.index
vocab.txt

I tried:

python3 Pretraining.py -e 1   -b 256   -sp ./storage   -dv 0 1 2 3 -lr 1e-04   -str exBERT   -config ./uncased_L-2_H-128_A-2/bert_config.json ./config_and_vocab/exBERT/bert_config_ex_s3.json   -vocab ./config_and_vocab/exBERT/exBERT_vocab.txt   -pm_p ./uncased_L-2_H-128_A-2/bert_model.ckpt.index   -dp ./example_data.pkl   -ls 128   -p 1

It gives an error:

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
epochs: 1
batchsize: 256
save_path: ./storage
device: [0, 1, 2, 3]
learning_rate: 0.0001
strategy: exBERT
config: ['./uncased_L-2_H-128_A-2/bert_config.json', './config_and_vocab/exBERT/bert_config_ex_s3.json']
vocab: ./config_and_vocab/exBERT/exBERT_vocab.txt
pretrained_model_path: ./uncased_L-2_H-128_A-2/bert_model.ckpt.index
datat_path: ./example_data.pkl
longest_sentence: 128
percentage: 1.0
sep: 1
warmup: -1
train_extension_only: True
training with GPU: [0, 1, 2, 3]
Building PyTorch model from configuration: {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 128,
  "initializer_range": 0.02,
  "intermediate_size": 512,
  "max_position_embeddings": 512,
  "num_attention_heads": 2,
  "num_hidden_layers": 2,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

Building PyTorch model from configuration: {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 252,
  "initializer_range": 0.02,
  "intermediate_size": 1024,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 17658
}

Traceback (most recent call last):
  File "Pretraining.py", line 120, in <module>
    stat_dict = t.load(args['pretrained_model_path'], map_location='cpu')
  File "/Users/catina/opt/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Users/catina/opt/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x00'.

I also tried "bert_model.ckpt.data-00000-of-00001", but it gives a similar error, ending with:

_pickle.UnpicklingError: invalid load key, '\x9d'.

I really appreciate any help! Thanks in advance!

@taiwen97
Collaborator

Hi, the previous version only supported loading a pre-trained "torch" weight; a small update has been made to enable loading a pre-trained "tf" weight with the input argument "-pm_p_tf".
Please try:
python3 Pretraining.py -e 1 -b 256 -sp ./storage -dv 0 1 2 3 -lr 1e-04 -str exBERT -config ./uncased_L-2_H-128_A-2/bert_config.json ./config_and_vocab/exBERT/bert_config_ex_s3.json -vocab ./config_and_vocab/exBERT/exBERT_vocab.txt -pm_p_tf ./uncased_L-2_H-128_A-2/bert_model.ckpt -dp ./example_data.pkl -ls 128 -p 1
Thanks!
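
One possible workaround, if you would rather keep using "-pm_p" with a PyTorch weight: convert the Google TF checkpoint to a PyTorch state dict first. A minimal sketch, assuming the HuggingFace transformers package and TensorFlow are installed; whether the resulting key names match exactly what Pretraining.py expects is not guaranteed, and the output filename pytorch_model.bin is an arbitrary choice:

# Sketch: convert the google-research/bert TF checkpoint into a PyTorch state dict.
import torch
from transformers import BertConfig, BertForPreTraining

config = BertConfig.from_json_file("./uncased_L-2_H-128_A-2/bert_config.json")
# from_pretrained accepts the path to the .ckpt.index file when from_tf=True
model = BertForPreTraining.from_pretrained(
    "./uncased_L-2_H-128_A-2/bert_model.ckpt.index",
    from_tf=True,
    config=config,
)
torch.save(model.state_dict(), "./uncased_L-2_H-128_A-2/pytorch_model.bin")

Then point "-pm_p" at ./uncased_L-2_H-128_A-2/pytorch_model.bin.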

@Llin1785361283

@taiwen97 Hello, thank you very much for providing this library! Like the questioner above, I tried to run it with the command you provided, but the BERT model I downloaded does not contain bert_model.ckpt, so which file should be used as the "-pm_p" parameter?
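
A note on TF checkpoint naming that may explain the confusion: TensorFlow saves checkpoints under a prefix rather than a single file, so "bert_model.ckpt" in the command above refers to the pair bert_model.ckpt.index and bert_model.ckpt.data-00000-of-00001 in the downloaded folder; no file literally named bert_model.ckpt needs to exist. A minimal sketch, assuming TensorFlow is installed, to confirm the prefix is readable:

import tensorflow as tf

# List the variables stored under the checkpoint prefix; this works even though
# no file named exactly "bert_model.ckpt" exists in the folder.
for name, shape in tf.train.list_variables("./uncased_L-2_H-128_A-2/bert_model.ckpt"):
    print(name, shape)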
