Bug description
I'm trying to fine-tune the poly-encoder from the poly_model_huge_reddit model. I'm getting a size mismatch error on the embeddings because (I think) a new dictionary is being generated instead of reusing the dictionary from the init model.
Reproduction steps
Here is the command I'm running:
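Roughly the following (I've trimmed batch size and the other hyperparameters; the model type, task, init model, dict file, and output path match what shows up in the logs below):

parlai train_model \
  -m transformer/polyencoder \
  -t convai2 \
  --init-model data/models/pretrained_transformers/poly_model_huge_reddit/model \
  --dict-file data/models/pretrained_transformers/poly_model_huge_reddit/model.dict \
  --model-file ../aloha/data/paper_model_no_finetune/model \
  --fp16 true --fp16-impl apex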
Expected behavior
This should reuse the dictionary from data/models/pretrained_transformers/poly_model_huge_reddit/model.dict.
Logs
It looks like a new dictionary is being built (with 18899 words instead of the ~55k words from the reddit model):
2020-07-30 08:55:24,058 INFO | building dictionary first...
2020-07-30 08:55:24,066 INFO | creating task(s): convai2
2020-07-30 08:55:24,074 INFO | loading fbdialog data: ../aloha/data/sdlong_config1/fold1_l4390/ConvAI2/train_self_original.txt
Building dictionary:  95%|██████
2020-07-30 08:56:00,031 INFO | Saving dictionary to ../aloha/data/paper_model_no_finetune/model.dict
2020-07-30 08:56:00,082 INFO | dictionary built with 18899 tokens in 0.0s
2020-07-30 08:56:00,084 INFO | No model with opt yet at: ../aloha/data/paper_model_no_finetune/model(.opt)
2020-07-30 08:56:00,210 INFO | Using CUDA
2020-07-30 08:56:00,213 ERROR | You set --fp16 true with --fp16-impl apex, but fp16 with apex is unavailable. To use apex fp16, please install APEX from https://github.com/NVIDIA/apex.
2020-07-30 08:56:00,213 INFO | loading dictionary from data/models/pretrained_transformers/poly_model_huge_reddit/model.dict
2020-07-30 08:56:00,267 INFO | num words = 18899
2020-07-30 08:56:03,271 INFO | Total parameters: 200,724,480 (200,724,480 trainable)
2020-07-30 08:56:03,271 INFO | Loading existing model parameters from data/models/pretrained_transformers/poly_model_huge_reddit/model
Building dictionary: 100%|██████████| 131k/131k [00:08<00:00, 14.7kex/s]
no pair has frequency >= 2. Stopping
Traceback (most recent call last):
File "/auto/nlg-05/naitian/parlai/parlai/core/torch_agent.py", line 1809, in load_state_dict
self.model.load_state_dict(state_dict)
File "/home/nlg-05/naitian/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PolyEncoderModule:
size mismatch for encoder_ctxt.embeddings.weight: copying a param with shape torch.Size([54944, 768]) from checkpoint, the shape in current model is torch.Size([18904, 768]).
size mismatch for encoder_cand.embeddings.weight: copying a param with shape torch.Size([54944, 768]) from checkpoint, the shape in current model is torch.Size([18904, 768]).
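(A quick way to confirm the two dictionary sizes, assuming the usual ParlAI dict format of one token entry per line:)

# pretrained dictionary (~55k entries) vs. the newly built one (18899 per the log above)
wc -l data/models/pretrained_transformers/poly_model_huge_reddit/model.dict
wc -l ../aloha/data/paper_model_no_finetune/model.dict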
This seems very similar to #2539, but explicitly setting the dict file path didn't work for me.
Thanks!