This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

AttributeError: 'function' object has no attribute 'retriever' - Training ReGReT model #3927

Closed
fabrahman opened this issue Aug 11, 2021 · 8 comments

@fabrahman

Hi,

When trying to train a ReGReT model with the command below (using a different model for initialization), I get the following error. Should I change anything in the train_model command? Thanks

command:

python parlai/scripts/train_model.py \
--model fid --task ranked_sim \
--indexer-type exact \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 4 --n-docs 5 \
--model-file ${OUT_DIR}/model \
--regret True --regret-intermediate-maxlen 64 --regret-model-file dpr_outputs/fid_dpr_ranked/model \
--path-to-index /data/backup_tmp/passage_embeddings/my_passages \
--path-to-dpr-passages /data/retrieval_corpus_json/passages.tsv

error:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2021-08-11 06:34:56,586 INFO     | Total parameters: 515,177,984 (514,784,768 trainable)
2021-08-11 06:34:56,586 INFO     | Loading existing model params from /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/bart/bart_large/model
2021-08-11 06:35:09,196 WARNING  | Detected a fine-tune run. Resetting the optimizer.
2021-08-11 06:35:09,197 WARNING  | Optimizer was reset. Also resetting LR scheduler.
2021-08-11 06:35:09,732 WARNING  | Sharing retrievers between model and regret model!
Traceback (most recent call last):
  File "parlai/scripts/train_model.py", line 937, in <module>
    TrainModel.main()
  File "/mnt/code/ParlAI/parlai/core/script.py", line 129, in main
    return cls._run_args(None)
  File "/mnt/code/ParlAI/parlai/core/script.py", line 101, in _run_args
    return cls._run_from_parser_and_opt(opt, parser)
  File "/mnt/code/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
    return script.run()
  File "parlai/scripts/train_model.py", line 932, in run
    self.train_loop = TrainLoop(self.opt)
  File "parlai/scripts/train_model.py", line 347, in __init__
    self.agent = create_agent(opt)
  File "/mnt/code/ParlAI/parlai/core/agents.py", line 479, in create_agent
    model = model_class(opt)
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 180, in __init__
    self.regret_model = self.build_regret_model()
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 213, in build_regret_model
    retriever_shared = self.model.encoder.retriever.share()
AttributeError: 'function' object has no attribute 'retriever'
@klshuster
Contributor

Hi, this is indeed a bug, fixing in #3934
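
For readers who hit the same message: Python raises this kind of AttributeError whenever attribute access lands on a plain function instead of on the object that owns the attribute. The toy below only illustrates that general pattern. It is not ParlAI's code, `ToyModel`, `ToyRetriever`, and `encode` are hypothetical names, and the actual cause and fix are in #3934.

```python
class ToyRetriever:
    """Hypothetical stand-in for a retriever that can be shared."""

    def share(self):
        return {"shared": True}


def encode(tokens):
    """Hypothetical encoder exposed as a plain function, not a module."""
    return [len(t) for t in tokens]


class ToyModel:
    def __init__(self):
        self.retriever = ToyRetriever()
        # `encoder` is a function stored on the instance; it owns no `retriever`.
        self.encoder = encode


model = ToyModel()

try:
    # Same access pattern as the failing line in the traceback.
    model.encoder.retriever.share()
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(error_message)

# In this toy, going through the object that actually owns the retriever works.
shared = model.retriever.share()
print(shared)
```

In the toy, the retriever is reachable only through the object that holds it, so the lookup has to start there rather than at the function.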

@fabrahman
Author

fabrahman commented Aug 13, 2021

@klshuster Thank you for the prompt fix. I was confused by the help description of the newly added argument:

    regret_group.add_argument(
        '--regret-override-index',
        type='bool',
        default=False,
        help='Overrides the index used with the ReGReT model, if using separate models. '
        'I.e., the initial round of retrieval uses the same index as specified for the '
        'second round of retrieval',
    )

If we want to use a separate model for the initial round (using --regret-model-file), should we set --regret-override-index to True?

@klshuster
Contributor

This argument overrides the index used by the separate model. For example, suppose you have a model trained with a Wikipedia index that you want to use via --regret-model-file, but you want your full model to use a different index (e.g., a subset of Wikipedia). Setting this flag to True ensures that both models use the same index, as opposed to the --regret-model-file model using all of Wikipedia and the full model using a subset.

Does that make sense?
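
The behavior described above can be sketched as a small selection rule. This is a minimal illustration under stated assumptions, not ParlAI's implementation: `ToyOpt`, `resolve_regret_index`, and the index paths are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ToyOpt:
    """Hypothetical options object; only the index path matters here."""

    path_to_index: str


def resolve_regret_index(full_opt: ToyOpt, regret_opt: ToyOpt, override_index: bool) -> str:
    """Return the index path the separate first-round model would use."""
    if override_index:
        # Both rounds of retrieval use the index specified for the full model.
        return full_opt.path_to_index
    # Otherwise the separate model keeps the index it was trained with.
    return regret_opt.path_to_index


# Hypothetical paths: the full model uses a subset, the separate model all of Wikipedia.
full = ToyOpt(path_to_index="indexes/wikipedia_subset")
separate = ToyOpt(path_to_index="indexes/wikipedia_full")

without_override = resolve_regret_index(full, separate, override_index=False)
with_override = resolve_regret_index(full, separate, override_index=True)

print(without_override)
print(with_override)
```

With the override off, the two rounds retrieve from different indexes; with it on, the first round is redirected to the full model's index.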

@fabrahman
Author

Thanks @klshuster, that makes sense now.
By the way, I was not able to run the ReGReT model because of a device error (shown below). I tried adding `.to("cuda:0")` in the function producing the error, but that doesn't solve the issue, so I guess it should be added somewhere else. I haven't dug into it. If you already know where this should be added, please let me know.

  File "/mnt/code/ParlAI/parlai/core/torch_generator_agent.py", line 734, in train_step
    loss = self.compute_loss(batch)
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 904, in compute_loss
    model_output = self.get_model_output(batch)
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 886, in get_model_output
    new_batch = self._regret_rebatchify(batch, regret_preds)  # type: ignore
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 766, in _regret_rebatchify
    query_i = torch.cat([query_vec[i][: query_lens[i]], query_i], dim=0)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument tensors in method wrapper__cat)

Thanks
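
A general way to avoid this class of error is to move one operand onto the other operand's device right before `torch.cat`, rather than hard-coding `cuda:0`, which can be the wrong device when a model is spread over several GPUs. The sketch below shows only that pattern. It is not the fix applied in ParlAI, and `concat_on_same_device` and the sample tensors are hypothetical.

```python
import torch


def concat_on_same_device(
    query_row: torch.Tensor, query_len: int, new_tokens: torch.Tensor
) -> torch.Tensor:
    """Concatenate the first `query_len` tokens of `query_row` with `new_tokens`.

    `new_tokens` is moved to the device of `query_row` first, so torch.cat
    never receives tensors that live on two different devices.
    """
    new_tokens = new_tokens.to(query_row.device)
    return torch.cat([query_row[:query_len], new_tokens], dim=0)


# Use a GPU when one is available; the same code runs unchanged on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

row = torch.tensor([5, 6, 7, 0, 0], device=device)  # padded query on the model's device
extra = torch.tensor([8, 9])                        # created on the CPU by default

combined = concat_on_same_device(row, 3, extra)
print(combined.tolist(), combined.device)
```

Reading the target device from the tensor itself keeps the call correct wherever that tensor happens to live.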

@klshuster
Contributor

can you please share your command?

@fabrahman
Author

> can you please share your command?

Sure, here it is:

python parlai/scripts/train_model.py \
--model rag --task my_task \
--indexer-type exact \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 4 --n-docs 5 \
--model-file ${OUT_DIR}/model \
--regret True --regret-intermediate-maxlen 64 --regret-model-file dpr_outputs/rag_tok/model --regret-override-index True \
--path-to-index /mnt/data/backup_tmp/passage_embeddings/my_passages \
--path-to-dpr-passages /mnt/data/retrieval_corpus_json/my_passages.tsv

@klshuster
Contributor

Sorry for the late reply - I have identified the bug and will be putting up a fix shortly

@github-actions

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
