AttributeError: 'function' object has no attribute 'retriever' - Training ReGReT model #3927

fabrahman · 2021-08-11T06:49:55Z

Hi,

When trying to train a ReGReT model with the command below (using a different model for initialization), I get the following error. Should I change anything in the train_model command? Thanks

command:

python parlai/scripts/train_model.py \
--model fid --task ranked_sim \
--indexer-type exact \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 4 --n-docs 5 \
--model-file ${OUT_DIR}/model \
--regret True --regret-intermediate-maxlen 64 --regret-model-file dpr_outputs/fid_dpr_ranked/model \
--path-to-index /data/backup_tmp/passage_embeddings/my_passages \
--path-to-dpr-passages /data/retrieval_corpus_json/passages.tsv

error:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2021-08-11 06:34:56,586 INFO     | Total parameters: 515,177,984 (514,784,768 trainable)
2021-08-11 06:34:56,586 INFO     | Loading existing model params from /amltdc91a6f0837d50a3bdf062e980f3fe7c/ParlAI/data/models/bart/bart_large/model
2021-08-11 06:35:09,196 WARNING  | Detected a fine-tune run. Resetting the optimizer.
2021-08-11 06:35:09,197 WARNING  | Optimizer was reset. Also resetting LR scheduler.
2021-08-11 06:35:09,732 WARNING  | Sharing retrievers between model and regret model!
Traceback (most recent call last):
  File "parlai/scripts/train_model.py", line 937, in <module>
    TrainModel.main()
  File "/mnt/code/ParlAI/parlai/core/script.py", line 129, in main
    return cls._run_args(None)
  File "/mnt/code/ParlAI/parlai/core/script.py", line 101, in _run_args
    return cls._run_from_parser_and_opt(opt, parser)
  File "/mnt/code/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
    return script.run()
  File "parlai/scripts/train_model.py", line 932, in run
    self.train_loop = TrainLoop(self.opt)
  File "parlai/scripts/train_model.py", line 347, in __init__
    self.agent = create_agent(opt)
  File "/mnt/code/ParlAI/parlai/core/agents.py", line 479, in create_agent
    model = model_class(opt)
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 180, in __init__
    self.regret_model = self.build_regret_model()
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 213, in build_regret_model
    retriever_shared = self.model.encoder.retriever.share()
AttributeError: 'function' object has no attribute 'retriever'

klshuster · 2021-08-12T19:16:30Z

Hi, this is indeed a bug, fixing in #3934
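For readers who hit the same message, here is a minimal sketch of one way this kind of AttributeError can arise: code expects `model.encoder` to be an object that owns a `retriever`, but the attribute is actually a plain function. All class and function names below are hypothetical and are not ParlAI code; the actual cause and fix are in #3934.

```python
class Retriever:
    """Hypothetical stand-in for a retriever that can be shared."""

    def share(self):
        return {"retriever": self}


class Encoder:
    """Encoder exposed as an object that owns its retriever."""

    def __init__(self):
        self.retriever = Retriever()


def encode(tokens):
    """Encoder exposed as a plain function; it has no `retriever` attribute."""
    return tokens


class ModelWithObjectEncoder:
    def __init__(self):
        self.encoder = Encoder()


class ModelWithFunctionEncoder:
    def __init__(self):
        # Here `encoder` is a function, so attribute lookups on it fail.
        self.encoder = encode


def share_retriever(model):
    # Same access pattern as the failing line in the traceback.
    return model.encoder.retriever.share()


shared = share_retriever(ModelWithObjectEncoder())  # works

try:
    share_retriever(ModelWithFunctionEncoder())
    message = None
except AttributeError as err:
    message = str(err)

print(message)  # 'function' object has no attribute 'retriever'
```

The access pattern is only safe when every model class stores the retriever at the same attribute path, which is why swapping the model type can surface this error.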

fabrahman · 2021-08-13T18:57:32Z

@klshuster Thank you for the prompt fix. Regarding the newly added argument, I was confused by the help description.

    regret_group.add_argument(
        '--regret-override-index',
        type='bool',
        default=False,
        help='Overrides the index used with the ReGReT model, if using separate models. '
        'I.e., the initial round of retrieval uses the same index as specified for the '
        'second round of retrieval',
    )

If we want to use a separate model for the initial round (via --regret-model-file), should we set --regret-override-index to True?

klshuster · 2021-08-16T16:45:48Z

This argument will override the index used by the separate model. For example, suppose you have a model trained with a Wikipedia index that you want to use via --regret-model-file, but you want your full model to use a different index (e.g., a subset of Wikipedia). Setting this flag to True will ensure that both models use the same index, as opposed to the --regret-model-file model using all of Wikipedia and the full model using a subset.

Does that make sense?
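The behavior described above can be sketched as a small selection rule. This is an illustration only, not ParlAI's implementation: `RetrieverConfig`, `resolve_indices`, and the index paths are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RetrieverConfig:
    """Hypothetical container for the index a model was configured with."""

    name: str
    index_path: str


def resolve_indices(full, regret, override_index):
    """Return (initial_round_index, second_round_index).

    With override_index=True, the separate ReGReT model retrieves from the
    full model's index instead of the one it was trained with.
    """
    initial_round = full.index_path if override_index else regret.index_path
    return initial_round, full.index_path


# Made-up paths: the separate model was trained on all of Wikipedia,
# while the full model should use a subset.
full = RetrieverConfig("full model", "/indexes/wiki_subset")
regret = RetrieverConfig("regret model", "/indexes/wiki_all")

without_override = resolve_indices(full, regret, override_index=False)
with_override = resolve_indices(full, regret, override_index=True)

print(without_override)  # ('/indexes/wiki_all', '/indexes/wiki_subset')
print(with_override)     # ('/indexes/wiki_subset', '/indexes/wiki_subset')
```

In this sketch the flag only changes which index the initial round reads; the second round always uses the full model's index.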

fabrahman · 2021-08-20T21:22:24Z

Thanks @klshuster, that makes sense now.
By the way, I was not able to run the ReGReT model because of a device error (shown below). I tried adding `.to("cuda:0")` in the function producing the error, but that doesn't solve the issue, so I guess it should be added somewhere else. I haven't dug into it. If you already know where this should be added, please let me know.

  File "/mnt/code/ParlAI/parlai/core/torch_generator_agent.py", line 734, in train_step
    loss = self.compute_loss(batch)
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 904, in compute_loss
    model_output = self.get_model_output(batch)
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 886, in get_model_output
    new_batch = self._regret_rebatchify(batch, regret_preds)  # type: ignore
  File "/mnt/code/ParlAI/parlai/agents/rag/rag.py", line 766, in _regret_rebatchify
    query_i = torch.cat([query_vec[i][: query_lens[i]], query_i], dim=0)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument tensors in method wrapper__cat)

Thanks

klshuster · 2021-08-20T21:59:25Z

can you please share your command?

fabrahman · 2021-08-20T22:08:20Z

> can you please share your command?

Sure, here it is:

python parlai/scripts/train_model.py \
--model rag --task my_task \
--indexer-type exact \
--generation-model bart --init-opt arch/bart_large \
--batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 \
--log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 \
--model-parallel True --optimizer adam --text-truncate 512 --truncate 512 \
--learningrate 1e-05 --validation-metric-mode min --validation-every-n-epochs 0.25 \
--validation-max-exs 1000 --validation-metric ppl --validation-patience 4 --n-docs 5 \
--model-file ${OUT_DIR}/model \
--regret True --regret-intermediate-maxlen 64 --regret-model-file dpr_outputs/rag_tok/model --regret-override-index True \
--path-to-index /mnt/data/backup_tmp/passage_embeddings/my_passages \
--path-to-dpr-passages /mnt/data/retrieval_corpus_json/my_passages.tsv

klshuster · 2021-09-16T18:59:23Z

Sorry for the late reply - I have identified the bug and will be putting up a fix shortly
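For readers who run into the same RuntimeError elsewhere, the general pattern is to move the tensor being concatenated onto the device of the tensor it joins, rather than hard-coding `cuda:0`. The following is a minimal sketch under that assumption; it is not the fix that landed in #4022. The helper `join_query` is hypothetical, and the tensor values are made up; only the `torch.cat` call shape mirrors the traceback.

```python
import torch


def join_query(query_vec_i: torch.Tensor, query_len: int, query_i: torch.Tensor) -> torch.Tensor:
    """Concatenate the unpadded part of a query with extra tokens on one device."""
    # torch.cat requires all inputs on the same device, so follow the query's device.
    query_i = query_i.to(query_vec_i.device)
    return torch.cat([query_vec_i[:query_len], query_i], dim=0)


# Use the GPU when one is available; the sketch also runs on CPU only.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

query_vec_i = torch.tensor([5, 6, 7, 0, 0], device=device)  # padded query, true length 3
query_i = torch.tensor([8, 9])  # created on the CPU by default

out = join_query(query_vec_i, 3, query_i)
print(out.device, out.tolist())
```

Following the device of an existing tensor keeps the code working under model parallelism, where the relevant tensor may not live on `cuda:0`.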

github-actions · 2021-10-17T00:05:23Z

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

fabrahman mentioned this issue Aug 12, 2021

How can I train a BART FiD model on custom data with gold retrieved passages? #3872

Closed

klshuster self-assigned this Aug 12, 2021

klshuster mentioned this issue Aug 12, 2021

[RAG] Fix ReGReT #3934

Merged

klshuster mentioned this issue Sep 16, 2021

[RAG] Fix ReGReT Cuda Issue #4022

Merged

github-actions bot added the stale label Oct 17, 2021

github-actions bot closed this as completed Oct 24, 2021
