The training script from the main documentation does not work
if __name__ == '__main__':
    from ragatouille import RAGTrainer
    from ragatouille.utils import get_wikipedia_page

    pairs = [
        ("What is the meaning of life ?", "The meaning of life is 42"),
        ("What is Neural Search?", "Neural Search is a term referring to a family of ..."),
    ]
    my_full_corpus = [get_wikipedia_page("Hayao_Miyazaki"), get_wikipedia_page("Studio_Ghibli")]

    trainer = RAGTrainer(model_name="MyFineTunedColBERT",
                         pretrained_model_name="colbert-ir/colbertv1.9",
                         n_usable_gpus=-1)  # In this example, we run fine-tuning

    # This step handles all the data processing, check the examples for more details!
    trainer.prepare_training_data(raw_data=pairs,
                                  data_out_path="./data/",
                                  all_documents=my_full_corpus)

    trainer.train(batch_size=32)  # Train with the default hyperparams
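As an aside: in the output below, the miner warns that no training triplets were generated, which seems plausible given only two pairs and four documents. A variant of the preparation call that skips mining might look like the sketch below (the mine_hard_negatives parameter name is inferred from the warning message, so treat it as an assumption; it also does not address the crash itself):

# Sketch, inside the same __main__ block as above: disable hard-negative
# mining for this tiny toy dataset. The mine_hard_negatives flag name is
# inferred from the warning in the log below, not taken from the docs.
trainer.prepare_training_data(raw_data=pairs,
                              data_out_path="./data/",
                              all_documents=my_full_corpus,
                              mine_hard_negatives=False)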
The output is as follows:
/Users/user/Documents/dev/clbt_env/bin/python /Users/user/Documents/dev/data/Colbert_training.py
Loading Hard Negative SimpleMiner dense embedding model BAAI/bge-small-en-v1.5...
/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Building hard negative index for 4 documents...
All documents embedded, now adding to index...
save_index set to False, skipping saving hard negative index
Hard negative index generated
Warning: No training triplets were generated with setting mine_hard_negatives=='True'. This may be due to the data being too small or the hard negative miner not being able to find enough hard negatives.
#> Starting...
/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
{
    "query_token_id": "[unused0]",
    "doc_token_id": "[unused1]",
    "query_token": "[Q]",
    "doc_token": "[D]",
    "ncells": null,
    "centroid_score_threshold": null,
    "ndocs": null,
    "load_index_with_mmap": false,
    "index_path": null,
    "index_bsize": 64,
    "nbits": 2,
    "kmeans_niters": 20,
    "resume": false,
    "similarity": "cosine",
    "bsize": 32,
    "accumsteps": 1,
    "lr": 5e-6,
    "maxsteps": 500000,
    "save_every": 0,
    "warmup": 0,
    "warmup_bert": null,
    "relu": false,
    "nway": 2,
    "use_ib_negatives": true,
    "reranker": false,
    "distillation_alpha": 1.0,
    "ignore_scores": false,
    "model_name": "MyFineTunedColBERT",
    "query_maxlen": 32,
    "attend_to_mask_tokens": false,
    "interaction": "colbert",
    "dim": 128,
    "doc_maxlen": 256,
    "mask_punctuation": true,
    "checkpoint": "colbert-ir\/colbertv2.0",
    "triples": "data\/triples.train.colbert.jsonl",
    "collection": "data\/corpus.train.colbert.tsv",
    "queries": "data\/queries.train.colbert.tsv",
    "index_name": null,
    "overwrite": false,
    "root": ".ragatouille\/",
    "experiment": "colbert",
    "index_root": null,
    "name": "2024-06\/07\/16.39.09",
    "rank": 0,
    "nranks": 1,
    "amp": true,
    "gpus": 0,
    "avoid_fork_if_possible": false
}
Using config.bsize = 32 (per process) and config.accumsteps = 1
[Jun 07, 16:39:16] #> Loading the queries from data/queries.train.colbert.tsv ...
[Jun 07, 16:39:16] #> Got 2 queries. All QIDs are unique.
[Jun 07, 16:39:16] #> Loading collection...
0M
[Jun 07, 16:39:18] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
Process Process-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/colbert/infra/launcher.py", line 134, in setup_new_process
    return_val = callee(config, *args)
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/colbert/training/training.py", line 55, in train
    colbert = torch.nn.parallel.DistributedDataParallel(colbert, device_ids=[config.rank],
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 705, in __init__
    self._log_and_throw(
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1089, in _log_and_throw
    raise err_type(err_msg)
ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cpu')}.
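For what it's worth, the final ValueError reproduces outside of ColBERT: DistributedDataParallel only accepts device_ids/output_device for GPU modules, so passing them while the module's parameters live on the CPU raises exactly this message. A minimal standalone sketch of my understanding (my own repro, not ColBERT code):

import torch
import torch.distributed as dist

# DDP requires an initialized process group even for a single process;
# the gloo backend works on macOS, where there is no CUDA backend.
dist.init_process_group(backend="gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

model = torch.nn.Linear(8, 8)  # parameters live on the CPU, as on an M1

# Raises the same ValueError: device_ids=[0] / output_device=0 ask for CUDA
# device 0, but this is a CPU module.
ddp = torch.nn.parallel.DistributedDataParallel(model, device_ids=[0], output_device=0)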
I found a couple of points related to the ValueError problem, but unfortunately I could not figure out how they might help tackle it.
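My current guess (an assumption on my part, based only on the traceback and the config dump): the config shows "gpus": 0 because Apple Silicon has no CUDA backend, yet colbert/training/training.py line 55 still passes device_ids=[config.rank] to DistributedDataParallel. A quick check on this machine:

import torch

print(torch.cuda.is_available())           # False on an M1 Pro: no CUDA backend
print(torch.backends.mps.is_available())   # True, but as far as I can tell the
                                           # colbert-ai trainer does not use MPS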
Python libs:
RAGatouille==0.0.8.post2
colbert-ai==0.2.19
transformers==4.41.2
torch==2.3.0
Workstation:
MacBook Pro
Chip Apple M1 Pro
macOS 14.5 (23F79)
Could you share how this could be resolved, please?