The training script from the main documentation does not work
if __name__ == '__main__':
    from ragatouille import RAGTrainer
    from ragatouille.utils import get_wikipedia_page

    pairs = [
        ("What is the meaning of life ?", "The meaning of life is 42"),
        ("What is Neural Search?", "Neural Search is a term referring to a family of ..."),
    ]
    my_full_corpus = [get_wikipedia_page("Hayao_Miyazaki"), get_wikipedia_page("Studio_Ghibli")]

    trainer = RAGTrainer(model_name="MyFineTunedColBERT",
                         pretrained_model_name="colbert-ir/colbertv1.9",
                         n_usable_gpus=-1)  # In this example, we run fine-tuning

    # This step handles all the data processing, check the examples for more details!
    trainer.prepare_training_data(raw_data=pairs,
                                  data_out_path="./data/",
                                  all_documents=my_full_corpus)

    trainer.train(batch_size=32)  # Train with the default hyperparams
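As an aside: in the output below, the miner warns that no training triplets were generated, which seems plausible given only two pairs and four documents. A variant of the preparation call that skips mining might look like the sketch below (the mine_hard_negatives parameter name is inferred from the warning message, so treat it as an assumption; it also does not address the crash itself):

# Sketch, inside the same __main__ block as above: disable hard-negative
# mining for this tiny toy dataset. The mine_hard_negatives flag name is
# inferred from the warning in the log below, not taken from the docs.
trainer.prepare_training_data(raw_data=pairs,
                              data_out_path="./data/",
                              all_documents=my_full_corpus,
                              mine_hard_negatives=False)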
The output is as follows:
/Users/user/Documents/dev/clbt_env/bin/python /Users/user/Documents/dev/data/Colbert_training.py
Loading Hard Negative SimpleMiner dense embedding model BAAI/bge-small-en-v1.5...
/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Building hard negative index for 4 documents...
All documents embedded, now adding to index...
save_index set to False, skipping saving hard negative index
Hard negative index generated
Warning: No training triplets were generated with setting mine_hard_negatives=='True'. This may be due to the data being too small or the hard negative miner not being able to find enough hard negatives.
#> Starting...
/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
{
    "query_token_id": "[unused0]",
    "doc_token_id": "[unused1]",
    "query_token": "[Q]",
    "doc_token": "[D]",
    "ncells": null,
    "centroid_score_threshold": null,
    "ndocs": null,
    "load_index_with_mmap": false,
    "index_path": null,
    "index_bsize": 64,
    "nbits": 2,
    "kmeans_niters": 20,
    "resume": false,
    "similarity": "cosine",
    "bsize": 32,
    "accumsteps": 1,
    "lr": 5e-6,
    "maxsteps": 500000,
    "save_every": 0,
    "warmup": 0,
    "warmup_bert": null,
    "relu": false,
    "nway": 2,
    "use_ib_negatives": true,
    "reranker": false,
    "distillation_alpha": 1.0,
    "ignore_scores": false,
    "model_name": "MyFineTunedColBERT",
    "query_maxlen": 32,
    "attend_to_mask_tokens": false,
    "interaction": "colbert",
    "dim": 128,
    "doc_maxlen": 256,
    "mask_punctuation": true,
    "checkpoint": "colbert-ir\/colbertv2.0",
    "triples": "data\/triples.train.colbert.jsonl",
    "collection": "data\/corpus.train.colbert.tsv",
    "queries": "data\/queries.train.colbert.tsv",
    "index_name": null,
    "overwrite": false,
    "root": ".ragatouille\/",
    "experiment": "colbert",
    "index_root": null,
    "name": "2024-06\/07\/16.39.09",
    "rank": 0,
    "nranks": 1,
    "amp": true,
    "gpus": 0,
    "avoid_fork_if_possible": false
}
Using config.bsize = 32 (per process) and config.accumsteps = 1
[Jun 07, 16:39:16] #> Loading the queries from data/queries.train.colbert.tsv ...
[Jun 07, 16:39:16] #> Got 2 queries. All QIDs are unique.
[Jun 07, 16:39:16] #> Loading collection...
0M
[Jun 07, 16:39:18] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
Process Process-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/colbert/infra/launcher.py", line 134, in setup_new_process
    return_val = callee(config, *args)
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/colbert/training/training.py", line 55, in train
    colbert = torch.nn.parallel.DistributedDataParallel(colbert, device_ids=[config.rank],
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 705, in __init__
    self._log_and_throw(
  File "/Users/user/Documents/dev/clbt_env/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1089, in _log_and_throw
    raise err_type(err_msg)
ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cpu')}.
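For what it's worth, the final ValueError reproduces outside of ColBERT: DistributedDataParallel only accepts device_ids/output_device for GPU modules, so passing them while the module's parameters live on the CPU raises exactly this message. A minimal standalone sketch of my understanding (my own repro, not ColBERT code):

import torch
import torch.distributed as dist

# DDP requires an initialized process group even for a single process;
# the gloo backend works on macOS, where there is no CUDA backend.
dist.init_process_group(backend="gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

model = torch.nn.Linear(8, 8)  # parameters live on the CPU, as on an M1

# Raises the same ValueError: device_ids=[0] / output_device=0 ask for CUDA
# device 0, but this is a CPU module.
ddp = torch.nn.parallel.DistributedDataParallel(model, device_ids=[0], output_device=0)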
I found a couple of points related to the ValueError problem, but unfortunately I could not figure out how they might help tackle it.
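My current guess (an assumption on my part, based only on the traceback and the config dump): the config shows "gpus": 0 because Apple Silicon has no CUDA backend, yet colbert/training/training.py line 55 still passes device_ids=[config.rank] to DistributedDataParallel. A quick check on this machine:

import torch

print(torch.cuda.is_available())           # False on an M1 Pro: no CUDA backend
print(torch.backends.mps.is_available())   # True, but as far as I can tell the
                                           # colbert-ai trainer does not use MPS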
Python libs:
RAGatouille==0.0.8.post2
colbert-ai==0.2.19
transformers==4.41.2
torch==2.3.0
Workstation:
MacBook Pro
Chip Apple M1 Pro
macOS 14.5 (23F79)
Could you share how this could be resolved, please?