
Problem in Multi card finetune #18

Open · liuyifan22 opened this issue on Jan 19, 2024 · 1 comment

@liuyifan22
Hello! I am fascinated by your great idea and have been experimenting with your code, but I ran into some problems with the multi-card finetuning:

- If `--launcher` is set to `none` with two or more GPUs visible (e.g. `CUDA_VISIBLE_DEVICES=0,1`), NaN problems occur in the first epoch: "NaN or Inf found in input tensor".
- If `--launcher` is set to `pytorch`, errors are raised about environment variables such as `RANK` or `WORLD_SIZE` not being defined. In the corresponding code block, I found a "TODO".

Have you met this problem in your own experiments? Please tell me how it can be solved, and how your DDP is meant to be used. Thanks!
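
For context on the `pytorch` launcher error: `RANK` and `WORLD_SIZE` are normally injected by a distributed launcher such as `torchrun`, not set by hand. Below is a minimal, generic sketch of how a DDP entry point typically reads them. It illustrates standard PyTorch usage only, not this repository's actual code; the `setup_ddp` helper and the `main.py` name are hypothetical.

```python
# Generic PyTorch DDP setup sketch. Launch with, e.g.:
#   torchrun --nproc_per_node=2 main.py --launcher pytorch
# torchrun populates RANK, LOCAL_RANK, and WORLD_SIZE for every worker;
# running `python main.py` directly leaves them unset, which matches the
# "RANK not defined" errors reported above.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # init_process_group's default "env://" rendezvous reads RANK and
    # WORLD_SIZE from the environment variables set by the launcher.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # Replicate the model on this worker's GPU; gradients sync across ranks.
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```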

@CGuangyan-BIT (Owner)

Hi!
Data parallel should be feasible; you can give it a try.
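
A minimal sketch of the single-process alternative suggested here, using `torch.nn.DataParallel` (again generic PyTorch usage, not this repository's code; `build_model()` is a hypothetical placeholder):

```python
# Single-process data parallelism via torch.nn.DataParallel.
# No launcher and no RANK/WORLD_SIZE variables are needed: one process
# drives every GPU in CUDA_VISIBLE_DEVICES, splitting each input batch
# across them and gathering outputs on the default device.
import torch

model = build_model().cuda()  # build_model() is a hypothetical placeholder
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
# Training then proceeds exactly as in the single-GPU case.
```

Note that `DataParallel` is generally slower than DDP, but it sidesteps the launcher environment-variable errors entirely.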
