
Problem in Multi card finetune #18

Open · liuyifan22 opened this issue on Jan 19, 2024 · 1 comment

@liuyifan22
Hello! I am fascinated by your great idea and have been experimenting with your code, but I ran into some problems with the multi-card finetuning:

- If `--launcher` is set to `none` with two or more GPUs visible (e.g. `CUDA_VISIBLE_DEVICES=0,1`), NaN problems occur in the first epoch: "NaN or Inf found in input tensor".
- If `--launcher` is set to `pytorch`, errors are raised about environment variables such as `RANK` or `WORLD_SIZE` not being defined. In the corresponding code block, I found a "TODO".

Have you met this problem in your own experiments? Please tell me how it can be solved, and how your DDP is meant to be used. Thanks!
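
For context on the `pytorch` launcher error: `RANK` and `WORLD_SIZE` are normally injected by a distributed launcher such as `torchrun`, not set by hand. Below is a minimal, generic sketch of how a DDP entry point typically reads them. It illustrates standard PyTorch usage only, not this repository's actual code; the `setup_ddp` helper and the `main.py` name are hypothetical.

```python
# Generic PyTorch DDP setup sketch. Launch with, e.g.:
#   torchrun --nproc_per_node=2 main.py --launcher pytorch
# torchrun populates RANK, LOCAL_RANK, and WORLD_SIZE for every worker;
# running `python main.py` directly leaves them unset, which matches the
# "RANK not defined" errors reported above.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # init_process_group's default "env://" rendezvous reads RANK and
    # WORLD_SIZE from the environment variables set by the launcher.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # Replicate the model on this worker's GPU; gradients sync across ranks.
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```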

@CGuangyan-BIT (Owner)

Hi!
Data parallel should be feasible; you can give it a try.
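
A minimal sketch of the single-process alternative suggested here, using `torch.nn.DataParallel` (again generic PyTorch usage, not this repository's code; `build_model()` is a hypothetical placeholder):

```python
# Single-process data parallelism via torch.nn.DataParallel.
# No launcher and no RANK/WORLD_SIZE variables are needed: one process
# drives every GPU in CUDA_VISIBLE_DEVICES, splitting each input batch
# across them and gathering outputs on the default device.
import torch

model = build_model().cuda()  # build_model() is a hypothetical placeholder
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
# Training then proceeds exactly as in the single-GPU case.
```

Note that `DataParallel` is generally slower than DDP, but it sidesteps the launcher environment-variable errors entirely.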
