
Training loss is NaN now. #17

Open
Strive21 opened this issue Dec 6, 2024 · 12 comments

@Strive21

Strive21 commented Dec 6, 2024

When will the latest version of the code and data processing code be released?

@SHYuanBest
Member

Thanks for your interest. Is the training loss NaN from the very beginning, and what dataset did you use? The latest version of the code may not be released soon. We will prioritize releasing the data processing code and integrating ConsisID into diffusers.

@Strive21
Author

Strive21 commented Dec 6, 2024


I downloaded your dataset and processed it appropriately, using CogVideoX-5B-I2V to initialize the weights, with a batch size of 5 and a learning rate of 3e-7. The loss is normal at the start of training, but it becomes NaN after about 500 iterations. Could it be that I processed the data incorrectly? Also, the warning "fail to detect face using insightface, extract embedding on align face" appears during training.

@SHYuanBest
Member

Oh, I see. This may be a problem with MM-DiT. Training is very unstable because the activations in the middle layers can become very large, which leads to a NaN loss. You can try turning on EMA, using gradient accumulation, increasing the batch size, and reducing the learning rate. Another option is to add a regularization term to the output of the middle layers.
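
For illustration only, here is a minimal PyTorch sketch (not the ConsisID training code) of the last suggestion: penalizing large middle-layer activations via a forward hook, plus a common extra safeguard of skipping non-finite losses and clipping gradients. The names `transformer`, `dataloader`, `optimizer`, and `compute_diffusion_loss` are placeholders.

```python
import torch

mid_activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Transformer blocks may return tuples; keep only the hidden states.
        hidden = output[0] if isinstance(output, tuple) else output
        mid_activations[name] = hidden
    return hook

# Hook a middle block of the (placeholder) MM-DiT backbone.
handle = transformer.transformer_blocks[20].register_forward_hook(save_activation("mid"))

for step, batch in enumerate(dataloader):
    loss = compute_diffusion_loss(transformer, batch)  # placeholder loss function
    # Small L2 penalty on the middle-layer output to discourage huge activations.
    loss = loss + 1e-4 * mid_activations["mid"].float().pow(2).mean()
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)  # skip the step instead of propagating NaN
        continue
    loss.backward()
    torch.nn.utils.clip_grad_norm_(transformer.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```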

@SHYuanBest
Member

The warning "fail to detect face using insightface, extract embedding on align face" cannot be avoided, because facexlib may not be able to detect the face; in that case the code automatically skips the training sample.

@SHYuanBest
Member

Or you can try training only a LoRA instead of all parameters.

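As a rough illustration, LoRA-only finetuning with `peft` might look like the sketch below; the target module names are assumptions and need to be matched to the actual attention layers of the MM-DiT backbone.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.0,
    # Assumed attention projection names; check the real module names in the backbone.
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)

transformer.requires_grad_(False)           # freeze the full model
transformer = get_peft_model(transformer, lora_config)
transformer.print_trainable_parameters()    # only the LoRA weights remain trainable
```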

@SHYuanBest
Member

> When will the latest version of the code and data processing code be released?

We have released the data processing code; please refer to here for more details.

@Strive21
Author

Strive21 commented Dec 9, 2024


Thank you! I'll give it a try.

@glimmer16


Hi! Have you solved this problem? I am facing the same issue and wondering how to avoid the NaN loss.

@SHYuanBest
Member


You may need to construct a higher-quality dataset to continue finetuning ConsisID, or use a larger batch size. Since ConsisID was trained on a higher-quality internal dataset, continuing to train it on ConsisID-Preview-Data is likely to make it worse.
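
If GPU memory is the limiting factor, one way to approximate a larger batch size is gradient accumulation, for example with accelerate. The sketch below is illustrative only; `transformer`, `optimizer`, `dataloader`, and `compute_diffusion_loss` are placeholders.

```python
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=8)  # effective batch = 8 x per-device batch
transformer, optimizer, dataloader = accelerator.prepare(transformer, optimizer, dataloader)

for batch in dataloader:
    with accelerator.accumulate(transformer):
        loss = compute_diffusion_loss(transformer, batch)  # placeholder loss function
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```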

@SHYuanBest
Member

Or you can load the checkpoint of CogVideoX-5B-I2V and train IPT2V from scratch (instead of loading ConsisID-Preview for continued finetuning).
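
For reference, loading the public CogVideoX-5B-I2V transformer weights with diffusers might look like the snippet below; the repo id and model class are the standard diffusers ones, and the surrounding ConsisID training setup is not shown.

```python
import torch
from diffusers import CogVideoXTransformer3DModel

# Initialize the backbone from the base I2V checkpoint rather than ConsisID-Preview.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
```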

@SHYuanBest
Member

Some possible solutions are discussed in #31.

@Strive21
Author


> Hi! Have you solved this problem? I am facing the same issue and wondering how to avoid the NaN loss.

I tried a larger batch size and that solved the problem.
