[BUG]: assert grad_chunk.l2_norm is not None #6102
Comments
For this kind of issue, it makes more sense to share what modifications you made, since we don't provide support for individually modified code.
It's the problem I described: I didn't modify the source code, apart from the two lines shown in the second picture, which only print the error details. I used ColossalAI to train Qwen2-VL. The following is part of the training code; the error occurs at `optimizer.step()`, as shown in the first picture:

```python
for step, batch in enumerate(prefetcher, start=st_step):
    ...
```
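For reference, a minimal sketch of the loop shape being described, assuming the ColossalAI Booster API with the Gemini plugin. `prefetcher` and `st_step` come from the snippet above; the model, optimizer, and data here are illustrative stand-ins, not the poster's code:

```python
import torch
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

colossalai.launch_from_torch()  # version-dependent: older releases required a config dict

# Stand-ins: the real run uses Qwen2VLForConditionalGeneration and a data prefetcher,
# both only shown in the issue's screenshots.
model = nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
booster = Booster(plugin=GeminiPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)

st_step = 0
prefetcher = [torch.randn(4, 16) for _ in range(3)]

for step, batch in enumerate(prefetcher, start=st_step):
    loss = model(batch).sum()
    booster.backward(loss, optimizer)  # Gemini tracks per-chunk gradient norms during backward
    optimizer.step()                   # the reported failure point
    optimizer.zero_grad()
```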
It seems your code is not the newest version. Could you pull the latest main branch and try again?
This is a simple piece of code that will give the same error: (screenshot)
@botbw Any insights? Thanks!
Is there an existing issue for this bug?
🐛 Describe the bug
I modified the code to adapt it to Qwen2-VL (`transformers.Qwen2VLForConditionalGeneration`) and found that the loss can be computed, but for some chunks `grad_chunk.l2_norm` is `None`. (Training a plain LLM works fine.)
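For readers hitting the same assert, here is a schematic of the per-chunk norm bookkeeping the message points at. This is an illustrative reduction of the pattern, not ColossalAI's actual source; the failure mode it shows is a chunk whose gradients never arrived, leaving its `l2_norm` unset:

```python
from typing import List, Optional

class Chunk:
    def __init__(self) -> None:
        self.l2_norm: Optional[float] = None  # set when the chunk's grads are reduced

    def record_grad_norm(self, grad_sq_sum: float) -> None:
        self.l2_norm = grad_sq_sum

def global_grad_norm(chunks: List[Chunk]) -> float:
    total = 0.0
    for grad_chunk in chunks:
        # The reported assertion: it fails for any chunk that never received gradients,
        # e.g. parameters skipped in the forward pass (a vision tower on text-only batches).
        assert grad_chunk.l2_norm is not None
        total += grad_chunk.l2_norm
    return total ** 0.5

chunks = [Chunk(), Chunk()]
chunks[0].record_grad_norm(1.0)   # chunks[1] never gets gradients
global_grad_norm(chunks)          # raises AssertionError on chunks[1]
```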
Modified source code to print more information: (screenshot)
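The actual two-line change is only visible in the screenshot; a hypothetical change in the same spirit would log the offending chunk before the assert fires:

```python
def check_chunk_norm(grad_chunk) -> None:
    """Hypothetical version of the debug change described above (not the actual diff)."""
    if grad_chunk.l2_norm is None:
        # Print before asserting so the failing chunk can be identified in the log.
        print(f"grad_chunk without l2_norm: {grad_chunk!r}")
    assert grad_chunk.l2_norm is not None
```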
Result: (screenshot)
Environment
No response