
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x6 and 7x4096) #144

Closed
Nikitala0014 opened this issue Mar 23, 2023 · 4 comments

Comments

@Nikitala0014

I have two NVIDIA A10 24GB GPUs. When I start the finetune.py script, I receive this error with batch size 24. Can you help me understand what the real problem is? Thanks.
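For context on the message itself: PyTorch raises this error when a 2-D matrix multiplication gets incompatible shapes. The number of columns of mat1 must equal the number of rows of mat2, and here 6 ≠ 7. The plain-Python sketch below reproduces only that shape rule, using a hypothetical helper `matmul_result_shape` and no PyTorch. It shows which dimensions disagree, not why the two-GPU run produced them.

```python
def matmul_result_shape(mat1_shape, mat2_shape):
    """Return the shape of mat1 @ mat2, or raise an error shaped like PyTorch's."""
    rows1, cols1 = mat1_shape
    rows2, cols2 = mat2_shape
    # Matrix multiplication needs the inner dimensions to agree:
    # (n x k) @ (k x m) -> (n x m)
    if cols1 != rows2:
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows1}x{cols1} and {rows2}x{cols2})"
        )
    return (rows1, cols2)


print(matmul_result_shape((1024, 4096), (4096, 4096)))  # (1024, 4096)
try:
    matmul_result_shape((1024, 6), (7, 4096))
except RuntimeError as err:
    print(err)  # mat1 and mat2 shapes cannot be multiplied (1024x6 and 7x4096)
```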

@chiayisu

I encountered this error as well. However, in my case, after I added os.environ["CUDA_VISIBLE_DEVICES"] = "1" to restrict training to a single GPU, the problem was resolved.
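A minimal sketch of that workaround, assuming it goes at the very top of finetune.py. The variable has to be set before PyTorch initializes CUDA, and setting it before `import torch` is the simplest way to guarantee that. Launching with `CUDA_VISIBLE_DEVICES=1 python finetune.py` from the shell has the same effect without editing the script.

```python
import os

# Restrict this process to a single GPU. This must happen before PyTorch
# initializes CUDA; doing it before `import torch` is the safest place.
# "1" selects the second physical GPU; inside the process it appears as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import torch  # import torch and the rest of the training code only after this line

print(os.environ["CUDA_VISIBLE_DEVICES"])
```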

@Nikitala0014
Author

Thank you for your response. This solved the problem, and for some reason 24GB of memory is enough for training only when I launch on a single GPU this way.

@AngainorDev
Contributor

See this other issue, which is the same problem, and its answers:

#8 (comment)

Training on multiple GPUs is possible with torchrun: you'll double the batch size and halve the training time.
Take care to verify the math by hand so that the number of gradient accumulation steps is right.
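The "math" here is the effective batch size. Assuming standard DDP behavior, each optimizer step covers micro_batch_size × gradient_accumulation_steps × world_size examples. If the accumulation steps are left unchanged, going from one to two processes doubles the effective batch. The sketch below divides the accumulation steps by the world size to keep the global batch constant. It reads `WORLD_SIZE`, which torchrun sets for each process. The names `batch_size` and `micro_batch_size`, their values, and the helper `gradient_accumulation_steps` are hypothetical and not taken from finetune.py.

```python
import os


def gradient_accumulation_steps(batch_size, micro_batch_size, world_size):
    """Accumulation steps per process so the global batch stays at batch_size.

    Under DDP each process sees micro_batch_size examples per forward pass, so one
    optimizer step covers micro_batch_size * steps * world_size examples.
    """
    per_step = micro_batch_size * world_size
    if batch_size % per_step != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by "
            f"micro_batch_size * world_size = {per_step}"
        )
    return batch_size // per_step


# torchrun (e.g. `torchrun --nproc_per_node=2 finetune.py`) sets WORLD_SIZE in each
# process; a plain `python finetune.py` run leaves it unset, i.e. one process.
world_size = int(os.environ.get("WORLD_SIZE", 1))

batch_size = 128       # hypothetical target global batch size
micro_batch_size = 4   # hypothetical per-GPU batch size
steps = gradient_accumulation_steps(batch_size, micro_batch_size, world_size)
print(f"world_size={world_size}, gradient_accumulation_steps={steps}, "
      f"effective batch={micro_batch_size * steps * world_size}")
```

With two GPUs this gives 16 accumulation steps instead of 32, so each optimizer step still sees 128 examples while each process does half the work.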

@Nikitala0014
Author


Thank you. That's what I need ;)
