Taking too much memory on multiple GPUs #872
I am trying to load a Llama-13B model on a machine with four 16 GB V100 GPUs (64 GB of GPU memory combined), 64 GB of RAM, and 16 CPUs.

This is the command I am using:

However, I am running into an OutOfMemoryError:

The funny thing is that when I try to run the same model on a single 40 GB A100 GPU, it runs without any issue.

Can anyone tell me what's going on? Any help is appreciated.
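The exact launch command and traceback are not reproduced above. As a point of reference only, here is a minimal sketch of the kind of multi-GPU launch being described, using vLLM's offline `LLM` API; the model ID, prompt, and `gpu_memory_utilization` value are illustrative assumptions, not the reporter's actual settings:

```python
# Hypothetical reconstruction (not the reporter's actual command): load a 13B
# Llama model sharded across four GPUs with vLLM tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-13b-hf",  # assumed model checkpoint
    tensor_parallel_size=4,             # shard the weights across the four V100s
    dtype="half",                       # fp16; V100s do not support bfloat16
    gpu_memory_utilization=0.80,        # illustrative value; lower it if KV-cache allocation OOMs
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The same `--tensor-parallel-size` and `--gpu-memory-utilization` options apply when launching the `vllm.entrypoints.api_server` entrypoint instead of the offline API.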
Comments

I have the same problem: it works with 2 GPUs and tensor-parallel-size 2, but gives OOM with 4. Same model, Llama-2-13b.

I have the same question too.

Same here when using four GPUs. Any solution?

It's a bug to be fixed; see #322.

I met the same issue and figured out how to fix it. Already created a PR: #1395.