
CUDA out of memory Error - decrease batch size (?) #9

Closed
manucpbon opened this issue Jul 16, 2024 · 1 comment

Comments

@manucpbon

Hi! I am trying to run the inference.py script with the "testing" inputs, but I'm having memory issues.
I'm running it locally on a computer with an NVIDIA GeForce RTX 3060 GPU (8192 MiB max), using the llama2-7B model.
When I run
python inference.py -i ./testing/input/ -o ./testing/output/
I get the following messages:

[nltk_data] Downloading package punkt to /home/manuela/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/manuela/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.57s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Loading BioSent2Vec model
model successfully loaded
start phenogpt
CUDA out of memory. Tried to allocate 500.00 MiB. GPU 
Cannot produce results for sample2
CUDA out of memory. Tried to allocate 172.00 MiB. GPU 
Cannot produce results for sample1

Is my GPU memory not enough to run this locally? Is there any way you can help me?

Thanks in advance and congratulations on the amazing paper and results!

@quannguyenminh103
Contributor

Hello, thank you so much for using our tool! Unfortunately, I don't think you will be able to run our model on your setup. One way to reduce memory usage is to set "load_8bit=True" in the inference.py script, which loads the model at lower precision. However, based on our testing, inference requires at least 10GB (~9536.74 MiB) of GPU memory with "load_8bit=True" and about 30GB with "load_8bit=False".
If you have two 8GB GPUs, the memory usage can be split across them and you should be able to run it.
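
For reference, a minimal sketch of what 8-bit loading usually looks like with the Hugging Face transformers API (the exact model path and loading code inside inference.py may differ; this only illustrates the idea behind "load_8bit=True" and assumes the bitsandbytes and accelerate packages are installed):

```python
# Sketch only: load a LLaMA-2-7B checkpoint in 8-bit precision to reduce GPU memory use.
# The model path below is a placeholder, not necessarily the weights used by inference.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "meta-llama/Llama-2-7b-hf"  # hypothetical path; substitute your local weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
    device_map="auto",      # let accelerate place layers on GPU and offload the rest to CPU
    torch_dtype=torch.float16,
)
```

With device_map="auto", layers that do not fit on the GPU are offloaded to CPU RAM, which is what the "offloaded to the cpu" warnings in your log indicate; offloading keeps loading from failing outright but slows inference down considerably.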

@kaichop closed this as completed Nov 11, 2024