Finetuning Llama-2-13B with 1x A100 80GB? torch.cuda.OutOfMemoryError #356
-
I'm trying to finetune Llama-2-13B on a single A100 80GB, but it gives me torch.cuda.OutOfMemoryError.
-
Full-finetuning a 13B model may be too tough for a single A100 80GB 😭 You can try QLoRA, which is optimized for low VRAM usage. To save even more memory, you can try zero3_offload; see the explanation here.
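For context, a minimal QLoRA setup with transformers + peft + bitsandbytes looks roughly like the sketch below. This is illustrative, not this repo's recipe: the model id, LoRA rank, and target modules are my own assumptions.

```python
# Minimal QLoRA sketch (hypothetical hyperparameters, not this repo's recipe).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 so the 13B weights fit on one card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; only these receive gradients,
# so the optimizer states stay tiny compared to full finetuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

Because the 4-bit base weights are frozen, only the adapter parameters and their optimizer states consume training memory, which is why this tends to fit on a single 80GB card.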
-
Ah, thanks for your response! Do you know the minimum number of A100 80GB GPUs required to finetune Llama-2-13B? I don't have a local machine, so I'm renting a cloud GPU without a persistent boot drive. If I can, I'd like to avoid 8x A100 80GB idling during environment setup; building and installing flash-attn alone takes about 30 minutes. Also, could you explain when to point to zero2.json vs. zero3.json for DeepSpeed?
-
When you have enough VRAM, zero2 is slightly faster than zero3. If you want to finetune the full model, you may need 8 GPUs; if you are okay with LoRA or QLoRA, a single GPU may be fine (I currently do not have a free machine to test these).
AFAIK, Google Cloud and AWS EC2 both let you "persist" a disk so that it is not deleted after the server is terminated. You can also create a custom image based on the disk state. So you can basically compile everything on a single-GPU server, make sure things run properly, and create a custom image; next time, create new instances from the custom image instead of the default image. All packages will already be there and you won't need to compile from scratch. You can search for this using keywords like "google cloud create custom image".
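On the zero2 vs. zero3 question: ZeRO-2 shards optimizer states and gradients across GPUs, while ZeRO-3 additionally shards the parameters themselves, and offloading moves those states into CPU RAM. A zero3_offload-style config looks roughly like the sketch below; the keys and values are illustrative, not this repo's actual file, and the HF Trainer happens to accept the config as a Python dict in place of a json path, which is convenient on throwaway cloud instances.

```python
# A hypothetical ZeRO-3 + CPU-offload DeepSpeed config, expressed as a dict.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # stage 2 shards optimizer+grads; stage 3 also shards params
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

from transformers import TrainingArguments

# "auto" entries above are filled in from these arguments at launch time.
args = TrainingArguments(
    output_dir="out",           # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    deepspeed=ds_config,        # a dict works in place of a zero3.json path
)
```

The tradeoff mentioned above applies here: with enough VRAM, stage 2 skips the extra parameter gathering stage 3 does and runs slightly faster, while stage 3 with offload trades step time for memory headroom.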
-
Thanks so much for the info.
-
That's also a great option! Thanks!!!