
Weird CUDA memory utilization #16

Open
lwmlyy opened this issue Aug 24, 2023 · 5 comments


@lwmlyy

lwmlyy commented Aug 24, 2023

Hi, I am using the plain python launch to LoRA-finetune Llama-2-70B, and the training runs fine. But it seems a bit weird that the GPU memory utilization is quite low, less than 18 GB. Also, training is relatively slow compared to the llama-recipes codebase.

The command is:
[screenshot of the launch command]

The gpu status during training is:
[screenshot of the GPU status during training]

@arielnlee
Owner

Hi! Thanks for your interest. Have you tried accelerate? That worked for us! The python way also works, but is very slow. Definitely try accelerate, but if you don't want to, I'd at least switch to 4 A100 80GB GPUs.

@lwmlyy
Author

lwmlyy commented Aug 24, 2023 via email

@arielnlee
Owner

First run accelerate config to set up accelerate and then replace python finetune.py with accelerate launch finetune.py. If that doesn't work, I'll be happy to get you a script.

To clarify, python finetune.py will not run as quickly on 4 GPUs as on 8, but when we tried the native python way, 8 GPUs seemed a bit of a waste, since, as you noticed, utilization isn't great.
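For reference, a minimal sketch of that switch, assuming a single-node multi-GPU setup (the config path is the standard accelerate default, and any finetune.py arguments stay the same as in the python launch; nothing here is specific to this repo):

    # one-time interactive setup; answers are saved to
    # ~/.cache/huggingface/accelerate/default_config.yaml
    accelerate config

    # then launch with accelerate instead of plain python,
    # keeping the same finetune.py arguments as before
    accelerate launch finetune.py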

@lwmlyy
Author

lwmlyy commented Aug 25, 2023

First run accelerate config to set up accelerate and then replace python finetune.py with accelerate launch finetune.py. If that doesn't work, I'll be happy to get you a script.

To clarify, python finetune.py will not run as quickly on 4 GPUs as on 8, but when we tried the native python way, 8 GPUs seemed a bit of a waste, since, as you noticed, utilization isn't great.

I just tried running the script with accelerate launch (8×A100-80GB), but it went CUDA OOM during model loading. Any advice?

The accelerate config is as follows:
[screenshot of the accelerate config]

The launch config is as follows:
[screenshot of the launch config]

@moon-fall

moon-fall commented Oct 8, 2023

Same problem. I solved it by reinstalling the python packages with the versions pinned in requirements.txt; I think it is related to the peft package.
But after that I still hit CUDA OOM when cutoff_len is bigger than 1024.
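For anyone hitting the same thing, a minimal sketch of that reinstall, assuming the repository's pinned requirements.txt (these are generic pip commands, not specific to this repo):

    # reinstall the dependencies exactly as pinned in requirements.txt
    pip install --force-reinstall -r requirements.txt

    # confirm which peft version actually got installed
    pip show peft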
