Why is the trainable parameter count only 0.03%, yet memory usage during training reaches over 60 GB, whereas LoRA training usually requires only around 17 GB?
frankaging changed the title from "GPU Memory usage issue" to "[P1] GPU Memory usage issue" on Sep 12, 2024
Hey @TranscenderNing, thanks for your interest. What are your run arguments for LoRA and for ReFT? Memory usage depends on the per-device batch size, whether you run with FSDP, which floating-point precision you use, etc.
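For reference, here is a minimal sketch (not from this thread) of the Hugging Face `TrainingArguments` knobs that usually dominate GPU memory when comparing such runs; the output path is a placeholder:

```python
# Sketch of memory-relevant training arguments for either a LoRA or a ReFT run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./reft_out",          # placeholder output directory
    per_device_train_batch_size=4,    # activation memory scales roughly with this
    gradient_accumulation_steps=8,    # keeps the effective batch size without extra memory
    bf16=True,                        # bf16 mixed precision shrinks activations/gradients
    gradient_checkpointing=True,      # trades recomputation for activation memory
)
```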
@PinetreePantry pinging Peter here. We also did memory profiling for LoRA and ReFT: with the same training parameters, LoRA and ReFT have similar memory profiles, and ReFT tends to have lower utilization because fewer FLOPs are required to perform position-based interventions.
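If you want to reproduce a comparison like that yourself, the snippet below is a minimal sketch using PyTorch's built-in memory counters; `run_training_step` is a hypothetical stand-in for one forward/backward pass of whichever setup you are measuring:

```python
# Sketch: measure peak GPU memory for a run (e.g. LoRA vs. ReFT) and compare.
import torch

torch.cuda.reset_peak_memory_stats()
run_training_step()  # hypothetical: one forward/backward pass of the configured model
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated memory: {peak_gb:.2f} GB")
```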
When I was playing around with ReFT I also hit cases where it used a lot of GPU memory. I suggest not padding every sequence to a fixed high length - that bloats GPU memory considerably. Try "padding = longest", "padding = True", or "padding = False" in turn; you should see a reduction in GPU memory usage.
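To illustrate the difference, here is a minimal sketch of those tokenizer padding strategies; the model name is a placeholder and the ReFT/LoRA setup is omitted:

```python
# Sketch: fixed-length padding vs. padding to the longest sequence in the batch.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder model
tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

texts = ["short example", "a somewhat longer training example"]

# Pads every sequence to max_length -> large, mostly-padding tensors.
fixed = tok(texts, padding="max_length", max_length=2048, return_tensors="pt")

# Pads only to the longest sequence in the batch -> much smaller tensors.
longest = tok(texts, padding="longest", return_tensors="pt")  # same as padding=True

print(fixed["input_ids"].shape)    # torch.Size([2, 2048])
print(longest["input_ids"].shape)  # torch.Size([2, <length of longest sequence>])
```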