Why is the trainable parameter count only 0.03%, yet memory usage during training reaches over 60 GB, whereas LoRA training usually requires only around 17 GB?
frankaging changed the title from "GPU Memory usage issue" to "[P1] GPU Memory usage issue" on Sep 12, 2024
Hey @TranscenderNing, thanks for your interest. What are your run arguments for LoRA and for ReFT? Memory usage depends on the per-device batch size, whether you run with FSDP, which floating-point precision you use, etc.
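For reference, here is a minimal sketch (not from this thread) of the Hugging Face `TrainingArguments` knobs that usually dominate GPU memory when comparing such runs; the output path is a placeholder:

```python
# Sketch of memory-relevant training arguments for either a LoRA or a ReFT run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./reft_out",          # placeholder output directory
    per_device_train_batch_size=4,    # activation memory scales roughly with this
    gradient_accumulation_steps=8,    # keeps the effective batch size without extra memory
    bf16=True,                        # bf16 mixed precision shrinks activations/gradients
    gradient_checkpointing=True,      # trades recomputation for activation memory
)
```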
@PinetreePantry pinging Peter here. We also did memory profiling for LoRA and ReFT: with the same training parameters, LoRA and ReFT have similar memory profiles, and ReFT tends to have lower utilization because fewer FLOPs are required to perform position-based interventions.
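If you want to reproduce a comparison like that yourself, the snippet below is a minimal sketch using PyTorch's built-in memory counters; `run_training_step` is a hypothetical stand-in for one forward/backward pass of whichever setup you are measuring:

```python
# Sketch: measure peak GPU memory for a run (e.g. LoRA vs. ReFT) and compare.
import torch

torch.cuda.reset_peak_memory_stats()
run_training_step()  # hypothetical: one forward/backward pass of the configured model
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated memory: {peak_gb:.2f} GB")
```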
When I was playing around with ReFT I also hit cases where it used a lot of GPU memory. I suggest not padding every sequence to a fixed high length - that bloats GPU memory considerably. Try "padding = longest", "padding = True", or "padding = False" in turn; you should see a reduction in GPU memory usage.
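To illustrate the difference, here is a minimal sketch of those tokenizer padding strategies; the model name is a placeholder and the ReFT/LoRA setup is omitted:

```python
# Sketch: fixed-length padding vs. padding to the longest sequence in the batch.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder model
tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

texts = ["short example", "a somewhat longer training example"]

# Pads every sequence to max_length -> large, mostly-padding tensors.
fixed = tok(texts, padding="max_length", max_length=2048, return_tensors="pt")

# Pads only to the longest sequence in the batch -> much smaller tensors.
longest = tok(texts, padding="longest", return_tensors="pt")  # same as padding=True

print(fixed["input_ids"].shape)    # torch.Size([2, 2048])
print(longest["input_ids"].shape)  # torch.Size([2, <length of longest sequence>])
```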