
[P1] GPU Memory usage issue #136

Open · TranscenderNing opened this issue Sep 12, 2024 · 2 comments
Assignee: frankaging
Labels: question (Further information is requested)

Comments


TranscenderNing commented Sep 12, 2024

[screenshot: training log showing trainable parameter percentage]
Why is the trainable parameter count only 0.03%, yet memory usage during training exceeds 60 GB, whereas LoRA training usually requires only around 17 GB?

frankaging changed the title from "GPU Memory usage issue" to "[P1] GPU Memory usage issue" on Sep 12, 2024
frankaging self-assigned this on Sep 12, 2024
frankaging added the "question: Further information is requested" label on Sep 12, 2024
frankaging (Collaborator) commented:

Hey @TranscenderNing, thanks for your interest. What are your training arguments for both LoRA and ReFT? Memory usage depends on the per-device batch size, whether you run with FSDP, the floating-point precision, etc.
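For reference, a minimal sketch of the arguments that usually dominate training memory in a Hugging Face `TrainingArguments` setup; the values and output path below are hypothetical, and the point is simply to keep them identical between the LoRA run and the ReFT run so the memory comparison is apples-to-apples.

```python
# Hypothetical values; keep these identical across the LoRA and ReFT runs.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./reft_run",           # hypothetical output path
    per_device_train_batch_size=4,     # dominant driver of activation memory
    gradient_accumulation_steps=8,     # grow effective batch size without extra memory
    bf16=True,                         # half-precision activations/gradients, if supported
    logging_steps=10,
)
```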

@PinetreePantry pinging Peter here. We also did memory profiling for LoRA and ReFT: with the same hyperparameters, LoRA and ReFT have similar memory profiles, while ReFT lowers utilization because fewer FLOPs are required for the position-based interventions.
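If you want to reproduce such a comparison yourself, a generic PyTorch sketch is to record the peak allocated memory around a training step; this is not the exact profiling script used by the maintainers.

```python
# Generic sketch for recording peak GPU memory around one training step.
import torch

torch.cuda.reset_peak_memory_stats()

# ... run a forward/backward/optimizer step for the model under test ...

peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak allocated GPU memory: {peak_gib:.2f} GiB")
```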

PinetreePantry (Collaborator) commented:

When I was playing around with ReFT, I also ran into cases where it used a lot of GPU memory. I suggest not padding to a fixed high value; that bloats GPU memory considerably. Try `padding="longest"`, `padding=True`, and `padding=False` one at a time, and you should see a reduction in GPU memory usage.
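A minimal sketch of the padding options mentioned above, using a Hugging Face tokenizer; the model id is a placeholder, and note that causal-LM tokenizers often need a pad token set. Padding to a large fixed `max_length` inflates every batch, while `padding="longest"` (equivalent to `padding=True`) only pads to the longest sequence in the current batch.

```python
# Sketch of the padding options discussed above (Hugging Face tokenizer).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-base-model")  # hypothetical model id
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # common workaround for causal-LM tokenizers

texts = ["short example", "a somewhat longer example sentence"]

# Pads every sequence to max_length tokens; with a large max_length this
# inflates every batch and with it the GPU memory footprint.
fixed = tokenizer(texts, padding="max_length", max_length=2048, return_tensors="pt")

# Pads only to the longest sequence in the current batch; usually much cheaper.
longest = tokenizer(texts, padding="longest", return_tensors="pt")
```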
