
[WIP] full finetune / qlora + ac/offload/optm in bwd #21

Closed
weifengpy wants to merge 3 commits

Conversation

weifengpy (Contributor) commented Feb 26, 2024

Why compose FSDP with NF4Tensor?

QLoRA: the number of trainable parameters is reduced from xxx to xxx; the parameter size is reduced by xx.
Full finetuning of the original Llama with 4-bit quantized params:

7B + QLoRA on FFNs: memory usage is summarized below with bf16, AdamW, activation checkpointing (AC), and CPU offloading:

  • sharding NF4Tensor in FSDP: NF4Tensors are the 4-bit quantized weights from QLoRA
  • CPU offloading NF4Tensor in FSDP: the most profitable
  • optimizer step in the backward + 8-bit optimizer: matters little for QLoRA because of the tiny number of trainable parameters; should be prioritized for full finetuning (see the sketches below)

[Screenshots: memory usage tables, captured 2024-02-26 and 2024-02-27]
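
To make the first two bullets concrete, here is a minimal sketch of sharding QLoRA-style FFNs with per-parameter FSDP (FSDP2) and offloading the sharded weights to CPU. fully_shard and CPUOffloadPolicy are the torch.distributed._composable.fsdp APIs from recent PyTorch builds; FeedForward is a hypothetical stand-in for a Llama FFN, not this PR's code.

    # Minimal sketch, not this PR's code: shard each FFN with per-parameter FSDP
    # (FSDP2) and keep the sharded weights (NF4Tensors under QLoRA) on CPU
    # between uses. Assumes torch.distributed is initialized, e.g. via torchrun.
    import torch
    import torch.nn as nn
    from torch.distributed._composable.fsdp import CPUOffloadPolicy, fully_shard

    class FeedForward(nn.Module):  # hypothetical stand-in for a Llama FFN
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # NF4 weight under QLoRA
            self.w2 = nn.Linear(hidden_dim, dim, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.w2(torch.relu(self.w1(x)))

    model = nn.Sequential(*[FeedForward(4096, 11008) for _ in range(4)])

    # Shard per layer so only one FFN's weights are all-gathered (and moved
    # CPU -> GPU) at a time, then shard the root to cover remaining params.
    for layer in model:
        fully_shard(layer, offload_policy=CPUOffloadPolicy())
    fully_shard(model, offload_policy=CPUOffloadPolicy())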
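
For the third bullet, a sketch of running the optimizer step inside the backward pass, so each gradient is consumed and freed as soon as it is accumulated instead of being held until a separate optimizer.step(). This uses the stock Tensor.register_post_accumulate_grad_hook API (PyTorch 2.1+) with plain AdamW; it illustrates the technique and is not necessarily this PR's mechanism.

    # Minimal sketch of optimizer-in-backward: one tiny optimizer per parameter,
    # stepped from a post-accumulate-grad hook, so .grad is freed during backward.
    import torch
    import torch.nn as nn

    model = nn.Linear(4096, 4096)
    optims = {p: torch.optim.AdamW([p], lr=1e-4) for p in model.parameters()}

    def step_in_backward(param: torch.Tensor) -> None:
        optims[param].step()
        optims[param].zero_grad(set_to_none=True)  # free the gradient immediately

    for p in model.parameters():
        p.register_post_accumulate_grad_hook(step_in_backward)

    loss = model(torch.randn(8, 4096)).sum()
    loss.backward()  # optimizer runs and grads are freed inside backward
    # no optimizer.step() / zero_grad() needed after this point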

weifengpy changed the title from full finetune / qlora + ac/offload/optm in bwd to [WIP] full finetune / qlora + ac/offload/optm in bwd on Feb 26, 2024
awgu commented Feb 26, 2024

Should we call out that this table assumes that we are only applying QLoRA to the FFNs?

weifengpy marked this pull request as draft on February 27, 2024
    def nf4_detach(aten_op, args, kwargs=None):
        # nn.Parameter needs detach
        quantized_data = aten_op(args[0].quantized_data, *args[1:], **kwargs)
        tensor_meta = SubclassTensorArgs(
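
For readers outside the review thread: the handler above follows the standard pattern for ops on a __torch_dispatch__ tensor subclass — apply the aten op to each inner tensor, then rebuild the subclass so FSDP can still treat the parameter as a leaf. Below is a generic, hedged sketch of that pattern with a toy QuantTensor; the fields and names are invented for illustration and are not NF4Tensor's actual ones.

    # Generic sketch (toy QuantTensor, not torchao's NF4Tensor): detach rebuilds
    # the wrapper subclass around detached inner tensors, as nn.Parameter needs.
    import torch

    class QuantTensor(torch.Tensor):
        @staticmethod
        def __new__(cls, quantized_data, scale, orig_shape):
            # wrapper subclass: outer tensor carries shape/dtype, inner tensors hold data
            return torch.Tensor._make_wrapper_subclass(
                cls, orig_shape, dtype=torch.bfloat16, device=quantized_data.device
            )

        def __init__(self, quantized_data, scale, orig_shape):
            self.quantized_data = quantized_data
            self.scale = scale

        @classmethod
        def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
            kwargs = kwargs or {}
            if func is torch.ops.aten.detach.default:
                src = args[0]
                # detach every inner tensor, then rebuild the subclass
                return QuantTensor(
                    src.quantized_data.detach(), src.scale.detach(), src.shape
                )
            raise NotImplementedError(f"{func} not supported in this sketch")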

weifengpy closed this on Jul 15, 2024
weifengpy (Contributor, Author) commented Jul 15, 2024

Closing, since QLoRA + FSDP2 with CPU offloading has landed in torchtune: pytorch/torchtune#909
