Merge multiple "distributed LoRA checkpoints" #11314

Open
jolyons123 opened this issue Nov 18, 2024 · 0 comments

jolyons123 commented Nov 18, 2024

Is your feature request related to a problem? Please describe.

TensorRT-LLM only accepts a single-rank .nemo LoRA checkpoint (in the case of Llama 3.1 8B). Therefore, the only way to use my fine-tuned model with the TensorRT-LLM backend is to merge my distributed LoRA checkpoints into the base model using the scripts/nlp_language_modeling/merge_lora_weights/merge.py script. However, that produces a full-size merged model per adapter, which adds up quickly if I want to do this for multiple downstream tasks/fine-tuned models.

More specifically, after training with TP=PP=2, the contents of the megatron_gpt_peft_lora_tuning.nemo LoRA checkpoint file look like this:

```
./
./model_config.yaml
./tp_rank_00_pp_rank_000/
./tp_rank_00_pp_rank_000/model_weights.ckpt
./tp_rank_00_pp_rank_001/
./tp_rank_00_pp_rank_001/model_weights.ckpt
./tp_rank_01_pp_rank_000/
./tp_rank_01_pp_rank_000/model_weights.ckpt
./tp_rank_01_pp_rank_001/
./tp_rank_01_pp_rank_001/model_weights.ckpt
```
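
For reference, here is an untested sketch of what I imagine the merge would do: load each per-rank model_weights.ckpt, keep tensors that are replicated across TP ranks as they are, concatenate TP-sharded tensors, and take the union of keys across PP stages. The shard dimension and the assumption that PP stages use global layer indices in their keys would need to be checked against NeMo's adapter implementation.

```python
import torch

# Untested sketch: gather the per-rank LoRA adapter weights into one state dict.
# Assumptions (to be verified against NeMo's ParallelLinearAdapter):
#  * tensors that differ across TP ranks are sharded along dim 0,
#  * each PP stage already uses global layer indices in its keys.
TP, PP = 2, 2
PATH = "./tp_rank_{tp:02d}_pp_rank_{pp:03d}/model_weights.ckpt"

merged = {}
for pp in range(PP):
    # load all TP shards belonging to this pipeline stage
    shards = [torch.load(PATH.format(tp=tp, pp=pp), map_location="cpu") for tp in range(TP)]
    for key, first in shards[0].items():
        tensors = [s[key] for s in shards]
        if all(torch.equal(first, t) for t in tensors[1:]):
            merged[key] = first                      # replicated across TP ranks
        else:
            merged[key] = torch.cat(tensors, dim=0)  # assumed TP shard dimension
    # pipeline stages hold disjoint layers, so the union of keys is the full adapter

torch.save(merged, "model_weights.ckpt")
```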

Describe the solution you'd like

It would be nice if we could merge the distributed LoRA weights into a .nemo LoRA checkpoint that contains weights for a single rank only. That way, the LoRA adapter would stay compatible with TensorRT-LLM even when training was done on multiple GPUs.
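
As far as I can tell from the layout above, a .nemo file is just a tar archive, so the merged state dict could presumably be repacked into a single-rank checkpoint roughly like this (again untested; the output file name is only an example, and model_config.yaml would probably also need tensor_model_parallel_size/pipeline_model_parallel_size set back to 1):

```python
import tarfile

# Repack the merged weights together with the (adjusted) model_config.yaml into a
# single-rank .nemo archive. An uncompressed tar is assumed here.
with tarfile.open("megatron_gpt_peft_lora_tuning_merged.nemo", "w") as tar:
    tar.add("model_config.yaml", arcname="./model_config.yaml")
    tar.add("model_weights.ckpt", arcname="./model_weights.ckpt")
```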

Thanks in advance!

Best regards,
John
