Is your feature request related to a problem? Please describe.
TensorRT-LLM only accepts a single-rank .nemo LoRA checkpoint (in my case, for Llama 3.1 8B). Therefore, the only way to use my fine-tuned model with the TensorRT-LLM backend is to merge my distributed LoRA checkpoints into the base model using the scripts/nlp_language_modeling/merge_lora_weights/merge.py script. However, that produces one full-size merged model per adapter, which adds up quickly when I fine-tune for multiple downstream tasks.
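For reference, my current workaround looks roughly like the following. The Hydra override names are my best recollection of the script's config and may differ between NeMo versions, so please treat them as assumptions and check them against merge.py in your tree:

```bash
# Workaround: bake the LoRA weights into the base model,
# producing one full-size .nemo per downstream task.
# Override names are assumptions; verify against merge.py's Hydra config.
python scripts/nlp_language_modeling/merge_lora_weights/merge.py \
    trainer.accelerator=gpu \
    tensor_model_parallel_size=1 \
    pipeline_model_parallel_size=1 \
    gpt_model_file=/models/llama3_1_8b_base.nemo \
    lora_model_path=megatron_gpt_peft_lora_tuning.nemo \
    merged_model_path=/models/llama3_1_8b_task_a_merged.nemo
```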
More specifically, after training with TP=2 and PP=2, the contents of my megatron_gpt_peft_lora_tuning.nemo LoRA checkpoint file look like this:
Describe the solution you'd like
It would be nice if we could merge the distributed LoRA weights into a .nemo LoRA checkpoint file that contains weights for only a single rank. That way, the LoRA adapter would stay compatible with TensorRT-LLM even when training on multiple GPUs.
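To illustrate what I have in mind, here is a minimal sketch of the consolidation step. It assumes the per-rank shards have already been extracted from the .nemo archive into plain PyTorch state dicts, that replicated tensors (e.g. the LoRA A matrix) are bitwise identical across TP ranks, and that sharded tensors (e.g. a column-parallel LoRA B matrix) are split along their output dimension. The function name, the sharding assumptions, and the handling of PP stages (whose disjoint key sets would simply need to be unioned, omitted here) are mine, not NeMo API:

```python
import torch

def merge_tp_lora_shards(shards):
    """Merge per-TP-rank LoRA state dicts into a single-rank state dict."""
    merged = {}
    for key in shards[0]:
        tensors = [shard[key] for shard in shards]
        if all(torch.equal(tensors[0], t) for t in tensors[1:]):
            # Identical on every rank -> replicated (e.g. the LoRA A matrix
            # under the assumptions above); keep one copy.
            merged[key] = tensors[0]
        else:
            # Differs per rank -> assume sharded along the output dimension
            # (dim 0), as a column-parallel LoRA B matrix would be.
            merged[key] = torch.cat(tensors, dim=0)
    return merged

# Hypothetical usage with two TP shards extracted from the .nemo archive:
# shards = [torch.load(f"mp_rank_{r:02d}/model_weights.ckpt", map_location="cpu")
#           for r in range(2)]
# torch.save(merge_tp_lora_shards(shards), "lora_single_rank_weights.ckpt")
```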
Thanks in advance!
Best regards,
John