Llama2-QLoRA-on-SageMaker

This experiment is based on Amazon SageMaker and covers:

1/ Training a QLoRA adapter for Llama 2 (TheBloke/Llama-2-7B-fp16); a training sketch follows this list

2/ Merging the frozen base LLM and the trained adapter into a single model for faster inference later; a merge sketch follows this list

3/ Hosting the adapted model saved in S3 on a SageMaker endpoint with the LMI (Large Model Inference) container; a deployment sketch follows the Infra section

- inference/ uses the basic Python engine (HuggingFace Accelerate)

- inference-deepspeed/ uses the LMI-optimized DJL-DeepSpeed engine
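
Training sketch: a minimal illustration of the QLoRA adapter training step in 1/, assuming the usual transformers / peft / bitsandbytes stack inside the SageMaker training container. The model id comes from this README; the LoRA rank, target modules, and paths are assumptions, not this repo's exact settings.

```python
# Minimal QLoRA training sketch (illustrative; not this repo's exact training script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "TheBloke/Llama-2-7B-fp16"

# Load the frozen base model quantized to 4-bit NF4 (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach trainable low-rank adapters; rank and target modules here are assumptions
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# ... tokenize a dataset and train with transformers.Trainer as usual, then:
# model.save_pretrained("/opt/ml/model/adapter")  # only the small adapter weights are saved
```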
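
Merge sketch: one way to do the merge in 2/ is peft's merge_and_unload; the directory paths below are placeholders, not this repo's layout.

```python
# Merge the frozen base LLM with the trained QLoRA adapter (paths are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "TheBloke/Llama-2-7B-fp16"
adapter_dir = "/opt/ml/model/adapter"   # where the trained adapter was saved (assumed)
merged_dir = "/opt/ml/model/merged"     # where the single merged model is written (assumed)

# Reload the base model in fp16 (the merge is applied to full weights, not the 4-bit ones)
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

# Fold the LoRA deltas into the base weights so inference no longer needs peft
merged = model.merge_and_unload()
merged.save_pretrained(merged_dir, safe_serialization=True)
AutoTokenizer.from_pretrained(base_model_id).save_pretrained(merged_dir)
```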

Infra:

- Notebook Instance - CPU instance, e.g. ml.c5.xlarge
- Training - single GPU required, tested on ml.g5.xlarge ~ ml.g5.4xlarge
- Hosting - single GPU required, tested on ml.g4dn.xlarge ~ ml.g5.4xlarge
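
Deployment sketch: a rough outline of step 3/, hosting the adapted model behind a SageMaker endpoint with an LMI (DJL) container via the generic sagemaker Model API. The image URI, S3 path, endpoint name, and instance type are placeholders; the actual serving configuration (Python/Accelerate vs DJL-DeepSpeed engine) lives in inference/ and inference-deepspeed/.

```python
# Sketch: host the adapted model on a SageMaker endpoint using an LMI (DJL) container.
# Image URI and S3 path are placeholders; pick the LMI image for your region and the
# engine (Python/Accelerate or DeepSpeed) configured in inference/ or inference-deepspeed/.
import sagemaker
from sagemaker.model import Model

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

lmi_image_uri = "<ecr-account>.dkr.ecr.<region>.amazonaws.com/djl-inference:<tag>"  # placeholder
code_artifact = "s3://<bucket>/llama2-qlora/code/model.tar.gz"  # packaged serving code (placeholder)

lmi_model = Model(
    image_uri=lmi_image_uri,
    model_data=code_artifact,
    role=role,
    sagemaker_session=sess,
)

lmi_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",   # single-GPU hosting instance, per the Infra notes above
    endpoint_name="llama2-qlora-lmi",
)

# Invoke afterwards with sagemaker.Predictor("llama2-qlora-lmi") or the boto3
# sagemaker-runtime client; the exact request payload depends on the model handler.
```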


Basics to Start an Experiment on SageMaker

https://github.com/haozhx23/SageMaker-Utils/blob/main/0-How-to-start.md


Refs

https://huggingface.co/TheBloke/Llama-2-7b-chat-fp16
https://github.com/artidoro/qlora
https://arxiv.org/abs/2305.14314
https://www.philschmid.de/sagemaker-llama2-qlora
https://www.linkedin.com/pulse/enhancing-language-models-qlora-efficient-fine-tuning-vraj-routu
