This repository provides a simple, minimal implementation for inference and Low-Rank Adaptation (LoRA) fine-tuning of Llama2-7B models (requires about 40 GB of GPU memory). It has minimal dependencies (only `torch` and `sentencepiece`) for a straightforward setup:
```bash
pip install torch sentencepiece
```
Run inference with:

```bash
python inference.py --tokenizer_path /path_to/tokenizer.model --model_path /path_to/consolidated.00.pth
```
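For reference, the tokenizer checkpoint passed via `--tokenizer_path` can be loaded directly with `sentencepiece` if you want to inspect tokenization; this is a standalone sketch, not code from `inference.py`:

```python
import sentencepiece as spm

# Load the Llama tokenizer model (path is the same placeholder as above).
sp = spm.SentencePieceProcessor(model_file="/path_to/tokenizer.model")

ids = sp.encode("Hello, world!")  # list of token ids
text = sp.decode(ids)             # round-trips back to the string
print(ids, text)
```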
We use the Alpaca dataset with only 200 samples for quick experimentation (see the data-format sketch after the fine-tuning command below). The LoRA implementation lives under the `llama` folder.
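For context, the core idea of LoRA is to freeze the pretrained weight and learn a low-rank additive update. A minimal sketch of such a layer follows; it illustrates the technique and does not necessarily match the code under `llama/`:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (B A x) * scale."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained parameters; only the LoRA factors are trained.
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The low-rank path starts at zero because lora_b is zero-initialized,
        # so training begins from the pretrained model's behavior.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```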
To fine-tune with LoRA:

```bash
python finetune.py --tokenizer_path /path_to/tokenizer.model --model_path /path_to/consolidated.00.pth --data_path alpaca_data_200_samples.json
```
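The standard Alpaca schema is a JSON list of records with `instruction`, `input`, and `output` fields; assuming `alpaca_data_200_samples.json` follows that schema, you can inspect it like this:

```python
import json

# Each record in the standard Alpaca format has "instruction", "input",
# and "output" fields ("input" may be an empty string).
with open("alpaca_data_200_samples.json") as f:
    samples = json.load(f)

print(len(samples))                   # expected: 200
print(samples[0]["instruction"])
print(samples[0]["output"])
```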