This repository provides a simple, minimal implementation for inference and Low-Rank Adaptation (LoRA) fine-tuning of Llama2-7B models (requires about 40 GB of GPU memory). It has minimal dependencies (only `torch` and `sentencepiece`) for a straightforward setup:
```bash
pip install torch sentencepiece
```
Run inference with:

```bash
python inference.py --tokenizer_path /path_to/tokenizer.model --model_path /path_to/consolidated.00.pth
```
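For reference, the tokenizer checkpoint passed via `--tokenizer_path` can be loaded directly with `sentencepiece` if you want to inspect tokenization; this is a standalone sketch, not code from `inference.py`:

```python
import sentencepiece as spm

# Load the Llama tokenizer model (path is the same placeholder as above).
sp = spm.SentencePieceProcessor(model_file="/path_to/tokenizer.model")

ids = sp.encode("Hello, world!")  # list of token ids
text = sp.decode(ids)             # round-trips back to the string
print(ids, text)
```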
We use the Alpaca dataset with only 200 samples for quick experimentation (see the data-format sketch after the fine-tuning command below). The LoRA implementation lives under the `llama` folder.
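For context, the core idea of LoRA is to freeze the pretrained weight and learn a low-rank additive update. A minimal sketch of such a layer follows; it illustrates the technique and does not necessarily match the code under `llama/`:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (B A x) * scale."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained parameters; only the LoRA factors are trained.
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The low-rank path starts at zero because lora_b is zero-initialized,
        # so training begins from the pretrained model's behavior.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```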
To fine-tune with LoRA:

```bash
python finetune.py --tokenizer_path /path_to/tokenizer.model --model_path /path_to/consolidated.00.pth --data_path alpaca_data_200_samples.json
```
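The standard Alpaca schema is a JSON list of records with `instruction`, `input`, and `output` fields; assuming `alpaca_data_200_samples.json` follows that schema, you can inspect it like this:

```python
import json

# Each record in the standard Alpaca format has "instruction", "input",
# and "output" fields ("input" may be an empty string).
with open("alpaca_data_200_samples.json") as f:
    samples = json.load(f)

print(len(samples))                   # expected: 200
print(samples[0]["instruction"])
print(samples[0]["output"])
```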