This repo contains the popular LLaMa 7b language model, fully implemented in the Rust programming language!
It uses dfdx tensors and CUDA acceleration.
It runs LLaMa directly in f16, and f16 arithmetic has no hardware acceleration on CPU, so using CUDA is strongly recommended.
[Demo: the 7b model running on an A10 GPU]
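Because the model runs in f16, each parameter takes 2 bytes, so the weights alone need roughly parameters × 2 bytes of memory (f32 would need twice that). The sketch below does that arithmetic. It assumes the nominal parameter counts implied by the model names (the real counts differ slightly) and ignores activations and any other runtime buffers, so treat the numbers as a lower bound.

```rust
/// Bytes per parameter for IEEE 754 half precision (f16).
const F16_BYTES: u64 = 2;
/// Bytes per parameter for single precision (f32), for comparison.
const F32_BYTES: u64 = 4;

/// Approximate weight memory in GiB for `params` parameters
/// stored with `bytes_per_param` bytes each.
fn weight_gib(params: u64, bytes_per_param: u64) -> f64 {
    (params * bytes_per_param) as f64 / (1u64 << 30) as f64
}

fn main() {
    // Assumption: nominal counts taken from the model names,
    // not the exact parameter counts of the released checkpoints.
    let models = [
        ("7b", 7_000_000_000u64),
        ("13b", 13_000_000_000),
        ("65b", 65_000_000_000),
    ];
    for (name, params) in models {
        println!(
            "{}: ~{:.1} GiB of weights in f16, ~{:.1} GiB in f32",
            name,
            weight_gib(params, F16_BYTES),
            weight_gib(params, F32_BYTES)
        );
    }
}
```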
- Install git lfs. On Ubuntu you can run
sudo apt install git-lfs
- Activate git lfs with
git lfs install
- Run one of the following commands to download the model weights in PyTorch format:
- LLaMa 7b (~25 GB):
git clone https://huggingface.co/decapoda-research/llama-7b-hf
- LLaMa 13b (~75 GB):
git clone https://huggingface.co/decapoda-research/llama-13b-hf
- LLaMa 65b (~244 GB):
git clone https://huggingface.co/decapoda-research/llama-65b-hf
- (Optional) Run
python3.x -m venv <my_env_name>
to create a Python virtual environment, where x is your preferred Python version.
- (Optional, requires the previous step) Run
source <my_env_name>/bin/activate
(or <my_env_name>\Scripts\activate on Windows) to activate the environment.
- Run
pip install numpy torch
- Run
python convert.py
to convert the model weights into a format the Rust code can load:
a. LLaMa 7b: python convert.py
b. LLaMa 13b: python convert.py llama-13b-hf
c. LLaMa 65b: python convert.py llama-65b-hf
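The download and conversion steps follow one naming pattern per model size: the clone URL ends in `llama-<size>-hf`, and `convert.py` takes that directory name as its argument, except for 7b, which the examples above run with no argument. A small sketch that derives both from the size; `ModelSize` and its methods are hypothetical helpers for illustration, not part of this repo.

```rust
/// Model sizes listed in this README.
#[derive(Clone, Copy, Debug)]
enum ModelSize {
    B7,
    B13,
    B65,
}

impl ModelSize {
    /// Size tag used in the Hugging Face repo names.
    fn tag(self) -> &'static str {
        match self {
            ModelSize::B7 => "7b",
            ModelSize::B13 => "13b",
            ModelSize::B65 => "65b",
        }
    }

    /// Directory that `git clone` creates (last segment of the repo URL).
    fn dir(self) -> String {
        format!("llama-{}-hf", self.tag())
    }

    /// URL passed to `git clone` in the download step.
    fn clone_url(self) -> String {
        format!("https://huggingface.co/decapoda-research/{}", self.dir())
    }

    /// Arguments to `python` in the conversion step. Mirrors the examples
    /// above: 7b uses plain `python convert.py`, larger models pass their dir.
    fn convert_args(self) -> Vec<String> {
        let mut args = vec!["convert.py".to_string()];
        if !matches!(self, ModelSize::B7) {
            args.push(self.dir());
        }
        args
    }
}

fn main() {
    for size in [ModelSize::B7, ModelSize::B13, ModelSize::B65] {
        println!("git clone {}", size.clone_url());
        println!("python {}", size.convert_args().join(" "));
    }
}
```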
You can compile with the usual Cargo commands:
With CUDA:
cargo build --release -F cuda
Without CUDA:
cargo build --release
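`-F cuda` is Cargo's short form of `--features cuda`, which turns on a Cargo feature at compile time. Code behind a feature is commonly gated with `#[cfg(feature = "...")]`. The sketch below shows only that general mechanism with a hypothetical `backend` function; it is not taken from this repo, whose actual device-selection code isn't shown in this README.

```rust
// General pattern for feature-gated code. Assumption: illustrative only;
// the function name and strings are hypothetical.

/// Backend compiled in when the `cuda` feature is enabled.
#[cfg(feature = "cuda")]
fn backend() -> &'static str {
    "cuda"
}

/// Fallback compiled in when the `cuda` feature is not enabled.
#[cfg(not(feature = "cuda"))]
fn backend() -> &'static str {
    "cpu"
}

fn main() {
    println!("compiled backend: {}", backend());
}
```

Built on its own, with no features enabled, only the `not(feature = "cuda")` version exists, so it reports `cpu`.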
To run with the default arguments:
./target/release/llama-dfdx --model <model-dir> generate "<prompt>"
./target/release/llama-dfdx --model <model-dir> chat
./target/release/llama-dfdx --model <model-dir> file <path to prompt file>
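If you want to drive the binary from another Rust program (for scripting or tests), the three invocations above map directly onto `std::process::Command`. A minimal sketch, assuming a release build at the path shown above. `Mode`, `llama_command`, and the `path/to/model-dir` and `prompt.txt` paths are hypothetical, and any extra flags listed by `--help` are omitted.

```rust
use std::path::Path;
use std::process::Command;

/// The three run modes shown above (hypothetical helper type).
enum Mode<'a> {
    Generate(&'a str),
    Chat,
    File(&'a Path),
}

/// Build, but do not run, the command for one mode.
/// Argument order follows the README examples: `--model <dir> <mode> [arg]`.
fn llama_command(model_dir: &Path, mode: Mode<'_>) -> Command {
    let mut cmd = Command::new("./target/release/llama-dfdx");
    cmd.arg("--model").arg(model_dir);
    match mode {
        Mode::Generate(prompt) => {
            cmd.arg("generate").arg(prompt);
        }
        Mode::Chat => {
            cmd.arg("chat");
        }
        Mode::File(path) => {
            cmd.arg("file").arg(path);
        }
    }
    cmd
}

fn main() {
    let model = Path::new("path/to/model-dir");
    let modes = [
        Mode::Generate("Once upon a time"),
        Mode::Chat,
        Mode::File(Path::new("prompt.txt")),
    ];
    for mode in modes {
        // Debug-printing shows the program and arguments;
        // call `.status()` on the command to actually run it.
        println!("{:?}", llama_command(model, mode));
    }
}
```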
To see the available commands and custom arguments:
./target/release/llama-dfdx --help