llama2.rs


Rust meets llama.

A minimal Rust implementation of karpathy's llama2.c.

Currently the code uses the 15M-parameter model provided by karpathy (included in the resources folder), but you should be able to swap in any Llama model. You can read the section here to download larger models.

Performance:

Right now I'm getting similar performance on my M1 MacBook for llama2.c and llama2.rs (~120 tok/s). I think we can still unlock significant gains by parallelizing parts of the code; I've left comments in main.rs on where those gains are. I'm no expert on Rust, so PRs are always welcome.
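For instance, the matrix-vector products that dominate inference are a natural place to parallelize. Here's a minimal sketch using only `std::thread::scope` (the function name and shapes are illustrative, not the actual main.rs code):

```rust
use std::thread;

/// Illustrative sketch: split the rows of a row-major d x n weight matrix
/// `w` across threads for a matrix-vector product with `x` (length n).
/// This mirrors the kind of gains hinted at in the main.rs comments.
fn matmul_parallel(out: &mut [f32], x: &[f32], w: &[f32], n: usize) {
    let n_threads = 4;
    // Ceiling division so every output row lands in some chunk.
    let chunk = (out.len() + n_threads - 1) / n_threads;
    thread::scope(|s| {
        for (i, rows) in out.chunks_mut(chunk).enumerate() {
            let start = i * chunk;
            s.spawn(move || {
                for (j, o) in rows.iter_mut().enumerate() {
                    // Dot product of one weight row with the input vector.
                    let row = &w[(start + j) * n..(start + j + 1) * n];
                    *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
                }
            });
        }
    });
}

fn main() {
    // 2 x 2 identity times [3, 4] should give back [3, 4].
    let w = [1.0, 0.0, 0.0, 1.0];
    let x = [3.0, 4.0];
    let mut out = [0.0f32; 2];
    matmul_parallel(&mut out, &x, &w, 2);
    println!("{:?}", out);
}
```

Scoped threads (stable since Rust 1.63) let the workers borrow `x` and `w` directly, so no `Arc` or cloning is needed; a crate like rayon would make this even shorter.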

Quick start

```shell
# Development
> cargo run

# Prod
> cargo build --release && ./target/release/llama2rs
```

TODO:

  • Support for quantized versions (16-bit / 4-bit).
  • More parallelization.
  • Other improvements, like taking the temperature, prompt, and model path as command-line args.
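That last item needs no external crates; here's a hedged sketch of a stdlib-only parser (the argument order, defaults, and file name are assumptions, not the repo's actual interface):

```rust
/// Illustrative sketch: pull temperature, prompt, and model path out of an
/// argument iterator. Order and defaults are assumptions for this example.
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> (f32, String, String) {
    let temperature = args
        .next()
        .and_then(|a| a.parse().ok())
        .unwrap_or(0.9); // sampling temperature
    let prompt = args.next().unwrap_or_default(); // starting completion string
    let model = args
        .next()
        // Assumed default path; the repo ships its model in resources/.
        .unwrap_or_else(|| "resources/stories15M.bin".to_string());
    (temperature, prompt, model)
}

fn main() {
    let (temperature, prompt, model) = parse_args(std::env::args().skip(1));
    println!("temperature={temperature} prompt={prompt:?} model={model}");
}
```

Taking an iterator rather than calling `std::env::args()` inside the function keeps it trivially testable.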
