A minimal Rust implementation of Karpathy's llama2.c.
Currently the code uses the 15M parameter model provided by Karpathy (included in the resources folder), but you should be able to replace it with any Llama model. See the section below on downloading larger models.
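For reference, a checkpoint in llama2.c's binary format starts with seven little-endian i32 config fields, followed by the f32 weights. The minimal sketch below shows how such a header could be inspected before swapping a model in; the field names mirror llama2.c, but the function itself is illustrative and is not the code in main.rs.

```rust
use std::fs::File;
use std::io::Read;

// Config header as laid out at the start of a llama2.c checkpoint file.
#[derive(Debug)]
struct Config {
    dim: i32,
    hidden_dim: i32,
    n_layers: i32,
    n_heads: i32,
    n_kv_heads: i32,
    vocab_size: i32,
    seq_len: i32,
}

fn read_config(path: &str) -> std::io::Result<Config> {
    let mut f = File::open(path)?;
    // Read the 7 leading i32 fields (28 bytes) before the weight tensors.
    let mut buf = [0u8; 7 * 4];
    f.read_exact(&mut buf)?;
    let field = |k: usize| i32::from_le_bytes(buf[k * 4..k * 4 + 4].try_into().unwrap());
    Ok(Config {
        dim: field(0),
        hidden_dim: field(1),
        n_layers: field(2),
        n_heads: field(3),
        n_kv_heads: field(4),
        vocab_size: field(5),
        seq_len: field(6),
    })
}
```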
Right now I'm getting similar performance on my M1 MacBook for llama2.c and this Rust port (~120 tok/s), though I think we can unlock significant performance benefits by parallelizing some parts of the code. I've left comments in main.rs on where we can make these gains. I'm no Rust expert, so PRs are always welcome.
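The matmul loop is the most obvious place to parallelize. As a rough sketch (not the actual code in main.rs), assuming the same row-major (d, n) weight layout that llama2.c uses and the rayon crate added to Cargo.toml, each output row can be computed independently across threads:

```rust
use rayon::prelude::*;

// Sketch only: parallel matmul where `w` is a row-major (d, n) matrix,
// `x` has length n, and `out` has length d.
fn matmul(out: &mut [f32], x: &[f32], w: &[f32], n: usize, d: usize) {
    assert!(x.len() == n && out.len() == d && w.len() == n * d);
    out.par_iter_mut().enumerate().for_each(|(i, o)| {
        // Each output element is the dot product of row i of w with x,
        // so rows can be computed independently on the rayon thread pool.
        let row = &w[i * n..(i + 1) * n];
        *o = row.iter().zip(x.iter()).map(|(a, b)| a * b).sum();
    });
}
```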
# Development
> cargo run
# Prod
> cargo build --release && ./target/release/llama2rs
# TODO
- Support for quantized versions (16-bit / 4-bit).
- More parallelization.
- Other improvements, like taking the temperature, prompt (starting completion string), and model path as command-line arguments (see the sketch after this list).
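A minimal sketch of the kind of CLI handling the last item describes, using only the standard library. The argument names, order, and default model path are illustrative assumptions, not an existing interface in main.rs.

```rust
use std::env;

fn main() {
    // Illustrative usage: llama2rs <model_path> [temperature] [prompt]
    let args: Vec<String> = env::args().collect();
    // Default path is hypothetical; the bundled model lives in the resources folder.
    let model_path = args.get(1).map(|s| s.as_str()).unwrap_or("resources/stories15M.bin");
    let temperature: f32 = args
        .get(2)
        .and_then(|s| s.parse().ok())
        .unwrap_or(0.9);
    let prompt = args.get(3).map(|s| s.as_str()).unwrap_or("");

    println!("model: {model_path}, temperature: {temperature}, prompt: {prompt:?}");
    // ...load the model and run generation with these settings...
}
```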