llama-rs beat me to the punch. I'll be contributing to that instead.
The original README is preserved below.
ggllama is a Rust port of ggerganov's llama.cpp, so that it can be deployed with greater ease.
The current version uses ggml directly, so you will require a C compiler. PRs welcome to switch to a more Rust-y solution!
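For a rough idea of why the C compiler is needed, here is a minimal build-script sketch that compiles the vendored ggml source with the cc crate. This is an illustration only; the paths and the use of cc are assumptions, and the actual ggml-sys build script may differ.

// build.rs (sketch): compile the vendored ggml C source into a static library
// that the Rust crate links against. The vendor/llama.cpp path and the cc
// build-dependency are assumptions for this example.
fn main() {
    println!("cargo:rerun-if-changed=vendor/llama.cpp/ggml.c");
    cc::Build::new()
        .file("vendor/llama.cpp/ggml.c")
        .include("vendor/llama.cpp")
        .compile("ggml");
}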
It isn't usable at the time of writing: it runs, but the inference is garbage:
23:59:53 [INFO] ℚ
23:59:54 [INFO] Насе
23:59:54 [INFO] rsg
23:59:54 [INFO] eredetiből
23:59:54 [INFO] Хронологија
23:59:55 [INFO] flug
23:59:55 [INFO] odkazy
23:59:55 [INFO] orith
23:59:55 [INFO] gior
23:59:56 [INFO]
23:59:56 [INFO] logs
23:59:56 [INFO] BeanFactory
23:59:56 [INFO] gesamt
23:59:56 [INFO] bezeichneter
23:59:57 [INFO] Webachiv
23:59:57 [INFO] brie
23:59:57 [INFO] listade
23:59:57 [INFO] ⊤
23:59:58 [INFO] xtart
23:59:58 [INFO] kallaste
23:59:58 [INFO] makeText
23:59:58 [INFO] eredetiből
23:59:59 [INFO] daten
23:59:59 [INFO] Мос
23:59:59 [INFO] lacht
The evaluation returns the wrong logits when given input to process. I'll need to debug this further.
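One way to chase this down is to dump the logits after each evaluation and diff them against llama.cpp's output for the same prompt and position. The helper below is only a sketch of that idea; how the logits are actually obtained from the port is not shown, and the function is hypothetical.

// Debugging aid (sketch): write one logit per line so the output can be
// diffed against an equivalent dump from llama.cpp for the same prompt.
use std::io::Write;

fn dump_logits(path: &str, logits: &[f32]) -> std::io::Result<()> {
    let mut file = std::fs::File::create(path)?;
    for (token_id, logit) in logits.iter().enumerate() {
        writeln!(file, "{token_id}\t{logit}")?;
    }
    Ok(())
}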
ggml-sys is built with the target features passed into the Rust compiler, so you'll need to set your RUSTFLAGS appropriately:
RUSTFLAGS='-C target-feature=+avx2,+fma,+f16c'
Note that f16c was stabilised in Rust 1.68.0.
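To illustrate why those flags matter for the C side as well, a build script can read the enabled target features that Cargo exposes via CARGO_CFG_TARGET_FEATURE and forward them to the C compiler. The mapping below extends the earlier build-script sketch and is an assumption, not necessarily what ggml-sys actually does.

// build.rs (sketch, assumed): translate the Rust target features enabled via
// RUSTFLAGS into the matching C compiler flags when building ggml.
fn main() {
    let features = std::env::var("CARGO_CFG_TARGET_FEATURE").unwrap_or_default();
    let mut build = cc::Build::new();
    build.file("vendor/llama.cpp/ggml.c").include("vendor/llama.cpp");
    for feature in features.split(',') {
        match feature {
            "avx2" => { build.flag_if_supported("-mavx2"); }
            "fma"  => { build.flag_if_supported("-mfma"); }
            "f16c" => { build.flag_if_supported("-mf16c"); }
            _ => {}
        }
    }
    build.compile("ggml");
}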
Model preparation is identical to the original repo. This initial version doesn't port quantize yet; see the sketch below for the general idea behind it.
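For context, quantize in the upstream repo compresses the converted weights into small blocks of 4-bit values with a per-block scale. The sketch below illustrates that general idea only; it is not the exact ggml on-disk format, and the names and constants here are made up for the example.

// Sketch of block-wise 4-bit quantization in the spirit of upstream quantize:
// each block of 32 weights stores one f32 scale plus 32 packed 4-bit values.
// Illustrative only; this does not match ggml's actual quantization layout.
const BLOCK_SIZE: usize = 32;

struct BlockQ4 {
    scale: f32,       // per-block scale factor
    quants: [u8; 16], // 32 values packed as 4-bit pairs
}

fn quantize_block(block: &[f32; BLOCK_SIZE]) -> BlockQ4 {
    // The largest magnitude in the block determines the per-block scale.
    let amax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 7.0 };
    // Map a weight to a signed 4-bit value in [-7, 7], stored with an offset of 8.
    let q = |x: f32| ((x / scale).round().clamp(-7.0, 7.0) as i8 + 8) as u8;
    let mut quants = [0u8; 16];
    for i in 0..16 {
        quants[i] = q(block[2 * i]) | (q(block[2 * i + 1]) << 4);
    }
    BlockQ4 { scale, quants }
}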
I used Conda to create my Python environment:
conda create --name llama python=3.10
conda activate llama
python3 -m pip install torch numpy sentencepiece
cd vendor/llama.cpp
python3 convert-pth-to-ggml.py models/7B/