llm-cpp

Performance

On M1, prefilling 128 tokens and decoding 128 tokens:

model	prefill (tokens/s)	decode (tokens/s)	RAM
Qwen1.5-0.5B-Chat-GPTQ-Int4	281	103	360M
Qwen1.5-1.8B-Chat-GPTQ-Int4	79	39	1100M

Implementation & feature details

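Loads Hugging Face format models directly
- Reads config.json to dynamically build the blob and layer dependencies and construct the ncnn model graph
  - Binds blobs and layers in an nn.Module style
- Loads safetensors weights directly and assigns them to the model's layers
  - Supports loading multiple safetensors files for very large models
- Preprocesses weights ahead of time, so disk and memory usage stay small
Supports weight sharing between embed and lm_head (mainly for the 0.5B model)
Supports GPTQ-Int4 quantization (as a shortcut, the code is hard-coded), with mixed int8 & fp16 activations
Supports a KV cache stored in fp16 (multi-turn conversation is not implemented, so the KV cache stays small and is not compressed)
Supports Qwen1.5-xxB-Chat-GPTQ-Int4 models
Two output modes
- Deterministic output using argmax
- Non-deterministic output using probability sampling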
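
The two output modes above can be sketched as follows. This is a minimal illustration, not the code in chat.cpp: the function names `argmax_token`, `softmax`, and `sample_token`, and the `temperature` parameter, are assumptions introduced here, and the logits are assumed to be a non-empty vector of floats with one entry per vocabulary token.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Deterministic mode: return the index of the largest logit.
// Precondition: logits is non-empty. On ties, the first maximum wins.
inline int argmax_token(const std::vector<float>& logits) {
    return static_cast<int>(
        std::max_element(logits.begin(), logits.end()) - logits.begin());
}

// Numerically stable softmax. `temperature` is a hypothetical knob:
// values below 1 sharpen the distribution, values above 1 flatten it.
inline std::vector<float> softmax(const std::vector<float>& logits,
                                  float temperature = 1.0f) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        // Subtracting the maximum keeps exp() from overflowing.
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Non-deterministic mode: draw a token index with probability
// proportional to its softmax weight.
inline int sample_token(const std::vector<float>& logits, std::mt19937& rng,
                        float temperature = 1.0f) {
    const std::vector<float> probs = softmax(logits, temperature);
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}
```

With a fixed seed for `std::mt19937`, the sampling path becomes reproducible, which is convenient for debugging. Calling `argmax_token` gives the same token on every run for the same logits.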

How to use

# download model
git lfs install && git clone https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
# convert model to save disk and ram
python make_lite_weight.py --input_model Qwen1.5-0.5B-Chat-GPTQ-Int4
# build & run
bash run.sh

References

model from: https://huggingface.co/Qwen
runtime manager from: https://github.com/Tencent/ncnn
JSON loading from: https://github.com/nlohmann/json/tree/develop
safetensors loading from: https://github.com/syoyo/safetensors-cpp
tokenizer implementation from: https://github.com/harrisonvanderbyl/rwkv-cpp-accelerated
ProgressBar from: https://github.com/gipert/progressbar/tree/master
armpl from: https://developer.arm.com/documentation/101004/2404?lang=en
