
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.
English Docs | Chinese Docs | Blogs
- [2025/05] LightLLM paper on constrained decoding accepted by ACL’25 (Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation)
- [2025/04] LightLLM paper on request scheduler published in ASPLOS’25 (Past-Future Scheduler for LLM Serving under SLA Guarantees)
- [2025/02] 🔥 LightLLM v1.0.0 released, achieving the fastest DeepSeek-R1 serving performance on a single H200 machine.
Learn more in the release blog: v1.0.0 blog.
Please refer to the FAQ for more information.
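For a quick feel of the serving API, the sketch below queries a locally running LightLLM server over HTTP with plain `requests`. The launch command, port, endpoint path, and sampling parameters are assumptions based on a typical LightLLM setup and may differ across versions; check the docs and FAQ above for the flags that match your model.

```python
import requests

# Assumption: a LightLLM server was started beforehand, e.g. with
#   python -m lightllm.server.api_server --model_dir <path-to-weights> --port 8080
# (exact flags depend on your model and LightLLM version)
url = "http://localhost:8080/generate"
data = {
    "inputs": "What is AI?",
    "parameters": {
        "do_sample": False,      # greedy decoding
        "max_new_tokens": 128,   # cap on generated tokens
    },
}

# POST the prompt and print the generated text returned as JSON
response = requests.post(url, json=data)
print(response.json())
```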
We welcome any cooperation and contribution. If your project requires LightLLM's support, please contact us via email or create a pull request.
- LazyLLM: The easiest and laziest way to build multi-agent LLM applications.
Once you have installed `lightllm` and `lazyllm`, you can use the following code to build your own chatbot:

```python
from lazyllm import TrainableModule, deploy, WebModule

# The model will be downloaded automatically if you have an internet connection
m = TrainableModule('internlm2-chat-7b').deploy_method(deploy.lightllm)
WebModule(m).start().wait()
```
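Here, `TrainableModule` wraps the model and `deploy_method(deploy.lightllm)` selects LightLLM as the inference backend, while `WebModule` exposes the chatbot through a web UI; `start()` launches the service and `wait()` keeps it running.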
Documents: https://lazyllm.readthedocs.io/
Projects based on LightLLM or referencing LightLLM components:
- LoongServe, Peking University
- OmniKV, Ant Group
- vLLM (uses some of LightLLM's kernels)
- SGLang (uses some of LightLLM's kernels)
- ParrotServe, Microsoft
- Aphrodite (uses some of LightLLM's kernels)
- S-LoRA
Also, LightLLM's pure-Python design and token-level KV Cache management make it easy to use as the basis for research projects.
Academic works based on or using parts of LightLLM:
- ParrotServe (OSDI’24)
- SLoRA (MLSys’24)
- LoongServe (SOSP’24)
- ByteDance’s CXL (Eurosys’24)
- VTC (OSDI’24)
- OmniKV (ICLR’25)
- CaraServe, LoRATEE, FastSwitch ...
For further information and discussion, join our Discord server. We welcome you as a member and look forward to your contributions!
This repository is released under the Apache-2.0 license.
We learned a lot from the following projects when developing LightLLM.
- FasterTransformer
- Text Generation Inference
- vLLM
- SGLang
- flashinfer
- Flash Attention 1&2
- OpenAI Triton
We have published a number of papers on components and features of LightLLM. If you use LightLLM in your work, please consider citing the relevant paper.
Request scheduler, accepted at ASPLOS’25:
```bibtex
@inproceedings{gong2025past,
  title={Past-Future Scheduler for LLM Serving under SLA Guarantees},
  author={Gong, Ruihao and Bai, Shihao and Wu, Siyu and Fan, Yunqian and Wang, Zaijun and Li, Xiuhong and Yang, Hailong and Liu, Xianglong},
  booktitle={Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
  pages={798--813},
  year={2025}
}
```