LoongServe

This is an implementation of the paper: "LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism".

To reproduce the main results in our paper, follow the instructions in the docs/artifact-eval folder.

docs/artifact-eval
longserve_c_scheduler
longserve_cuda_kernels
loongserve
rnccl
test
.gitignore
LICENSE
README.md
requirements.txt
run_multi_nodes.sh
setup.py
