This repository implements the paper "LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism".
To reproduce all the main results in our paper, see the artifact folder and follow the instructions there.