
The official repository for LongChat and LongEval, which supports training and evaluating long-context LLM-based chatbots. Check out our post for scientific findings!


  • [2023/08] 🔥 We released LongChat v1.5, based on Llama 2 with a 32K context length. Download weights.

Environment setup

conda create -n longeval python=3.10
conda activate longeval
pip install longchat

If you want to build from source, use:

git clone
cd LongChat/
pip install -e .

If you want to test very long sequence lengths, please also install FlashAttention.


To train a LongChat model yourself, replace <path-to-llama> with the path to your Llama checkpoint directory, and run:

python -m --nproc_per_node=8 \
         longchat/train/fine_tune/ \
        --model_name_or_path <path-to-llama> \
        --data_path data/dummy_conversation.json  \
        --bf16 \
        --output_dir outputs \
        --num_train_epochs 3    \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 4  \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy no \
        --save_strategy steps \
        --save_steps 1000  \
        --save_total_limit 1 \
        --learning_rate 2e-5 \
        --weight_decay 0.  \
        --warmup_ratio 0.03  \
        --lr_scheduler_type "cosine" \
        --logging_steps 1  \
        --fsdp "full_shard auto_wrap" \
        --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
        --tf32 True  \
        --model_max_length 16384  \
        --gradient_checkpointing True  \
        --lazy_preprocess True

This script assumes 8xA100 GPUs and uses the dummy data in the repository for example usage only. Please adapt it to your use case. We provide models trained on conversation data on HuggingFace: LongChat-13b-16k and LongChat-7b-16k.
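As a rough check on how much data the command above processes per optimizer step, the following sketch multiplies the values taken from its flags. It assumes one training process per GPU (8 in total) and that every sequence is exactly `--model_max_length` tokens long, so the token count is an upper bound rather than a measured figure. The helper name `effective_batch_size` is illustrative and not part of the repository.

```python
# Values copied from the training command above.
num_gpus = 8                      # --nproc_per_node
per_device_train_batch_size = 1   # --per_device_train_batch_size
gradient_accumulation_steps = 1   # --gradient_accumulation_steps
model_max_length = 16384          # --model_max_length


def effective_batch_size(gpus: int, per_device: int, accumulation: int) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return gpus * per_device * accumulation


sequences_per_step = effective_batch_size(
    num_gpus, per_device_train_batch_size, gradient_accumulation_steps
)
# Upper bound: assumes every sequence is exactly model_max_length tokens.
max_tokens_per_step = sequences_per_step * model_max_length

print(sequences_per_step)   # 8
print(max_tokens_per_step)  # 131072
```

If you train on fewer GPUs, raising `--gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged.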


We provide a simple notebook that demonstrates the following steps. We also provide reproduced results under the longeval/evaluation folder.

To evaluate the LongChat model on the coarse-grained topics benchmark:

cd longeval
python3 --model-name-or-path  lmsys/longchat-13b-16k --task topics --longchat_flash_attn

To evaluate new models, choose a <task> from ["topics", "lines"], and replace <your-model> with your model path:

python3 --model-name-or-path <your-model> --task <task>

Some models require memory-efficient flash attention to evaluate very long test cases. Please open an issue if you run into memory problems with your model. We include the commands we used in the release blog here.

The output will be stored under evaluation/<task>/predictions/<your-model>. The line recall task directly outputs an accuracy. The topics recall task outputs natural language that is hard to parse. You can either inspect the model output manually and calculate an accuracy, or use gpt-3.5-turbo (ChatGPT) to calculate it automatically. In the latter case, set OPENAI_API_KEY and run:

python  --test_file <generated_output>

Replace <generated_output> with the generated topic prediction, e.g. evaluation/topics/predictions/longchat_13b_16k/5_response.txt.
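For the manual route, a first-pass score can be sketched as a case-insensitive substring match between the expected topic and the model's free-form answer. This is only a heuristic under stated assumptions, not the repository's scoring method: the function `topic_accuracy` and the example strings are hypothetical, and the sketch assumes you have already loaded the expected topics and responses into two parallel lists. Paraphrased answers will be counted as misses, so borderline cases still need a manual look.

```python
def topic_accuracy(expected_topics: list[str], responses: list[str]) -> float:
    """Fraction of responses that mention their expected topic (case-insensitive)."""
    if len(expected_topics) != len(responses):
        raise ValueError("expected_topics and responses must have the same length")
    if not expected_topics:
        return 0.0
    hits = sum(
        topic.strip().lower() in response.lower()
        for topic, response in zip(expected_topics, responses)
    )
    return hits / len(expected_topics)


# Made-up examples for illustration; not taken from the benchmark.
expected = ["the history of jazz", "solar panels"]
outputs = [
    "The first topic we discussed was the history of jazz.",
    "I believe we started by talking about wind turbines.",
]
print(topic_accuracy(expected, outputs))  # 0.5
```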

To generate new testcases:

python3 <path-to-generate-testcases-configuration>

Replace <path-to-generate-testcases-configuration> with the path to a YAML file containing the configuration for generating testcases. longeval/generate_testcases_configs.yaml is a configuration file that provides default options. To customize the generated testcases, adjust the options in the configuration file.

Warning: Please set the output_dir option in the configuration file to a location that does not overlap with longeval/evaluation/. Otherwise the original testcases could be overwritten.
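Before generating testcases, you may want to confirm that your chosen output_dir does not collide with the shipped ones. The sketch below is a hypothetical helper (`overlaps` is not part of the repository) that treats two directories as overlapping when they are identical or when one is nested inside the other. It assumes both paths are given relative to the same working directory and does not read the YAML file itself.

```python
from pathlib import Path


def overlaps(output_dir: str, protected_dir: str) -> bool:
    """True if either directory is the same as, or nested inside, the other."""
    out = Path(output_dir).resolve()
    protected = Path(protected_dir).resolve()
    return out == protected or protected in out.parents or out in protected.parents


print(overlaps("longeval/evaluation/topics", "longeval/evaluation"))  # True
print(overlaps("my_testcases", "longeval/evaluation"))                # False
```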


If you find this repo useful, please cite:

    title = {How Long Can Open-Source LLMs Truly Promise on Context Length?},
    url = {},
    author = {Dacheng Li*, Rulin Shao*, Anze Xie, Ying Sheng, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica, Xuezhe Ma, and Hao Zhang},
    month = {June},
    year = {2023}


