
Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs

ulab-uiuc/GoR



🌐 Project Page | 📜 arXiv


News

[2024.10.16] 🌟 GoR is released.

📌Preliminary

Environment Setup

# python==3.10
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install dgl==1.0.0+cu113 -f https://data.dgl.ai/wheels/cu113/repo.html
pip install openai==0.28
pip install pandas
pip install langchain
pip install langchain-core
pip install langchain-community
pip install langchain-experimental
pip install tiktoken
pip install tqdm
pip install bert_score
pip install rouge_score
pip install networkx
pip install faiss-gpu
pip install transformers

Dataset Preparation

Download the datasets: QMSum | WCEP | BookSum | GovReport | SQuALITY

Save the downloaded files in the ./data/[DATASET_NAME] folder.
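
A quick check of the folder layout before launching any script can save a failed run. The sketch below assumes each dataset folder is named after the lowercase --dataset choices used in the commands later in this README (qmsum, wcep, booksum, govreport, squality); that naming, and the helper missing_dataset_dirs, are illustrative assumptions and not part of the repository.

```python
from pathlib import Path

# Assumption: folder names mirror the --dataset choices used by the scripts.
DATASETS = ["qmsum", "wcep", "booksum", "govreport", "squality"]


def missing_dataset_dirs(data_root="./data", datasets=DATASETS):
    """Return the dataset names whose folder is absent or empty under data_root."""
    root = Path(data_root)
    missing = []
    for name in datasets:
        folder = root / name
        # A folder counts as present only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


absent = missing_dataset_dirs()
if absent:
    print("Missing or empty dataset folders:", ", ".join(absent))
else:
    print("All dataset folders found.")
```

Only the datasets you plan to run need to be present; pass a shorter list as the second argument to check a subset.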

Important

Before running any experiment, set your API key inside the get_llm_response_via_api function in utils.py.
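
If you would rather not hard-code the key in the source file, one option is to read it from an environment variable and use the returned value where the key is set. This is only a sketch: the variable name OPENAI_API_KEY and the helper load_api_key are assumptions, and how get_llm_response_via_api actually consumes the key is not shown in this README.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from the environment, failing early if it is unset.

    Hypothetical helper: the variable name is a common convention,
    not something this repository prescribes.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the experiments."
        )
    return key
```

Failing early here avoids discovering a missing key partway through graph construction, which issues many LLM calls.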

⭐Experiments

Query Simulation and Graph Construction

Generate simulated queries and construct graphs. The constructed graphs are saved in the ./graph folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Training Set
python graph_construction.py --cuda 0 --dataset [DATASET] --train
# Test Set
python graph_construction.py --cuda 0 --dataset [DATASET]

Training Preparation

Pre-compute BERTScore and save training data in the ./training_data folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python training_preparation.py --cuda 0 --dataset [DATASET]

Training

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python train.py --cuda 0 --dataset [DATASET]

Evaluation

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Generate summary results
python eval.py --cuda 0 --dataset [DATASET]
# Evaluation
python sum_eval.py --cuda 0 --file_name ./result/[DATASET].json
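
The stages above run in a fixed order: graph construction (training and test sets), training preparation, training, summary generation, and evaluation. A small driver along these lines can chain them for one dataset. It is a sketch that uses only the scripts and flags listed in this README; the function names pipeline_commands and run_pipeline are hypothetical, and it assumes you invoke it from the repository root.

```python
import subprocess


def pipeline_commands(dataset, cuda=0):
    """Return the README commands, in order, for a single dataset."""
    gpu = str(cuda)
    return [
        # Graph construction: training set, then test set.
        ["python", "graph_construction.py", "--cuda", gpu, "--dataset", dataset, "--train"],
        ["python", "graph_construction.py", "--cuda", gpu, "--dataset", dataset],
        # Pre-compute BERTScore and write the training data.
        ["python", "training_preparation.py", "--cuda", gpu, "--dataset", dataset],
        ["python", "train.py", "--cuda", gpu, "--dataset", dataset],
        # Generate summaries, then score them.
        ["python", "eval.py", "--cuda", gpu, "--dataset", dataset],
        ["python", "sum_eval.py", "--cuda", gpu, "--file_name", f"./result/{dataset}.json"],
    ]


def run_pipeline(dataset, cuda=0):
    """Run every stage in sequence, stopping at the first failure."""
    for cmd in pipeline_commands(dataset, cuda):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # check=True raises on a non-zero exit code


# Example (not executed here): run_pipeline("qmsum", cuda=0)
```

Because check=True aborts on the first failing stage, a later stage never runs against missing graphs or training data.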

Citation

@article{GoR,
  title={Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs},
  author={Haozhen Zhang and Tao Feng and Jiaxuan You},
  journal={arXiv preprint arXiv:2410.11001},
  year={2024}
}
