Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs

News

[2024.10.16] 🌟 GoR is released.

📌Preliminary

Environment Setup

# python==3.10
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install dgl==1.0.0+cu113 -f https://data.dgl.ai/wheels/cu113/repo.html
pip install openai==0.28
pip install pandas
pip install langchain
pip install langchain-core
pip install langchain-community
pip install langchain-experimental
pip install tiktoken
pip install tqdm
pip install bert_score
pip install rouge_score
pip install networkx
pip install faiss-gpu
pip install transformers
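
After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of the repository: the helper name `check_pins` and the `PINS` table are illustrative, and the pins are copied from the commands above. It compares by prefix, because the installed version string may carry a suffix (for example `0.28.0` for `openai==0.28`).

```python
from importlib import metadata

# Pins copied from the install commands above; unpinned packages are omitted.
PINS = {
    "torch": "1.12.1+cu113",
    "torchvision": "0.13.1+cu113",
    "torchaudio": "0.12.1",
    "dgl": "1.0.0+cu113",
    "openai": "0.28",
}


def check_pins(pins=PINS):
    """Return {package: (expected, installed_or_None, matches)} for each pin."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Prefix match: tolerant of suffixes such as ".0" or "+cu113".
        matches = installed is not None and installed.startswith(expected)
        report[name] = (expected, installed, matches)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```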

Dataset Preparation

Download the following datasets: QMSum, WCEP, Booksum, GovReport, SQuALITY.

Save the downloaded files in the ./data/[DATASET_NAME] folder.
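
Before starting a run, a quick check that every dataset folder is in place can save a failed job. This is a sketch under one assumption: that the folder names match the lowercase identifiers accepted by the `--dataset` flag in this README. The helper `missing_dataset_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Lowercase names taken from the "DATASET Choices" comments in this README;
# that the data folders use the same spelling is an assumption.
DATASETS = ["qmsum", "wcep", "booksum", "govreport", "squality"]


def missing_dataset_dirs(data_root="./data", datasets=DATASETS):
    """Return the dataset names whose folder is absent or empty under data_root."""
    root = Path(data_root)
    missing = []
    for name in datasets:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_dataset_dirs()
    print("All dataset folders found." if not absent else f"Missing or empty: {absent}")
```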

Important

Before running the experiments, configure your API key in "get_llm_response_via_api" in utils.py.
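
One way to avoid hard-coding the key is to read it from an environment variable. The sketch below is an assumption about how this could be wired in, not the repository's code: the helper `load_api_key` and the variable name `OPENAI_API_KEY` are illustrative, and the commented usage assumes that "get_llm_response_via_api" relies on the `openai==0.28` client installed above.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Hypothetical use inside utils.py, assuming the openai==0.28 client is what
# get_llm_response_via_api calls:
#     import openai
#     openai.api_key = load_api_key()
```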

⭐Experiments

Query Simulation and Graph Construction

Generate simulated queries and construct graphs. The constructed graphs are saved in the ./graph folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Training Set
python graph_construction.py --cuda 0 --dataset [DATASET] --train
# Test Set
python graph_construction.py --cuda 0 --dataset [DATASET]

Training Preparation

Pre-compute BERTScore and save training data in the ./training_data folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python training_preparation.py --cuda 0 --dataset [DATASET]
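
For readers unfamiliar with the metric being pre-computed, the toy sketch below illustrates the greedy matching at the core of BERTScore, using hand-written vectors in place of real contextual token embeddings. It is not the code in training_preparation.py and omits the optional idf weighting; the function names are illustrative. In practice the `bert_score` package installed above computes this from a pretrained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from greedy token matching.

    Precision: each candidate token is matched to its most similar reference token.
    Recall: each reference token is matched to its most similar candidate token.
    """
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy "embeddings": two candidate tokens, one reference token.
    candidate = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[1.0, 0.0]]
    print(greedy_match_scores(candidate, reference))
```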

Training

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python train.py --cuda 0 --dataset [DATASET]

Evaluation

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Generate summary results
python eval.py --cuda 0 --dataset [DATASET]
# Evaluation
python sum_eval.py --cuda 0 --file_name ./result/[DATASET].json
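
The steps above can be chained for a single dataset with a small driver. This is a sketch that assumes the commands are meant to run in the order they appear in this README; the helpers `pipeline_commands` and `run_pipeline` are hypothetical, and only the script names and flags shown above are used. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess
import sys


def pipeline_commands(dataset, cuda=0):
    """Build the README's commands for one dataset, in the order they appear."""
    py = sys.executable
    common = ["--cuda", str(cuda), "--dataset", dataset]
    return [
        [py, "graph_construction.py", *common, "--train"],  # training-set graphs
        [py, "graph_construction.py", *common],             # test-set graphs
        [py, "training_preparation.py", *common],
        [py, "train.py", *common],
        [py, "eval.py", *common],
        [py, "sum_eval.py", "--cuda", str(cuda),
         "--file_name", f"./result/{dataset}.json"],
    ]


def run_pipeline(dataset, cuda=0, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in pipeline_commands(dataset, cuda):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_pipeline("qmsum", dry_run=True)
```

Using `check=True` means a failure in an early step (for example graph construction) stops the run instead of continuing with missing inputs.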

Citation

@article{GoR,
  title={Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs},
  author={Haozhen Zhang and Tao Feng and Jiaxuan You},
  journal={arXiv preprint arXiv:2410.11001},
  year={2024}
}

