This repository contains the implementation for our paper:
A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization
Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi
ACL 2022
https://aclanthology.org/2022.acl-long.351/
We have adjusted Lines 240-243 in src/summarization.py, and have also observed the following dependency issues, for which we have provided workarounds:
ERROR: Could not find a version that satisfies the requirement newsroom (unavailable) (from versions: 1.0)
ERROR: No matching distribution found for newsroom (unavailable)
We have added scripts/fragments.py to remove the need to install this dependency. We have also updated the requirements.txt file with newer package versions.
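For context, Newsroom (Grusky et al., 2018) measures how extractive a summary is using greedy "extractive fragments": maximal shared token spans between the source and the summary, from which fragment coverage and density are computed. Below is a minimal plain-Python sketch of that greedy matching. It assumes pre-tokenised input (lists of strings), and the function names are hypothetical, not necessarily those used in scripts/fragments.py.

```python
from typing import List

Tokens = List[str]


def extractive_fragments(article: Tokens, summary: Tokens) -> List[Tokens]:
    """Greedily collect the longest shared token spans, Newsroom-style.

    For each summary position, scan the article for the longest run of
    identical tokens starting there, keep it as a fragment, and skip past it.
    """
    fragments: List[Tokens] = []
    i = 0
    while i < len(summary):
        best: Tokens = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match for as long as both sequences agree.
                i_end, j_end = i, j
                while (i_end < len(summary) and j_end < len(article)
                       and summary[i_end] == article[j_end]):
                    i_end += 1
                    j_end += 1
                if i_end - i > len(best):
                    best = summary[i:i_end]
                j = j_end
            else:
                j += 1
        # Skip over the matched span, or over a single unmatched token.
        i += max(len(best), 1)
        if best:
            fragments.append(best)
    return fragments


def fragment_coverage(article: Tokens, summary: Tokens) -> float:
    """Fraction of summary tokens that fall inside some extractive fragment."""
    if not summary:
        return 0.0
    frags = extractive_fragments(article, summary)
    return sum(len(f) for f in frags) / len(summary)


def fragment_density(article: Tokens, summary: Tokens) -> float:
    """Sum of squared fragment lengths, normalised by summary length."""
    if not summary:
        return 0.0
    frags = extractive_fragments(article, summary)
    return sum(len(f) ** 2 for f in frags) / len(summary)
```

For example, with article "the cat sat on the mat" and summary "the cat lay on the mat", the fragments are "the cat" and "on the mat", so coverage is 5/6. How the repository aggregates such scores across the multiple input documents is not shown here.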
At the time of writing, you may also run into the following error when downloading Multi-News: Checksum error when loading multi-news dataset
Create a virtual environment using Python 3.6.8. We have found that other versions of Python may not work with the prescribed requirements.txt file.
scripts/alternate_objectives.py: Contains metrics used during training, and the architecture for RELAX and Gumbel sampling.
scripts/xyz_dataset.py: HuggingFace Datasets loader scripts (for fine-tuning).
scripts/summarization.py: Main file with the model forward pass.
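As background for the Gumbel sampling mentioned above, the sketch below shows, in plain Python over a single categorical distribution, the Gumbel-max trick (z = log θ + Gumbel noise, b = argmax z) and the conditional resampling of z̃ given b that RELAX (Grathwohl et al., 2018) uses for categorical variables. It is an illustration under these assumptions only: the function names are hypothetical, and alternate_objectives.py presumably operates on model tensors rather than Python lists.

```python
import math
import random
from typing import List, Sequence, Tuple


def _open_uniform(rng: random.Random) -> float:
    """Draw u from the open interval (0, 1) so that log(u) is finite and non-zero."""
    u = rng.random()  # random() returns values in [0.0, 1.0)
    while u == 0.0:
        u = rng.random()
    return u


def gumbel_max_sample(probs: Sequence[float],
                      rng: random.Random) -> Tuple[List[float], int]:
    """Perturb log-probabilities with Gumbel noise; b = argmax z is a categorical sample."""
    z = [math.log(p) - math.log(-math.log(_open_uniform(rng))) for p in probs]
    b = max(range(len(z)), key=z.__getitem__)
    return z, b


def conditional_gumbel(probs: Sequence[float], b: int,
                       rng: random.Random) -> List[float]:
    """Sample z_tilde ~ p(z | b) for normalised probs; argmax(z_tilde) is always b."""
    v = [_open_uniform(rng) for _ in probs]
    log_vb = math.log(v[b])
    z_tilde = []
    for i, (p, v_i) in enumerate(zip(probs, v)):
        if i == b:
            # The winning coordinate is a standard Gumbel draw.
            z_tilde.append(-math.log(-log_vb))
        else:
            # The other coordinates are Gumbels truncated below z_tilde[b].
            z_tilde.append(-math.log(-math.log(v_i) / p - log_vb))
    return z_tilde
```

The conditional sample z̃ is what lets RELAX evaluate its control variate on a relaxed input that is consistent with the discrete choice b.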
Links to the pre-trained Longformer models can be found in the Longformer repo.
Appendix A.2 of the paper provides links to the datasets used in these experiments.
To use the first 1000 samples for fine-tuning, we simply run:
head -1000 train.jsonl > train-1000.jsonl
# Run NLL pre-training
python3 scripts/summarization.py --save_dir='multinews_pretrain' --model_path='longformer-encdec-large-16384' --grad_ckpt --max_output_len 256 --dataset='multinews' --num_dataset_examples='all' --epochs 5 --custom_method='_nll'
# Run fine-tuning (e.g. ROUGE-L + Coverage with RELAX)
python3 scripts/summarization.py --lr 0.000003 --fine_tuning --save_dir="multinews_finetune" --model_path='longformer-encdec-large-16384' --from_pretrained='multinews_pretrain/test/{name_of_ckpt}.ckpt' --grad_ckpt --max_output_len 256 --dataset='multinews' --num_dataset_examples='1000' --epochs 1 --custom_method=rougecov1_relax
# Note: remember to add optimizer_idx to Line 265 of scripts/summarization.py
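For reference, the fine-tuning command above (--custom_method=rougecov1_relax) relies on the RELAX gradient estimator (Grathwohl et al., 2018). In its general form, with a discrete sample b = H(z), z ~ p(z | θ), a conditionally resampled relaxation z̃ ~ p(z | b, θ), a reward f (here the ROUGE-L + Coverage reward) and a learned control variate c_φ, the estimator is:

```latex
\hat{g}_{\mathrm{RELAX}}
  = \big[f(b) - c_\phi(\tilde{z})\big]\,\frac{\partial}{\partial\theta}\log p(b \mid \theta)
  + \frac{\partial}{\partial\theta} c_\phi(z)
  - \frac{\partial}{\partial\theta} c_\phi(\tilde{z}),
\qquad b = H(z),\; z \sim p(z \mid \theta),\; \tilde{z} \sim p(z \mid b, \theta)
```

Because the estimator is unbiased for any c_φ, the control-variate parameters φ can be trained to minimise its variance by following the gradient of E[ĝ²] with respect to φ.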
Because this code combines legacy pytorch-lightning code with newer torch code, the installed library code must be modified to enable fine-tuning.
In {virtual-env-name}/lib64/python3.6/site-packages/pytorch_lightning/core/saving.py, make the following changes to allow fine-tuning to run:
# Line 195 - do not modify cls_args
# cls_args = (model_args,) + cls_args (CHANGE THIS TO BELOW)
cls_args = cls_args
# Line 205 - remove **cls_kwargs
model = cls(*cls_args) #, **cls_kwargs)
# Line 207 - set strict=False when loading state_dict
model.load_state_dict(checkpoint['state_dict'], strict=False)
This code is based on the Longformer repo and uses PyTorch Lightning. We also utilise parts of the code from RELAX and Newsroom to implement the coverage reward.
Please cite our work using the following:
@inproceedings{parnell-etal-2022-multi,
title = "A Multi-Document Coverage Reward for {RELAX}ed Multi-Document Summarization",
author = "Parnell, Jacob and
Jauregi Unanue, Inigo and
Piccardi, Massimo",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.351",
pages = "5112--5128",
abstract = "Multi-document summarization (MDS) has made significant progress in recent years, in part facilitated by the availability of new, dedicated datasets and capacious language models. However, a standing limitation of these models is that they are trained against limited references and with plain maximum-likelihood objectives. As for many other generative tasks, reinforcement learning (RL) offers the potential to improve the training of MDS models; yet, it requires a carefully-designed reward that can ensure appropriate leverage of both the reference summaries and the input documents. For this reason, in this paper we propose fine-tuning an MDS baseline with a reward that balances a reference-based metric such as ROUGE with coverage of the input documents. To implement the approach, we utilize RELAX (Grathwohl et al., 2018), a contemporary gradient estimator which is both low-variance and unbiased, and we fine-tune the baseline in a few-shot style for both stability and computational efficiency. Experimental results over the Multi-News and WCEP MDS datasets show significant improvements of up to +0.95 pp average ROUGE score and +3.17 pp METEOR score over the baseline, and competitive results with the literature. In addition, they show that the coverage of the input documents is increased, and evenly across all documents.",
}