Data and code for "Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization" by Taehee Jung*, Dongyeop Kang*, Lucas Mentch and Eduard Hovy (*equal contribution), EMNLP 2019. If you have any questions, please contact Dongyeop Kang (dongyeok@cs.cmu.edu).
We provide a platform (BiasSum.com) for analyzing the biases of your system across different summarization corpora. Please evaluate your summarization system across different dataset domains and metrics, and measure its overall robustness against these biases.
@inproceedings{jungkang19emnlp_biassum,
title = {Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization},
author = {Taehee Jung and Dongyeop Kang and Lucas Mentch and Eduard Hovy},
booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
address = {Hong Kong},
month = {November},
url = {https://arxiv.org/abs/1908.11723},
year = {2019}
}
- Some of the code is still under development. We will be refactoring it soon.
- If you would like to add a new dataset or a new evaluation metric, please contact Dongyeop.
Please download the nine pre-processed summarization corpora in task. Every corpus uses the same dataset format, as follows:
Dataset format:
[source sentences] \t [target sentences]
or
<s> I was at home .. </s> <s> It was rainy day ..</s> ... \t <s> Sleeping at home rainy day </s> ..
An example Python script for loading each dataset is provided here:
python example/data_load.py --dataset AMI
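The script in the repository is the reference loader. As a minimal sketch of how the format above could be parsed, assuming each line holds one source/target pair separated by a tab and every sentence is wrapped in `<s> ... </s>`, the helpers below split a line into two lists of sentences. The names `parse_line` and `load_dataset` are hypothetical and are not taken from `example/data_load.py`.

```python
import re
from typing import Iterator, List, Tuple

# Matches the text between one <s> ... </s> pair (assumed sentence delimiter).
SENTENCE_RE = re.compile(r"<s>(.*?)</s>", re.DOTALL)


def split_sentences(block: str) -> List[str]:
    """Return the sentences found inside <s> ... </s> tags, stripped of padding."""
    return [s.strip() for s in SENTENCE_RE.findall(block)]


def parse_line(line: str) -> Tuple[List[str], List[str]]:
    """Split one dataset line into (source sentences, target sentences).

    Assumes exactly one tab separates the source side from the target side.
    """
    source, target = line.rstrip("\n").split("\t", 1)
    return split_sentences(source), split_sentences(target)


def load_dataset(path: str) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (source, target) pairs from a file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)


if __name__ == "__main__":
    example = (
        "<s> I was at home . </s> <s> It was a rainy day . </s>"
        " \t <s> Sleeping at home on a rainy day </s>"
    )
    source_sents, target_sents = parse_line(example)
    print(len(source_sents), "source sentences;", len(target_sents), "target sentences")
```

Keeping the sentences as lists rather than one joined string preserves sentence boundaries, which sentence-level comparisons against extractive summaries would need.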
Please check the [task] tab at BiasSum.com/task for more details. If you would like to download all the preprocessed datasets at once, please download them here.
NOTE: the links are not available now. Please download the pre-processed datasets here.
Type | Name | Preprocessed Dataset | Original |
---|---|---|---|
News | CNNDM | link | link |
News | NewsRoom | link | link |
News | XSum | link | link |
Papers | PeerRead | link | link |
Papers | PubMed | link | link |
Books | BookSum | - | link |
Dialogues | AMI | link | link |
Posts | link | link | |
Script | MovieScript | link | link |
- Averaged ROUGE with reference abstractive summaries (R)
- Sentence overlap score with Oracle extractive summaries (SO)
- Volume overlap score with reference abstractive summaries (VO)
- The balance across three aspects (P/D/I)
- Please contact Dongyeop if you would like to add your system to the leaderboard with your R/SO/VO/PDI scores across corpora.