New! The dataset is now available on Hugging Face 🤗
Paper: ACL 2021, Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents
Over 60k Emerald journal articles (long documents) with faceted summaries (purpose, method, findings, and value).
Train: 46,289 / Dev: 6,000 / Test: 6,000 / OA-Test: 2,243
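As a quick sanity check, the four split sizes listed above add up to the "over 60k" figure:

```python
# Split sizes as listed in this README
splits = {"train": 46_289, "dev": 6_000, "test": 6_000, "oa_test": 2_243}

total = sum(splits.values())
print(f"total articles: {total:,}")  # prints "total articles: 60,532"
```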
pip install -r requirements.txt
- Log in to your Emerald account and open an Emerald paper link; make sure you have access to the full paper.
- Open the browser's developer tools and go to Application -> Cookies.
- Copy all Key:Value pairs into cookies.py.
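The exact layout that download.py expects in cookies.py is not described here. The sketch below assumes a plain Python dict named `cookies` (an assumption; check the repository's own cookies.py before relying on it). If you copy the cookies as one `name=value; name2=value2` string (for example from a request's Cookie header in the Network tab) rather than pair by pair, the hypothetical helper `parse_cookie_string` turns that string into such a dict.

```python
def parse_cookie_string(raw: str) -> dict[str, str]:
    """Hypothetical helper: turn 'name1=value1; name2=value2' into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # partition() splits on the first '=' only, so values may contain '='
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Assumed cookies.py layout: a dict named `cookies` (verify against the repo)
cookies = parse_cookie_string("JSESSIONID=abc123; lang=en")
print(cookies)  # {'JSESSIONID': 'abc123', 'lang': 'en'}
```

Parsing by hand instead of with `http.cookies.SimpleCookie` is a deliberate choice: `SimpleCookie` can silently stop at characters it considers illegal, which real session cookies sometimes contain.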
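# option 1: authenticate with the cookies saved in cookies.py
python download.py --save_dir . --auth_by_cookie True
# option 2: download without cookie authentication
python download.py --save_dir .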
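How download.py uses the cookies internally is not documented here. As a minimal sketch of the general idea only (an assumption, not the script's actual code), a cookie dict can be attached to an HTTP request as a single `Cookie` header with the standard library. The function name `build_request` and the URL are hypothetical, and no network call is made.

```python
import urllib.request


def build_request(url: str, cookies: dict[str, str]) -> urllib.request.Request:
    """Hypothetical helper: attach browser cookies to a GET request."""
    # Join the pairs into the standard 'name=value; name2=value2' header format
    cookie_header = "; ".join(f"{k}={v}" for k, v in cookies.items())
    req = urllib.request.Request(url)
    if cookie_header:
        req.add_header("Cookie", cookie_header)
    return req


req = build_request("https://example.com/some-paper", {"session": "abc", "lang": "en"})
print(req.get_header("Cookie"))  # session=abc; lang=en
```

Passing `req` to `urllib.request.urlopen` would then send the cookies along, so the server treats the request like one from your logged-in browser session.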
python csv2jsonl.py --csv_dir . --jsonl_filename emerald.jsonl
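JSONL stores one JSON object per line. A minimal sketch of that conversion, assuming each CSV file has a header row whose column names become the JSON keys (csv2jsonl.py itself may select, rename, or merge fields differently), could look like this; `csv_to_jsonl` is a hypothetical name.

```python
import csv
import json
from pathlib import Path

# Full-text fields can exceed the csv module's default 128 KiB field limit
csv.field_size_limit(2**31 - 1)


def csv_to_jsonl(csv_dir: str, jsonl_filename: str) -> int:
    """Write every row of every *.csv file in csv_dir as one JSON line.

    Returns the number of records written.
    """
    count = 0
    with open(jsonl_filename, "w", encoding="utf-8") as out:
        for csv_path in sorted(Path(csv_dir).glob("*.csv")):
            with open(csv_path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):  # header row supplies the keys
                    out.write(json.dumps(row, ensure_ascii=False) + "\n")
                    count += 1
    return count
```

Usage mirroring the command above: `csv_to_jsonl(".", "emerald.jsonl")`. Reading the result back is then one `json.loads` call per line.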
For fine-tuning code and model outputs, please visit the repository Finetuning_BART_for_FACET_Summarization.
@inproceedings{meng2021facetsum,
title={Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents},
author={Meng, Rui and Thaker, Khushboo and Zhang, Lei and Dong, Yue and Yuan, Xingdi and Wang, Tong and He, Daqing},
booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
pages={1080--1089},
year={2021}
}