SportsSum2.0 is a Chinese sports game summarization dataset built on SportsSum. In short, SportsSum2.0 is a cleaned version of SportsSum. Sports game summarization is a challenging task that aims to generate sports summaries (i.e., news articles) from the corresponding live commentaries.
For more details, please refer to the following papers:
- SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary. Jiaan Wang et al. In Proceedings of CIKM (short), 2021.
- Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization. Kuan-Hao Huang et al. In Proceedings of AACL (long), 2020.
BTW, our new paper, Knowledge Enhanced Sports Game Summarization, has been accepted by WSDM 2022 as a long paper. In that paper, we provide the K-SportsSum dataset, which contains more data (~1.45 times) than SportsSum / SportsSum2.0. K-SportsSum also offers a large-scale knowledge corpus containing information about games and players. More details can be found at K-SportsSum.
You can download the data here
Each game has four related files:
- news.txt: Original news article from SportsSum.
- [League]_[id].txt: Cleaned news article. [League] indicates the league in which the game took place, such as Bundesliga, CSL, Europa, La Liga, etc. [id] is the identifier of the game.
- live.json: Live commentary document, which contains commentary sentences, timeline information and real-time scores.
- linesup.json: Metadata file (contains rosters, starting lineups, player positions, etc.).
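As a rough illustration of how the four files of one game might be read together, here is a minimal Python sketch. It assumes that each game's files sit in their own directory and that all files are UTF-8 encoded; neither is stated above. The function name `load_game`, the returned keys and the file-name pattern are hypothetical, and the JSON contents are returned as parsed, without assuming any particular schema.

```python
import json
import re
from pathlib import Path

# Assumed pattern for the cleaned article: [League]_[id].txt
# (everything before the last underscore is treated as the league name).
CLEANED_NAME = re.compile(r"^(?P<league>.+)_(?P<id>[^_]+)\.txt$")


def load_game(game_dir):
    """Load the four files of one game from `game_dir` (hypothetical layout)."""
    game_dir = Path(game_dir)
    game = {
        # Original news article from SportsSum.
        "original_news": (game_dir / "news.txt").read_text(encoding="utf-8"),
        # Live commentary and metadata; the JSON structure is not assumed here.
        "live": json.loads((game_dir / "live.json").read_text(encoding="utf-8")),
        "lineups": json.loads((game_dir / "linesup.json").read_text(encoding="utf-8")),
    }
    # The cleaned article is the only other .txt file, named [League]_[id].txt.
    for path in sorted(game_dir.glob("*.txt")):
        match = CLEANED_NAME.match(path.name)
        if path.name != "news.txt" and match:
            game["league"] = match.group("league")
            game["id"] = match.group("id")
            game["cleaned_news"] = path.read_text(encoding="utf-8")
            break
    return game
```

Calling `load_game` on a game directory would then give a dictionary pairing the live commentary with both the original and the cleaned article. Adjust the paths if the downloaded data is organized differently.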
If you find this data useful or use it in your work, please cite our paper and the original SportsSum paper.
@article{Wang2021SportsSum20GH,
title={SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary},
author={Jiaan Wang and Zhixu Li and Qiang Yang and Jianfeng Qu and Zhigang Chen and Qingsheng Liu and Guoping Hu},
journal={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
year={2021}
}
@inproceedings{Huang2020sportssum,
author = {Kuan-Hao Huang and
Chen Li and
Kai-Wei Chang},
title = {Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization},
booktitle = {Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL)},
year = {2020},
}
Please contact Jiaan Wang (jawang1[at].stu.suda.edu.cn) for questions and suggestions.