SubSumE Dataset

This repository contains the SubSumE dataset for subjective document summarization. See the paper and the talk for details on dataset creation. Also check out our work SuDocu on example-based document summarization.

Dataset Files

Download the dataset from here.

The dataset contains:

Simplified text from 48 Wikipedia pages of the states in the US. Additionally, all the sentences in these documents are combined in a single file, processed_state_sentences.csv, where each sentence is assigned a unique sentence id that is used in the summary json files.
Intent-based summaries created by human annotators.
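
The sentence ids in processed_state_sentences.csv can be turned into a lookup table so that ids found in a summary file resolve back to sentence text. The following is a minimal sketch, not part of the dataset tooling. The CSV column names are not documented here, so they are passed in as arguments (id_column and text_column are hypothetical parameter names); check the file header for the real ones. It also assumes the file has a header row.

```python
import csv


def build_sentence_index(csv_path, id_column, text_column):
    """Map each sentence id (kept as a string) to its sentence text.

    id_column and text_column are the header names of the id and text
    columns; they are assumptions supplied by the caller.
    """
    index = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            index[row[id_column]] = row[text_column]
    return index


def resolve_sentences(index, sentence_ids):
    """Return the text for each id, in order.

    Ids are converted to strings because csv yields strings, while ids
    read from JSON may be integers.
    """
    return [index[str(sid)] for sid in sentence_ids]
```

Keying the index by string avoids a type mismatch between the CSV and the json files, whichever type the json files use for ids.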

Each datapoint file in the directory user_summary_jsons contains a json object holding one annotator's summaries of the Wikipedia pages of eight states, with the following keys:

intent: Summarization intent provided to human annotators for generating the summary
summaries: List of summary jsons for eight states assigned to the annotator. Each json in the list contains the following keys:
- state_name: Name of the state
- sentence_ids: Global ids of sentences (wrt processed_state_sentences.csv) present in the summary
- sentences: List of sentences present in the summary
- use_keywords: Keywords used by the annotator to search the document when creating summaries
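
The structure above can be read with the standard json module. The following is a minimal sketch that relies only on the keys listed above. The function names are illustrative, and the assumption that the files in user_summary_jsons carry a .json extension is not confirmed by this README.

```python
import json
from pathlib import Path


def load_user_summaries(path):
    """Load one annotator file and return (intent, summaries)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["intent"], data["summaries"]


def summary_lengths(summaries):
    """Map each state name to the number of sentences in its summary."""
    return {s["state_name"]: len(s["sentence_ids"]) for s in summaries}


def iter_dataset(directory="user_summary_jsons"):
    """Yield (file name, intent, summaries) for every *.json file.

    The .json extension is an assumption about how the files are named.
    """
    for path in sorted(Path(directory).glob("*.json")):
        intent, summaries = load_user_summaries(path)
        yield path.name, intent, summaries
```

For example, iterating over iter_dataset() and calling summary_lengths on each result gives a quick per-annotator view of how many sentences were selected for each state.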

Acknowledgements

This work was supported by the NSF under grants IIS-1453543, IIS-1943971, and CCF-1763423, and a Microsoft Research Dissertation Grant.
