Data for the paper "The Power of Summary-Source Alignments" presented at ACL Findings 2024.
We denote this data suite as "SPARK", for Summary Proposition Alignment for Reconstructive Knowledgebases.
The alignment annotation can be found in manual_alignments.csv.
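The column layout of manual_alignments.csv is not described here, so a quick look at the header and the first few rows is a reasonable first step. The sketch below uses only the Python standard library and assumes nothing about column names; `peek_csv` is a hypothetical helper, not part of this repository, and the UTF-8 encoding is an assumption.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (column_names, first n_rows rows as dicts) for a CSV file.

    Hypothetical helper for illustration; it does not assume any
    particular column names.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(islice(reader, n_rows))  # read only the first few rows
        return reader.fieldnames, rows
```

For example, `columns, rows = peek_csv("manual_alignments.csv")` followed by `print(columns)` lists whatever columns the file actually contains.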
All derived datasets were generated automatically with the createSubDatasets.py script and can be found in the derived_datasets directory.
- Download MultiNews train and dev datasets here.
- Parse the data:
python parseMultiNews.py -data_path <MULTINEWS_PATH>
- SuperPAL alignments of MultiNews train and val datasets can be found here.
- Cluster the data and add a query:
python add_query.py -alignment_path <ALIGNMENTS_PATH> -summaries_path <PARSED_SUMMARY_DIR_PATH>
- To generate derived datasets from an alignment file, use:
python createSubDatasets.py -alignments_path <ALIGNMENTS_PATH> -out_dir_path <OUT_DIR_PATH> -doc_path <PARSED_DOCUMENT_DIR_PATH> -summ_path <PARSED_SUMMARY_DIR_PATH>
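The three script invocations above can be chained from a small Python driver. This is a minimal sketch, not part of the repository: `build_pipeline_commands` and `run_pipeline` are hypothetical helpers, the script names and flags are copied verbatim from the commands above, and every path is supplied by the caller. It assumes that the scripts are run from the repository root, that the parsed document and summary directories are known to the caller, and that the same alignment file is passed to both add_query.py and createSubDatasets.py; none of these is confirmed here.

```python
import subprocess
import sys


def build_pipeline_commands(multinews_path, alignments_path, doc_dir, summ_dir, out_dir):
    """Return the three commands from the steps above as argv lists, in order.

    Flags are copied as written above; note that add_query.py takes
    -alignment_path while createSubDatasets.py takes -alignments_path.
    """
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "parseMultiNews.py", "-data_path", multinews_path],
        [py, "add_query.py",
         "-alignment_path", alignments_path,
         "-summaries_path", summ_dir],
        [py, "createSubDatasets.py",
         "-alignments_path", alignments_path,
         "-out_dir_path", out_dir,
         "-doc_path", doc_dir,
         "-summ_path", summ_dir],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
```

A call such as `run_pipeline(build_pipeline_commands(multinews_path, alignments_path, doc_dir, summ_dir, out_dir))` would then execute the steps in sequence, with the caller filling in the five paths.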