Data for paper "The Power of Summary-Source Alignments" presented at ACL Findings 2024

SPARK

Data for the paper "The Power of Summary-Source Alignments" presented at ACL Findings 2024.

We denote this data suite as "SPARK", for Summary Proposition Alignment for Reconstructive Knowledgebases.

Manual test set

The manual alignment annotations for the test set can be found in manual_alignments.csv.
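
As a starting point for working with the manual test set, here is a minimal sketch that loads manual_alignments.csv with Python's standard csv module and reports its column names and number of rows. It assumes only that the file is a UTF-8 CSV with a header row. The function name summarize_alignment_file is hypothetical, and no particular column layout is assumed.

```python
import csv
from pathlib import Path


def summarize_alignment_file(path):
    """Return (column_names, row_count) for an alignment CSV.

    Hypothetical helper: it assumes a UTF-8 CSV with a header row and
    makes no assumption about which columns the file contains.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Read every data row (the header row is consumed by DictReader).
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, len(rows)
```

For example, `summarize_alignment_file("manual_alignments.csv")` returns the header of the annotation file together with the number of annotated rows, which is a quick way to inspect the file's actual schema before writing further processing code.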

All derived datasets were generated automatically with the createSubDatasets.py script and can be found in the derived_datasets directory.

Derive SPARK datasets for train and val sets

  1. Download the MultiNews train and dev datasets here.
  2. Parse the data:
     python parseMultiNews.py -data_path <MULTINEWS_PATH>
  3. Download the SuperPAL alignments of the MultiNews train and val datasets, which can be found here.
  4. Cluster the data and add queries:
     python add_query.py -alignment_path <ALIGNMENTS_PATH> -summaries_path <PARSED_SUMMARY_DIR_PATH>
  5. To generate derived datasets from an alignment file, run:
     python createSubDatasets.py -alignments_path <ALIGNMENTS_PATH> -out_dir_path <OUT_DIR_PATH> -doc_path <PARSED_DOCUMENT_DIR_PATH> -summ_path <PARSED_SUMMARY_DIR_PATH>
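
The scripted steps can also be chained from a single Python driver. This is a minimal, hypothetical sketch: build_pipeline_commands and run_pipeline are not part of the repository. It assumes the repository's scripts are in the current working directory, that both manual downloads (MultiNews and the SuperPAL alignments) are already done, and that you know where parseMultiNews.py writes the parsed summaries and documents, which the README does not state. The flags are copied verbatim from the commands in the steps, including the differing -alignment_path and -alignments_path spellings. By default the same alignment file is passed to add_query.py and createSubDatasets.py; whether add_query.py writes its output to a new file is an open assumption, so an optional derived_alignments_path parameter lets you pass a different file to the last step.

```python
import subprocess
import sys


def build_pipeline_commands(multinews_path, alignments_path, parsed_summary_dir,
                            parsed_document_dir, out_dir,
                            derived_alignments_path=None):
    """Return the README's three scripted steps as argument lists, in order.

    Hypothetical helper. Flags are copied from the README; the paths are
    whatever the caller supplies. derived_alignments_path (an assumption)
    overrides the alignment file given to createSubDatasets.py.
    """
    py = sys.executable  # run the scripts with the current interpreter
    sub_alignments = derived_alignments_path or alignments_path
    return [
        [py, "parseMultiNews.py", "-data_path", multinews_path],
        [py, "add_query.py", "-alignment_path", alignments_path,
         "-summaries_path", parsed_summary_dir],
        [py, "createSubDatasets.py", "-alignments_path", sub_alignments,
         "-out_dir_path", out_dir, "-doc_path", parsed_document_dir,
         "-summ_path", parsed_summary_dir],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print and check every command before starting a long run, for example by calling `print(build_pipeline_commands(...))` with your own paths, and only then passing the same list to `run_pipeline`.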
