SPARK

Data for the paper "The Power of Summary-Source Alignments" presented at ACL Findings 2024.

We denote this data suite as "SPARK", for Summary Proposition Alignment for Reconstructive Knowledgebases.

Manual test set

The alignment annotation can be found in manual_alignments.csv.
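
As a quick way to inspect the annotation file, the sketch below reads it with Python's standard csv module. It assumes the file is UTF-8 encoded and starts with a header row; the column names are read from the file rather than assumed, and load_alignments is a hypothetical helper name, not part of this repository.

```python
import csv
from pathlib import Path


def load_alignments(path):
    """Read an alignment CSV and return (column_names, rows).

    Assumption: the first row of the file is a header. Column names are
    taken from that header; none are hard-coded here.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # each row is a dict keyed by column name
        columns = list(reader.fieldnames or [])
    return columns, rows


# Example usage (assumes manual_alignments.csv is in the working directory):
# columns, rows = load_alignments("manual_alignments.csv")
# print(columns)
# print(len(rows), "alignment rows")
```

Printing the column names first is a simple way to confirm the file layout before writing any filtering or evaluation code on top of it.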

All derived datasets were generated automatically with the createSubDatasets.py script and can be found in the derived_datasets directory.

Derive SPARK datasets for train and val sets

Download MultiNews train and dev datasets here.
Parse the data:

  python parseMultiNews.py -data_path <MULTINEWS_PATH>

SuperPAL alignments of MultiNews train and val datasets can be found here.
Cluster the data and add query:

  python add_query.py -alignment_path <ALIGNMENTS_PATH>  -summaries_path <PARSED_SUMMARY_DIR_PATH>

To generate the derived datasets from an alignment file, use:

  python createSubDatasets.py -alignments_path <ALIGNMENTS_PATH>  -out_dir_path <OUT_DIR_PATH> -doc_path <PARSED_DOCUMENT_DIR_PATH> -summ_path <PARSED_SUMMARY_DIR_PATH>
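
The three commands above can also be chained from a small driver script. The following is only a sketch: build_commands and run_pipeline are hypothetical helper names, the scripts are assumed to be in the current working directory, and the same alignment file is assumed to be passed to both add_query.py and createSubDatasets.py, as the placeholders above suggest. The parsed summary and document directories are supplied by the caller, since where parseMultiNews.py writes its output is not stated here.

```python
import subprocess
import sys


def build_commands(multinews_path, alignments_path, summary_dir, document_dir, out_dir):
    """Return the three commands from this README as argument lists, in order.

    The flags are copied from the commands above; nothing else is assumed
    about the scripts' interfaces.
    """
    python = sys.executable or "python"
    return [
        [python, "parseMultiNews.py", "-data_path", multinews_path],
        [python, "add_query.py",
         "-alignment_path", alignments_path,
         "-summaries_path", summary_dir],
        [python, "createSubDatasets.py",
         "-alignments_path", alignments_path,
         "-out_dir_path", out_dir,
         "-doc_path", document_dir,
         "-summ_path", summary_dir],
    ]


def run_pipeline(multinews_path, alignments_path, summary_dir, document_dir, out_dir):
    """Run the steps in order, stopping at the first one that fails."""
    for command in build_commands(
        multinews_path, alignments_path, summary_dir, document_dir, out_dir
    ):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Passing the arguments as lists rather than a single shell string avoids quoting problems when paths contain spaces.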
