SuperPAL

Data, Code and Model for the paper "Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline".

If you find the code useful, please cite the following paper.

@inproceedings{ernst-etal-2021-summary,
  title = "Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline",
  author = "Ernst, Ori and Shapira, Ori and Pasunuru, Ramakanth and Lepioshkin, Michael and Goldberger, Jacob and Bansal, Mohit and Dagan, Ido",
  booktitle = "Proceedings of the 25th Conference on Computational Natural Language Learning",
  month = nov,
  year = "2021",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2021.conll-1.25",
  pages = "310--322",
}

You can use our Hugging Face model or try our demo here.

The run_glue.py script was forked from Hugging Face Transformers v2.5.1 and edited for our purposes.

The supervised_oie_wrapper directory is a wrapper around AllenNLP's (v0.9.0) pretrained Open IE model, which was implemented by Gabriel Stanovsky. It was forked from here and edited for our purposes.

This repository uses Python 3.6. Please refer to environment_superPAL.yml for the other requirements.

Manual Datasets

All manual datasets are in the manual_datasets directory, including the crowdsourced dev and test sets and the Pyramid-based train set.

Because the DUC-based datasets are subject to an LDC license agreement, we provide only the character indices of all propositions or sentences.

To restore the text alignments, run:

  python manual_datasets/restore_alignments.py -indx_csv_path <PATH_TO_THE_CSV_WITH_ALIGNMENTS_INDEXES> -documents_path <PATH_TO_THE_DOCUMENTS_ARRANGED_BY_TOPIC_DIRECTORIES> -summaries_path <SUMMARIES_PATH> -output_file <ALIGNMENTS_OUTPUT_FILE_PATH>
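Conceptually, restoring an alignment amounts to slicing the original document or summary text at the stored character offsets. The sketch below illustrates only that idea: it assumes each span is a 0-based (start, end) pair with an exclusive end, and `restore_span` / `restore_spans` are hypothetical helpers, not functions from restore_alignments.py, whose actual index format may differ.

```python
from typing import Iterable, List, Tuple


def restore_span(text: str, start: int, end: int) -> str:
    """Return the substring of `text` covered by [start, end).

    Assumption: offsets are 0-based and `end` is exclusive, as in Python
    slicing. The real index format used by this repository may differ.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError(
            f"offsets ({start}, {end}) out of range for text of length {len(text)}"
        )
    return text[start:end]


def restore_spans(text: str, spans: Iterable[Tuple[int, int]]) -> List[str]:
    """Restore several (possibly discontinuous) spans from the same text."""
    return [restore_span(text, start, end) for start, end in spans]


if __name__ == "__main__":
    doc = "The storm hit the coast on Monday. Thousands lost power."
    # Two hypothetical spans: the first sentence and "Thousands lost power".
    print(restore_spans(doc, [(0, 34), (35, 55)]))
```

Validating the offsets before slicing makes a mismatch between the index file and the local copy of a document fail loudly instead of silently returning truncated text.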

If you have any issues regenerating the DUC alignments, please contact us by email.

MultiNews alignments are released in full.

Data generation

Predicted alignments for the MultiNews and CNN/DailyMail train and validation sets can be found here.

Alignment model

To apply the alignment model to your own data, follow these steps:

- Download the trained model here.
- Run:

  python main_predict.py -data_path <DATA_PATH> -output_path <OUT_DIR_PATH> -alignment_model_path <ALIGNMENT_MODEL_PATH>

<DATA_PATH> should have the following structure, where each summary file and its corresponding document directory share the same name (e.g., summaries/A.txt and directory A):

  - <DATA_PATH>
    - summaries
      - A.txt
      - B.txt
      - ...
    - A
      - doc_A1
      - doc_A2
      - ...
    - B
      - doc_B1
      - doc_B2
      - ...
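Before running main_predict.py, it can help to check that a data directory actually follows the layout above. The sketch below is a hypothetical helper (`check_data_layout` is not part of this repository); it assumes summaries carry a `.txt` extension, as in the example tree, and that every regular file inside a topic directory is a source document.

```python
from pathlib import Path
from typing import Dict, List


def check_data_layout(data_path: str) -> Dict[str, List[str]]:
    """Map each summary name to the document files in its matching directory.

    Assumed layout (mirroring the tree in this README):
        <DATA_PATH>/summaries/<NAME>.txt
        <DATA_PATH>/<NAME>/<document files>
    Raises FileNotFoundError if a summary has no matching document directory.
    """
    root = Path(data_path)
    summaries_dir = root / "summaries"
    if not summaries_dir.is_dir():
        raise FileNotFoundError(f"missing directory: {summaries_dir}")

    layout: Dict[str, List[str]] = {}
    for summary_file in sorted(summaries_dir.glob("*.txt")):
        topic = summary_file.stem  # "A.txt" -> "A"
        doc_dir = root / topic
        if not doc_dir.is_dir():
            raise FileNotFoundError(
                f"no document directory for summary {summary_file.name}"
            )
        # Collect only regular files; nested directories are ignored.
        layout[topic] = sorted(p.name for p in doc_dir.iterdir() if p.is_file())
    return layout
```

For the example tree, this would return a mapping such as `{"A": ["doc_A1", "doc_A2", ...], "B": [...]}`, and it stops at the first summary that has no document directory of the same name.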

The script creates two files in <OUT_DIR_PATH>:

- 'dev.tsv' - contains all alignment candidate pairs.

- a '.csv' file - contains all predicted aligned pairs with their classification scores.
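A minimal sketch for post-filtering the predicted aligned pairs by their classification score. The column name `pred_prob` and the 0.5 threshold are assumptions made for illustration, not documented outputs of main_predict.py; inspect the header of the generated CSV and pass the real column name as `score_column`. `load_confident_alignments` is a hypothetical helper.

```python
import csv
from typing import Dict, List


def load_confident_alignments(
    csv_path: str, threshold: float = 0.5, score_column: str = "pred_prob"
) -> List[Dict[str, str]]:
    """Return the rows of a prediction CSV whose score is >= threshold.

    `score_column` defaults to a hypothetical name; check the CSV header
    written by main_predict.py and pass the actual column name.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file.
        if reader.fieldnames is None or score_column not in reader.fieldnames:
            raise KeyError(f"column {score_column!r} not found in {csv_path}")
        # Materialize the list before the file is closed.
        return [row for row in reader if float(row[score_column]) >= threshold]
```

Raising on a missing column, rather than defaulting scores to zero, avoids silently discarding every pair when the assumed column name is wrong.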

To use the alignment model on data with different properties, you can inherit from the docSum2MRPC_Aligner class and override the relevant methods.
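The inherit-and-override pattern can be sketched as follows. This README does not list the methods of docSum2MRPC_Aligner, so `BaseAligner` below is a hypothetical stand-in (NOT the real class), and its hook names `read_summary` and `split_sentences` are invented purely for illustration; check docSum2MRPC_Aligner.py for the methods that actually need overriding.

```python
from typing import List


class BaseAligner:
    """Hypothetical stand-in for docSum2MRPC_Aligner (NOT the real class)."""

    def read_summary(self, text: str) -> str:
        # Default hook: use the summary text unchanged.
        return text

    def split_sentences(self, text: str) -> List[str]:
        # Naive default hook: one sentence per non-empty line.
        return [line.strip() for line in text.splitlines() if line.strip()]

    def prepare(self, summary_text: str) -> List[str]:
        # Fixed pipeline: subclasses override the hooks, not this method.
        return self.split_sentences(self.read_summary(summary_text))


class BulletSummaryAligner(BaseAligner):
    """Example subclass for summaries written as '* ' bullet points."""

    def read_summary(self, text: str) -> str:
        # Strip a leading "* " bullet marker from every line.
        return "\n".join(
            line[2:] if line.startswith("* ") else line
            for line in text.splitlines()
        )
```

Only the hook that differs for the new data is overridden; the rest of the pipeline is inherited unchanged.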
