SuperPAL

Data, Code and Model for the paper "Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline".

If you find the code useful, please cite the following paper.

@inproceedings{ernst-etal-2021-summary,
  title = "Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline",
  author = "Ernst, Ori and Shapira, Ori and Pasunuru, Ramakanth and Lepioshkin, Michael and Goldberger, Jacob and Bansal, Mohit and Dagan, Ido",
  booktitle = "Proceedings of the 25th Conference on Computational Natural Language Learning",
  month = nov,
  year = "2021",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2021.conll-1.25",
  pages = "310--322",
}

You can use our huggingface model or check our demo here.

The run_glue.py script was forked from Hugging Face Transformers v2.5.1 and edited for our purpose.

The supervised_oie_wrapper directory is a wrapper over AllenNLP's (v0.9.0) pretrained Open IE model, which was implemented by Gabriel Stanovsky. It was forked from here and edited for our purpose.

This repository was developed with Python 3.6. Please refer to environment_superPAL.yml for the other requirements.

Manual Datasets

All manual datasets are under the manual_datasets directory, including the crowdsourced dev and test sets and the Pyramid-based train set.

Because the DUC-based datasets are restricted by the LDC agreement, we provide only the character indices of the propositions and sentences.

To restore the text alignments please use:

  python manual_datasets/restore_alignments.py -indx_csv_path <PATH_TO_THE_CSV_WITH_ALIGNMENTS_INDEXES> -documents_path <PATH_TO_THE_DOCUMENTS_ARRANGED_BY_TOPIC_DIRECTORIES> -summaries_path <SUMMARIES_PATH> -output_file <ALIGNMENTS_OUTPUT_FILE_PATH>

If you run into any issue while restoring the DUC alignments, please contact us by email.
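The idea behind an index-only release can be illustrated with a minimal sketch: a stored character span plus your own copy of the source text is enough to recover the aligned text. This is not the logic of restore_alignments.py. The function name `restore_span`, the `start`/`end` keys, and the end-exclusive convention are all assumptions made for illustration, and the real CSV schema may differ.

```python
def restore_span(text: str, start: int, end: int) -> str:
    """Return the substring covered by the character span [start, end)."""
    if not 0 <= start <= end <= len(text):
        raise ValueError(
            f"span ({start}, {end}) is outside the text (length {len(text)})"
        )
    return text[start:end]


# Toy document standing in for a licensed DUC source file.
document = "The storm hit the coast on Monday. Thousands were evacuated."

# Hypothetical index record: character offsets of one sentence.
record = {"start": 35, "end": 60}

restored = restore_span(document, record["start"], record["end"])
print(restored)
```

Validating the span against the text length makes a mismatch between the index file and a differently formatted copy of the documents fail loudly instead of silently returning the wrong text.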

MultiNews alignments are released in full.

Data generation

Predicted alignments of MultiNews and CNN/DailyMail train and val datasets can be found here.

Alignment model

To apply the alignment model to your own data, follow these steps:

Download the trained model here.
Run

python main_predict.py -data_path <DATA_PATH>  -output_path <OUT_DIR_PATH>  -alignment_model_path  <ALIGNMENT_MODEL_PATH>

<DATA_PATH> should have the following structure, where each summary file and its related document directory share the same name:

  - <DATA_PATH>
    - summaries
      - A.txt
      - B.txt
      - ...
    - A
      - doc_A1
      - doc_A2
      - ...
    - B
      - doc_B1
      - doc_B2
      - ...
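Before running the prediction script, it can help to check that a data directory follows this layout. The sketch below is not part of the repository; `check_data_layout` is a hypothetical helper that only assumes what the tree above shows: a `summaries` directory of `.txt` files, each with a sibling directory of the same name.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def check_data_layout(data_path: str) -> Dict[str, List[str]]:
    """Map each topic name to its document file names; raise if the layout is off."""
    root = Path(data_path)
    summaries_dir = root / "summaries"
    if not summaries_dir.is_dir():
        raise FileNotFoundError(f"missing directory: {summaries_dir}")

    topics: Dict[str, List[str]] = {}
    for summary in sorted(summaries_dir.glob("*.txt")):
        topic = summary.stem  # "A.txt" -> "A"
        doc_dir = root / topic
        if not doc_dir.is_dir():
            raise FileNotFoundError(
                f"summary {summary.name} has no document directory {doc_dir}"
            )
        topics[topic] = sorted(p.name for p in doc_dir.iterdir() if p.is_file())
    return topics


# Build a tiny example tree in a temporary directory and validate it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "summaries").mkdir()
    (root / "summaries" / "A.txt").write_text("A short summary.")
    (root / "A").mkdir()
    (root / "A" / "doc_A1").write_text("First source document.")
    (root / "A" / "doc_A2").write_text("Second source document.")
    layout = check_data_layout(tmp)
    print(layout)
```

The check only looks at names and directory structure; it does not inspect file contents or encodings.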

The script creates two files in <OUT_DIR_PATH>:

- 'dev.tsv' - contains all alignment candidate pairs.

- a '.csv' file - contains all predicted aligned pairs with their classification score.
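The predicted pairs can then be loaded for further processing. The following is a minimal sketch, assuming the output directory holds exactly one '.csv' file and that it starts with a header row. `load_predictions` is a hypothetical helper, and the column names in the demo (`span_a`, `span_b`, `score`) are placeholders, since the actual columns are not listed here.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def load_predictions(out_dir: str) -> List[Dict[str, str]]:
    """Read the single '.csv' file in out_dir into a list of row dictionaries."""
    csv_files = sorted(Path(out_dir).glob("*.csv"))
    if len(csv_files) != 1:
        raise FileNotFoundError(
            f"expected exactly one .csv file in {out_dir}, found {len(csv_files)}"
        )
    with csv_files[0].open(newline="", encoding="utf-8") as handle:
        # DictReader takes the first row as the header, so no column
        # names are hard-coded here.
        return [dict(row) for row in csv.DictReader(handle)]


# Demo on a stand-in file with a placeholder header (not the real schema).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "predictions.csv").write_text(
        "span_a,span_b,score\nfoo,bar,0.9\n", encoding="utf-8"
    )
    rows = load_predictions(tmp)
    print(rows)
```

All values are returned as strings; convert the score column to `float` once you have confirmed its name in your own output file.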

To use the alignment model on your own data with different properties, you can inherit from the docSum2MRPC_Aligner class and override the relevant functions.
