This repository contains the code for generating pseudo summary data using the ExtraPhrase method as proposed in the paper:
ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization
Mengsay Loem, Sho Takase, Masahiro Kaneko, and Naoaki Okazaki
In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 16–24, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
- Gigaword: data
- CNN/DailyMail: data
- Installation: Make sure you have Python 3.x and spaCy installed. You can install spaCy by running `pip install spacy`.
- Download Language Model: Download the English language model for spaCy by running `python -m spacy download en_core_web_sm`.
Here is an example script to run the extractive summarization:
```shell
python extractive_summarization.py \
    --input_file dummy_data/input.json \
    --depth_ratio 0.5 \
    --group_tokens \
    --output_file dummy_data/step1_output.json
```
- `--input_file`: Path to the input text file in JSON format.
- `--depth_ratio`: Ratio of tree depth to prune.
- `--group_tokens`: Optional flag to group some token nodes before pruning.
- `--output_file`: Path to the output file.
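To give an intuition for what `--depth_ratio` controls, here is a minimal, self-contained sketch of depth-based pruning over a dependency tree. It is illustrative only, not the repository's actual implementation: tokens are paired with a hypothetical `heads` list (the index of each token's syntactic head, as a spaCy parse would provide), and tokens whose depth exceeds `depth_ratio` times the tree's maximum depth are dropped.

```python
# Illustrative sketch of depth-ratio pruning (hypothetical helper,
# not the code in extractive_summarization.py).

def depth(heads, i):
    """Distance from token i to the root (the root's head is itself)."""
    d = 0
    while heads[i] != i:
        i = heads[i]
        d += 1
    return d

def prune_by_depth(tokens, heads, depth_ratio):
    """Keep only tokens whose tree depth is within depth_ratio * max depth."""
    depths = [depth(heads, i) for i in range(len(tokens))]
    cutoff = depth_ratio * max(depths)
    return [t for t, d in zip(tokens, depths) if d <= cutoff]

# Toy sentence with a hand-written head list (root is "arrested"):
tokens = ["police", "arrested", "two", "men", "on", "friday"]
heads = [1, 1, 3, 1, 1, 4]
print(prune_by_depth(tokens, heads, 0.5))
# → ['police', 'arrested', 'men', 'on']
```

A smaller `--depth_ratio` prunes more aggressively, keeping only tokens close to the root of the parse and yielding a shorter extractive summary.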
- Installation: Make sure you have the Hugging Face Transformers library installed. You can install it by running `pip install transformers`.
Here is an example script to run the paraphrasing:
```shell
python paraphrasing.py \
    --input_file dummy_data/step1_output.json \
    --src_tgt_model facebook/wmt19-en-de \
    --tgt_src_model facebook/wmt19-de-en \
    --output_file dummy_data/step2_output.json \
    --use_gpu
```
- `--input_file`: Path to the input text file generated from Step 1.
- `--src_tgt_model`: Path or name of the model for translation from source language to target language.
- `--tgt_src_model`: Path or name of the model for translation from target language back to source language.
- `--output_file`: Path to the output file.
- `--use_gpu`: Optional flag to enable GPU usage for inference.
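The paraphrasing step is round-trip (back-) translation: each sentence is translated into a pivot language with the `--src_tgt_model` and back with the `--tgt_src_model`. The control flow can be sketched as below; to stay runnable without downloading the WMT19 model weights, the two translators are hypothetical stand-in functions, not the actual Hugging Face models named in the command above.

```python
# Round-trip translation sketch. In the real pipeline, src_to_tgt and
# tgt_to_src would wrap translation models such as facebook/wmt19-en-de
# and facebook/wmt19-de-en; here they are dictionary-backed stand-ins.

def paraphrase(sentences, src_to_tgt, tgt_to_src):
    """Translate each sentence into the pivot language and back again."""
    pivot = [src_to_tgt(s) for s in sentences]
    return [tgt_to_src(s) for s in pivot]

# Hypothetical stand-ins for the two translation models:
fake_en_de = {"the cat sat": "die katze sass"}
fake_de_en = {"die katze sass": "the cat was sitting"}

result = paraphrase(
    ["the cat sat"],
    src_to_tgt=fake_en_de.get,
    tgt_to_src=fake_de_en.get,
)
print(result)
# → ['the cat was sitting']
```

Because translation is lossy in surface form, the round trip tends to return a sentence with the same meaning but different wording, which is exactly the diversity the pseudo summaries need.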