This repository presents the implementation of the ACL 2022 paper:
Unsupervised Extractive Opinion Summarization Using Sparse Coding,
Somnath Basu Roy Chowdhury, Chao Zhao and Snigdha Chaturvedi
The implementation of SemAE is based on the open-source framework of Quantized Transformer.
Download the SPACE corpus from this link. The Amazon dataset is publicly available here.
For the Amazon dataset, the data was processed following the instructions from here.
To directly access the data used in our experiments, use the files in this link as the data/
folder. Please cite the respective papers if you use the above datasets.
- Python version: python3.6
- Dependencies: Use the requirements.txt file and conda/pip to install all necessary dependencies. E.g., for pip:
pip install -U pip
pip install -U setuptools
pip install -r requirements.txt
To train SemAE on a subset of the training set using a GPU, go to the ./src
directory and run the following:
python3 train.py --max_num_entities 500 --run_id space_run --gpu 0
This will train a SemAE model with default hyperparameters (for general
summarization), store TensorBoard logs under ./logs, and save a model snapshot
after every epoch under ./models (filename: space_run_<epoch>_model.pt).
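Because a snapshot is written after every epoch, it can be handy to look up the most recent one before running inference. The following is a minimal sketch that relies only on the `<run_id>_<epoch>_model.pt` naming described above; the helper name `latest_snapshot` is hypothetical and is not part of this repository.

```python
import re
from pathlib import Path


def latest_snapshot(models_dir, run_id):
    """Return the snapshot Path with the highest epoch number, or None.

    Hypothetical helper: it only assumes the file naming
    <run_id>_<epoch>_model.pt mentioned in this README.
    """
    pattern = re.compile(rf"^{re.escape(run_id)}_(\d+)_model\.pt$")
    best_epoch, best_path = -1, None
    for path in Path(models_dir).glob(f"{run_id}_*_model.pt"):
        match = pattern.match(path.name)
        # Compare epochs numerically so that epoch 10 ranks above epoch 2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_snapshot("../models", "space_run")` called from ./src would return the path to pass as `--model`, provided at least one epoch has finished.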
For training the full model on SPACE, run the following:
cd scripts/
chmod +x train_space.sh
./train_space.sh
To train the model on the full Amazon dataset, run the scripts/train_amazon.sh
bash script in the same manner.
To perform general opinion summarization with a trained SemAE model, go to the ./src
directory and run the following:
python3 inference.py \
--model ../models/space_run_10_model.pt \
--run_id space_run \
--gpu 0
This will store the summaries under ./outputs/space_run and the output of the
ROUGE evaluation in ./outputs/eval_space_run.json.
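To inspect the evaluation file without opening it by hand, something like the sketch below may help. The layout of the JSON file is not documented in this README, so the code assumes only that it is valid JSON with (possibly nested) objects and numeric leaves; the function names `flatten_scores` and `load_scores` are hypothetical.

```python
import json
from pathlib import Path


def flatten_scores(node, prefix=""):
    """Yield (dotted_key, value) pairs for every numeric leaf of a nested dict.

    Non-numeric leaves and lists are skipped; no particular key layout
    of the evaluation file is assumed.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from flatten_scores(value, child)
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        yield prefix, node


def load_scores(path):
    """Read a JSON file and return its numeric leaves as a flat dict."""
    with Path(path).open(encoding="utf-8") as handle:
        return dict(flatten_scores(json.load(handle)))
```

Calling `load_scores("../outputs/eval_space_run.json")` from ./src and printing the sorted items gives a quick one-line-per-score view of whatever numbers the file contains.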
For aspect opinion summarization, run:
python3 aspect_inference.py \
--model ../models/space_run_10_model.pt \
--sample_sentences --run_id aspects_run \
--gpu 0
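When running inference for several snapshots or run IDs, the two commands above can be assembled programmatically. This is a rough sketch that uses only the script names and flags shown in this README (`--model`, `--run_id`, `--gpu`, `--sample_sentences`); the helper `inference_command` is hypothetical, and the command is meant to be launched from the ./src directory.

```python
import shlex


def inference_command(script, model_run_id, epoch, run_id, gpu=0, extra_flags=()):
    """Build the argument list for inference.py or aspect_inference.py.

    Hypothetical helper: the model path follows the
    ../models/<run_id>_<epoch>_model.pt pattern used in this README.
    """
    model_path = f"../models/{model_run_id}_{epoch}_model.pt"
    return ["python3", script, "--model", model_path, *extra_flags,
            "--run_id", run_id, "--gpu", str(gpu)]


def as_shell(command):
    """Render an argument list as a single shell-safe string for logging."""
    return " ".join(shlex.quote(part) for part in command)
```

The returned list can be handed to `subprocess.run(cmd, check=True, cwd="src")`; for aspect summarization, pass `script="aspect_inference.py"` and `extra_flags=["--sample_sentences"]`.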
The summarization scripts for SPACE and Amazon are: scripts/evaluate_*.sh
@inproceedings{chowdhury2022unsupervised,
title = {Unsupervised Extractive Opinion Summarization Using Sparse Coding},
author = {Basu Roy Chowdhury, Somnath and
Zhao, Chao and
Chaturvedi, Snigdha},
booktitle = {ACL},
year = {2022},
}