Code for Convex Aggregation for Opinion Summarization.
The codebase provides an easy-to-use framework that enables the user to train and use text VAE models with different configurations.
You can also easily configure the architecture of the text VAE model without changing the code at all. You need to use a different Jsonnet file (perhaps with some modification) to train and use a model.
@inproceedings{iso21emnlpfindings,
title = {{C}onvex {A}ggregation for {O}pinion {S}ummarization},
author = {Hayate Iso and
Xiaolan Wang and
Yoshihiko Suhara and
Stefanos Angelidis and
Wang{-}Chiew Tan},
booktitle = {Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {November},
year = {2021}
}
conda create -n coop python=3.7
conda activate coop
conda install -c conda-forge jsonnet sentencepiece # If needed
pip install git+https://github.com/megagonlabs/coop.git
or
git clone https://github.com/megagonlabs/coop.git
cd coop
pip install -e . # or python setup.py develop
Our unsupervised opinion summarization model can generate a summary by decoding the aggregated latent vectors of inputs.
The proposed framework, coop
will find the best summary based on the input-output overlap.
Here you can firstly encode the input reviews, reviews
, into the latent vectors, z_raw
:
from typing import List
import torch
from coop import VAE, util
model_name: str = "megagonlabs/bimeanvae-yelp" # or "megagonlabs/bimeanvae-amzn", "megagonlabs/optimus-yelp", "megagonlabs/optimus-amzn"
vae = VAE(model_name)
reviews: List[str] = [
"I love this ramen shop!! Highly recommended!!",
"Here is one of my favorite ramen places! You must try!"
]
z_raw: torch.Tensor = vae.encode(reviews) # [num_reviews * latent_size]
Given the latent vectors for input reviews, the model generates summaries from all combinations of latent vectors:
# All combinations of input reviews
idxes: List[List[int]] = util.powerset(len(reviews))
# Taking averages for all combinations of latent vectors
zs: torch.Tensor = torch.stack([z_raw[idx].mean(dim=0) for idx in idxes]) # [2^num_reviews - 1 * latent_size]
outputs: List[str] = vae.generate(zs)
outputs
Then, the output looks like this:
['I love this restaurant!! Highly recommended!!',
'Here is one of my favorite ramen places! You must try this place!',
'I love this place! Food is amazing!!']
Finally, our framework, Coop, selects the summary based on the input-output overlap:
# Input-output overlap is measured by ROUGE-1 F1 score.
best: str = max(outputs, key=lambda x: util.input_output_overlap(inputs=reviews, output=x))
best
Then, the selected summary based on the input-output overlap looks like this:
'Here is one of my favorite ramen places! You must try this place!'
You can easily get the generated examples and evaluate their performance with only 30 lines of code! Before doing so, you need to download the dev/test set by running the following command.
# Download dev and test set for evaluation
python scripts/get_summ.py yelp data/yelp
python scripts/get_summ.py amzn data/amzn
Then, you can get the generated examples as follows!
import json
from typing import List
import pandas as pd
import torch
import rouge
from coop import VAE, util
task = "yelp" # or "amzn"
split = "dev" # or "test"
data: List[dict] = json.load(open(f"./data/{task}/{split}.json"))
model_name: str = f"megagonlabs/bimeanvae-{task}" # or f"megagonlabs/optimus-{task}"
vae = VAE(model_name)
hypothesis = []
for ins in data:
reviews: List[str] = ins["reviews"]
z_raw: torch.Tensor = vae.encode(reviews)
idxes: List[List[int]] = util.powerset(len(reviews))
zs: torch.Tensor = torch.stack([z_raw[idx].mean(dim=0) for idx in idxes]) # [2^num_reviews - 1 * latent_size]
outputs: List[str] = vae.generate(zs, bad_words=util.BAD_WORDS) # First-person pronoun blocking
best: str = max(outputs, key=lambda x: util.input_output_overlap(inputs=reviews, output=x))
hypothesis.append(best)
reference: List[List[str]] = [ins["summary"] for ins in data]
evaluator = rouge.Rouge(metrics=["rouge-n", "rouge-l"], max_n=2, limit_length=False, apply_avg=True,
stemming=True, ensure_compatibility=True)
scores = pd.DataFrame(evaluator.get_scores(hypothesis, reference))
scores
All models are hosted on huggingface 🤗 model hub (https://huggingface.co/megagonlabs/).
Model name | Training Data | Encoder | Decoder |
---|---|---|---|
megagonlabs/bimeanvae-yelp | Yelp | BiLSTM + Mean Pooling | LSTM |
megagonlabs/optimus-yelp | Yelp | bert-base-cased | gpt2 |
megagonlabs/bimeanvae-amzn | Amazon | BiLSTM + Mean Pooling | LSTM |
megagonlabs/optimus-amzn | Amazon | bert-base-cased | gpt2 |
VAE
automatically downloads model checkpoints from the model hub.
Yelp dataset (Chu and Liu, 2019)
Model name | Aggregation | ROUGE-1 F1 | ROUGE-2 F1 | ROUGE-L F1 |
---|---|---|---|---|
megagonlabs/bimeanvae-yelp | SimpleAvg | 32.87 | 6.93 | 19.89 |
megagonlabs/bimeanvae-yelp | Coop | 35.37 | 7.35 | 19.94 |
megagonlabs/optimus-yelp | SimpleAvg | 31.23 | 6.48 | 18.27 |
megagonlabs/optimus-yelp | Coop | 33.68 | 7.00 | 18.95 |
Amazon dataset (Bražinskas et al., 2020)
Model name | Aggregation | ROUGE-1 F1 | ROUGE-2 F1 | ROUGE-L F1 |
---|---|---|---|---|
megagonlabs/bimeanvae-amzn | SimpleAvg | 33.60 | 6.64 | 20.87 |
megagonlabs/bimeanvae-amzn | Coop | 36.57 | 7.23 | 21.24 |
megagonlabs/optimus-amzn | SimpleAvg | 33.54 | 6.18 | 19.34 |
megagonlabs/optimus-amzn | Coop | 35.32 | 6.22 | 19.84 |
$ unzip coop.zip && cd coop
$ conda create -n coop python=3.7
$ conda activate coop
$ conda install -c conda-forge jsonnet sentencepiece # If needed
$ pip install -r requirements.txt
Download the Yelp dataset from this link.
You only need the JSON file (yelp_dataset.tar
).
Move the file to data/yelp
and uncompress it. You only need yelp_academic_dataset_review.json
$ tar -xvf yelp_dataset.tar
$ YELP_RAW=$(pwd)/yelp_academic_dataset_review.json
Run the following preprocessing scripts. This may take several hours, depending on your machine spec.
$ mkdir -p ./data/yelp
$ python scripts/preprocess.py yelp $YELP_RAW > ./data/yelp/train.jsonl
Additionally, you need to download the reference summaries from this link provided by MeanSum
Run the following command to download and preprocess it.
This will create dev.json
and test.json
, which follow the dev/test splits
defined in the original MeanSum paper.
$ python scripts/get_summ.py yelp data/yelp
$ ls data/yelp
train.jsonl
dev.json
test.json
Download the Amazon dataset from this link. You only need the following files for 4 categories:
- Clothing_Shoes_and_Jewelry.json.gz
- Electronics.json.gz
- Health_and_Personal_Care.json.gz
- Home_and_Kitchen.json.gz
Run the script to download the datasets. You don't need to uncompress them.
$ mkdir amzn_raw && cd amzn_raw
$ wget -P data/amazon http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Clothing_Shoes_and_Jewelry.json.gz
$ wget -P data/amazon http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics.json.gz
$ wget -P data/amazon http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Health_and_Personal_Care.json.gz
$ wget -P data/amazon http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Home_and_Kitchen.json.gz
$ AMZN_RAW=$(pwd)
$ ls $AMZN_RAW
Clothing_Shoes_and_Jewelry.json.gz
Electronics.json.gz
Health_and_Personal_Care.json.gz
Home_and_Kitchen.json.gz
$ cd -
Run the following preprocessing script. This may take several hours, depending on your machine spec.
$ mkdir -p ./data/amzn
$ python scripts/preprocess.py amzn $AMZN_RAW > ./data/amzn/train.jsonl
Download the reference summaries from this link provided by CopyCat.
Run the following command to download and preprocess it. This will create dev.json and test.json, which follow the dev/test splits defined in the original CopyCat paper.
$ python scripts/get_summ.py amzn data/amzn
$ ls data/amzn
train.jsonl
dev.json
test.json
config
directory contains the configuration files used for the experiments. You can copy it and edit the configuration file to run experiments in different settings.
local lib = import '../utils.libsonnet';
local data_type = "yelp";
local latent_dim = 512;
local free_bit = 0.25;
local num_steps = 100000;
local checkout_step = 1000;
local batch_size = 256;
local lr = 1e-3;
{
"data_dir": "./data/%s" % data_type,
"spm_path": "./data/sentencepiece/%s.model" % data_type,
"model": lib.BiMeanVAE(latent_dim, free_bit),
"trainer": lib.VAETrainer(num_steps, checkout_step, batch_size, lr)
}
To train the model, you can run the following script with config
file and the directory to save checkpoints.
$ python train.py <config filepath> -s <model dir path>
For example,
$ python train.py config/bimeanvae/yelp.jsonnet -s log/bimeanvae/yelp/ex1
To evaluate the model with our proposed framework, coop
, you can simply run the following:
$ python coop/search.py <model dir path>
For example,
$ python coop/search.py log/bimeanvae/yelp/ex1