Code for the EMNLP 2021 paper "Transductive Learning for Unsupervised Text Style Transfer".
- pytorch == 1.2.0
- kenlm
- transformers == 3.5.0
- nltk == 3.4.5
- Elasticsearch
Generated results (outputs) of our model are in the `./outputs/` directory.
First of all, rename the data files to the following format. Each line in a file is an original sentence.
- Yelp
  - The dataset can be downloaded here. You need to format the file names as `{train/dev/test/reference}.{pos/neg}` and put them in `./data/yelp/`.
- GYAFC
  - Please refer to the repo. You need to format the file names as `{train/dev/test/reference}.{informal/formal}` and put them in `./data/gyafc/`. Multiple references are distinguished by adding a number at the end, like `reference.informal.0`, `reference.informal.1`, `reference.informal.2`, `reference.informal.3`.
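Before preprocessing, it may help to double-check the layout. The sketch below is not part of the repo; it only verifies that the files follow the naming convention above (the `./data/` root and the four-reference assumption for GYAFC come from this section).

```python
import os

# Hypothetical sanity check (not part of this repo): verify that the data
# files follow the naming convention described above.
LAYOUT = {
    "yelp": ("pos", "neg"),
    "gyafc": ("informal", "formal"),
}

def check_layout(data_dir, dataset):
    missing = []
    for style in LAYOUT[dataset]:
        for split in ("train", "dev", "test", "reference"):
            path = os.path.join(data_dir, dataset, "{}.{}".format(split, style))
            if split == "reference" and dataset == "gyafc":
                # GYAFC keeps multiple references: reference.<style>.0 ... .3
                if not all(os.path.exists("{}.{}".format(path, i)) for i in range(4)):
                    missing.append(path + ".{0..3}")
            elif not os.path.exists(path):
                missing.append(path)
    return missing

for name in ("yelp", "gyafc"):
    print(name, "missing:", check_layout("./data/", name) or "none")
```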
Then preprocess the data:

```bash
python preprocess.py --data_dir ./data/ --dataset [yelp/gyafc]
```
The detailed training parameters for each dataset are in `config.py`.
Train the model with:

```bash
python main.py --dataset [yelp/gyafc]
```
The generated sentences of each epoch are saved in `./outputs/[yelp/gyafc]/`.
- Dense retriever: set `config.retriever = dense`.
- Sparse retriever: set `config.retriever = sparse`, and set the Elasticsearch server and index in `config.py`. Please note that `get_query` in `model/SparseRetriever` should return the relevant sentences (see the sketch below).
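For reference, here is a minimal sketch of the kind of BM25 lookup a sparse retriever performs, using the `elasticsearch` Python client. The host, index name, and field name are assumptions; adapt them to the server and index you set in `config.py` and to the interface expected by `get_query`.

```python
from elasticsearch import Elasticsearch

# Minimal sketch of a sparse (BM25) lookup. The host, index name ("yelp"),
# and field name ("text") are assumptions; match them to your config.py.
es = Elasticsearch(["http://localhost:9200"])

def get_relevant_sentences(query_sentence, k=5):
    body = {"query": {"match": {"text": query_sentence}}, "size": k}
    response = es.search(index="yelp", body=body)
    # Return the retrieved sentences, most relevant first.
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]

print(get_relevant_sentences("the food was great"))
```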
You can download pretrained classifiers and language models (KenLM) here, or train them by yourself as follows.
- Train the BERT classifiers using `evaluator/train_classifier.py` (a hedged usage sketch follows the commands below).
```bash
python evaluator/train_classifier.py --dataset yelp --pretrained_dir_or_name bert-base-uncased --max_len 18 --save_model evaluator/classifier/
# or
python evaluator/train_classifier.py --dataset gyafc --pretrained_dir_or_name bert-base-uncased --max_len 32 --save_model evaluator/classifier/
```
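After training, the classifier can be loaded for style-accuracy scoring. The following is only a hedged sketch: it assumes the model was saved in the standard transformers format under `evaluator/classifier/`; check `evaluator/train_classifier.py` and `evaluator/evaluator.py` for the exact saving and scoring conventions.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Hedged sketch: assumes the classifier was saved in the standard
# transformers format under evaluator/classifier/ (see train_classifier.py).
tokenizer = BertTokenizer.from_pretrained("evaluator/classifier/")
model = BertForSequenceClassification.from_pretrained("evaluator/classifier/")
model.eval()

def predict_style(sentences, max_len=18):
    inputs = tokenizer(sentences, padding=True, truncation=True,
                       max_length=max_len, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs)[0]  # works with or without return_dict
    return logits.argmax(dim=-1).tolist()

print(predict_style(["the food was great", "the service was terrible"]))
```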
- Train the KenLM language models (a scoring sketch follows the commands below).
```bash
export kenlm=$YOUR_KENLM_DIR/build/bin/
chmod +x evaluator/train_lm.sh
bash evaluator/train_lm.sh yelp
# or
bash evaluator/train_lm.sh gyafc
```
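Once trained, the language model can be queried through the `kenlm` Python module. The model path below is an assumption; point it at whatever file `evaluator/train_lm.sh` produces (both ARPA and binary files work).

```python
import kenlm

# The path is an assumption; use the file produced by evaluator/train_lm.sh.
lm = kenlm.Model("evaluator/lm/yelp.bin")

sentence = "the food was great"
print(lm.score(sentence, bos=True, eos=True))  # total log10 probability
print(lm.perplexity(sentence))                 # per-word perplexity
```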
- Evaluate the results
```bash
python evaluator/evaluator.py --dataset yelp --file outputs/yelp/epoch_10
```
It will report the automatic metrics for `outputs/yelp/epoch_10.pos` and `outputs/yelp/epoch_10.neg`.
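The exact metrics are computed inside `evaluator/evaluator.py`. As a hedged illustration of the kind of content-preservation score usually reported for this task (and since `nltk` is a listed dependency), the sketch below computes corpus BLEU between an output file and one or more reference files; the file pairing in the example is only illustrative.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# Hedged illustration only; see evaluator/evaluator.py for the real metrics.
def file_bleu(hyp_path, ref_paths):
    with open(hyp_path) as f:
        hypotheses = [line.split() for line in f]
    refs = []
    for path in ref_paths:
        with open(path) as f:
            refs.append([line.split() for line in f])
    # corpus_bleu wants, for each hypothesis, the list of its references.
    references = list(zip(*refs))
    return corpus_bleu(references, hypotheses,
                       smoothing_function=SmoothingFunction().method1)

# Illustrative pairing; match outputs to the proper reference files.
print(file_bleu("outputs/yelp/epoch_10.pos", ["data/yelp/reference.pos"]))
```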
If you find this work useful, please cite our paper:

```bibtex
@inproceedings{xiao2021transductive,
    title = "Transductive Learning for Unsupervised Text Style Transfer",
    author = "Xiao, Fei and Pang, Liang and Lan, Yanyan and Wang, Yan and Shen, Huawei and Cheng, Xueqi",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
```