This is the code for the Findings of ACL 2023 paper *GRACE: Gradient-guided Controllable Retrieval for Augmenting Attribute-based Text Generation*. If you use this code or results from our paper, please cite:
```bibtex
@inproceedings{GRACE,
    title = "GRACE: Gradient-guided Controllable Retrieval for Augmenting Attribute-based Text Generation",
    author = "Zhihua Wen and Zhiliang Tian and Zhen Huang and Yuxin Yang and Zexin Jian and Changjian Wang and Dongsheng Li",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    year = "2023",
}
```
- Please download GPT2-medium for generation; it can be replaced by any auto-regressive generation model (e.g., GPT2-xl or GPT-3).
- Convert the downloaded `pytorch_model.bin` into the `checkpoint_best.pt` file required by Fairseq and save the checkpoint in `models/gpt2-medium`.
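A minimal sketch of this conversion step, assuming the Hugging Face state dict only needs to be wrapped in the outer checkpoint structure that Fairseq loads; the exact parameter-key remapping depends on the Fairseq GPT-2 model definition, so treat this as an outline rather than the repository's actual conversion code:

```python
# Sketch (assumption, not the repository's script): wrap a Hugging Face
# GPT-2 state dict into a Fairseq-style checkpoint file.
import os


def wrap_for_fairseq(hf_bin="pytorch_model.bin",
                     out_path="models/gpt2-medium/checkpoint_best.pt"):
    import torch  # deferred import so the sketch loads without torch installed
    state = torch.load(hf_bin, map_location="cpu")
    # Fairseq checkpoints store the model weights under the "model" key;
    # real conversion also needs the parameter names remapped to match
    # the Fairseq architecture.
    ckpt = {"model": state, "args": None,
            "optimizer_history": [], "extra_state": {}}
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    torch.save(ckpt, out_path)
    return out_path
```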
- We split the attribute classification datasets (IMDB and AG News in our work) and use only half of each dataset to train the discriminator and build the retrieval repository (the other half is used to train the evaluator). `train_GPT2_imdb_evaluator.sh` and `train_GPT2_agnews_evaluator.sh` fine-tune GPT2-medium to obtain the sentiment and topic discriminators, respectively, which are used to extract attribute-augmented context representations for the unlabelled corpus. We will release our fine-tuned checkpoints soon.
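The half-split described above can be sketched as follows (a generic illustration, not the repository's preprocessing code):

```python
# Split a labeled attribute dataset into two disjoint halves:
# one half for the discriminator and retrieval repository,
# the other half for training the evaluator.
import random


def split_halves(examples, seed=0):
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    mid = len(indices) // 2
    disc_half = [examples[i] for i in indices[:mid]]   # discriminator + repository
    eval_half = [examples[i] for i in indices[mid:]]   # evaluator
    return disc_half, eval_half


# toy example: 100 (text, label) pairs
disc_half, eval_half = split_halves([("review %d" % i, i % 2) for i in range(100)])
```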
- Please follow urvashik/knnlm to build the retrieval repository. We made minor modifications in `build_dstore.py` (we support sub-sentence-level representation extraction instead of a fixed-length window over the whole document).
- Use `extract_gpt2_vecs.sh` to extract attribute-augmented context representations.
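To illustrate the modification mentioned above: instead of sliding a fixed-length token window over the whole document (as in knnlm's `build_dstore.py`), representations are keyed per sub-sentence. The splitting rule below (sentence-internal and sentence-final punctuation) is an assumption for illustration only:

```python
# Illustrative sub-sentence splitter; the actual boundaries used by the
# modified build_dstore.py may differ.
import re


def subsentence_spans(text):
    """Split text into sub-sentence spans after . ! ? ; or , followed by space."""
    parts = re.split(r"(?<=[.!?;,])\s+", text.strip())
    return [p for p in parts if p]


spans = subsentence_spans("The movie was great. I loved the acting, and the score!")
```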
- `train_imdb_evaluator.sh` and `train_agnews_evaluator.sh` use the other half of the attribute datasets to train BERT-based evaluators, which evaluate the attribute expressed by GRACE.
- We also use an existing Huggingface sentiment classifier to evaluate sentiment accuracy.
- Run `sentimen_gen.sh` and `topic_gen.sh` to generate sentences for different attributes, where `--refine` enables gradient-based generation, `--k` controls the number of retrieval results, `--similar-condition-prob` sets the threshold $p$ in our paper, and `--max-control-step` defines the maximum number of retrieval steps.
- For sentiment-controlled generation, we support `positive` and `negative` sentiment. For topic-controlled generation, we support `business`, `polities`, `technology`, and `world news (world)`.
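The generation flags listed above can be sketched as an argparse parser; the option names mirror this README, but the defaults shown are assumptions, not the scripts' actual values:

```python
# Hypothetical sketch of the generation options; defaults are placeholders.
import argparse


def make_parser():
    parser = argparse.ArgumentParser(description="GRACE generation flags (sketch)")
    parser.add_argument("--refine", action="store_true",
                        help="enable gradient-based generation")
    parser.add_argument("--k", type=int, default=10,
                        help="number of retrieval results (assumed default)")
    parser.add_argument("--similar-condition-prob", type=float, default=0.5,
                        help="threshold p from the paper (assumed default)")
    parser.add_argument("--max-control-step", type=int, default=5,
                        help="maximum number of retrieval steps (assumed default)")
    return parser


args = make_parser().parse_args(["--refine", "--k", "20"])
```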