pf-decoding

Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs

Introduction

We propose a new decoding method called the Permute-and-Flip (PF) decoder. It enjoys robustness properties similar to those of the standard sampling decoder, but its quality-robustness tradeoff is provably up to 2x better than that of sampling and never worse than that of any other decoder. We also design a cryptographic watermarking scheme analogous to Aaronson's Gumbel watermark, but naturally tailored to the PF decoder. The watermarking scheme does not change the sampling distribution, and it allows an arbitrarily low false positive rate and high recall whenever the generated text has high entropy. Our experiments show that the PF decoder (and its watermarked counterpart) significantly outperforms naive sampling (and its Gumbel-watermarked counterpart) in terms of perplexity while retaining the same robustness (and detectability), making it a promising new approach for LLM decoding.

Algorithm

The PF decoder is a simple and efficient algorithm that can be used to decode any LLM. It is based on the idea of sampling from the distribution of the LLM, but with a twist. The algorithm is as follows:
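
A minimal sketch of one PF decoding step, assuming the standard permute-and-flip mechanism from differential privacy applied to temperature-scaled logits: visit tokens in a uniformly random order and accept a token with probability exp((u_y - u_max) / T). The second function uses the equivalent "noisy max" form with Exp(1) noise. The function names and the plain-list interface are illustrative assumptions, not code from generate.py.

```python
import math
import random


def pf_sample(logits, temperature=1.0, rng=random):
    """Illustrative PF step (rejection form) over a list of logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(logits)
    order = list(range(len(logits)))
    rng.shuffle(order)  # "permute": visit tokens in a random order
    for token in order:
        # "flip": biased coin; the argmax token is accepted with probability 1
        accept_prob = math.exp((logits[token] - top) / temperature)
        if rng.random() < accept_prob:
            return token
    raise RuntimeError("unreachable: the argmax token is always accepted")


def pf_sample_noisy_max(logits, temperature=1.0, rng=random):
    """Equivalent form: argmax of logits / T plus independent Exp(1) noise."""
    return max(
        range(len(logits)),
        key=lambda y: logits[y] / temperature + rng.expovariate(1.0),
    )
```

For two tokens with logits 0 and -ln 2 at T = 1, both forms pick the lower-scoring token with probability 0.25, whereas softmax sampling would pick it with probability 1/3.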

Watermarking

We also propose a watermarking scheme for the PF decoder. The watermarking scheme is as follows:
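
A minimal sketch of how a PF-style watermark could work, by analogy with the Gumbel watermark: the Exp(1) noise in the noisy-max form of PF is replaced by -log r, where r is a pseudo-random number derived from a secret key and the preceding n-gram. Detection sums -log r over the observed tokens and compares the total with the sum of Exp(1) variables expected for unwatermarked text. The SHA-256 hashing, the helper names, and the exact test statistic are assumptions made for illustration; they are not taken from generate.py or detect.py.

```python
import hashlib
import math


def pseudo_uniform(key, context, token):
    """Hypothetical helper: map (key, context, token) to a number in (0, 1)."""
    msg = f"{key}|{','.join(map(str, context))}|{token}".encode()
    value = int.from_bytes(hashlib.sha256(msg).digest()[:8], "big") >> 11
    return (value + 1) / (2**53 + 1)


def pf_watermark_token(logits, context, key, temperature=1.0):
    """Pick argmax over y of logits[y] / T - log r_y, with keyed r_y."""
    def score(y):
        return logits[y] / temperature - math.log(pseudo_uniform(key, context, y))
    return max(range(len(logits)), key=score)


def detection_score(tokens, key, ngram):
    """Sum of -log r over every token that has a full n-gram context."""
    total, count = 0.0, 0
    for t in range(ngram, len(tokens)):
        context = tokens[t - ngram:t]
        total += -math.log(pseudo_uniform(key, context, tokens[t]))
        count += 1
    return total, count


def erlang_p_value(score, count):
    """P(S >= score) for S a sum of `count` independent Exp(1) variables."""
    if count == 0 or score <= 0:
        return 1.0
    # Erlang survival function, summed in log space to avoid overflow.
    logs = [-score + k * math.log(score) - math.lgamma(k + 1)
            for k in range(count)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))
```

In this sketch a small p-value from erlang_p_value suggests the text was generated with the given key. Low-entropy text yields weaker evidence, because the logits then dominate the keyed noise in the argmax.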

Code

The code is written in Python and uses PyTorch. You can run the code using the following command:

python run.py --model_name 'NousResearch/Llama-2-7b-hf' --prompt_path 'data/c4.jsonl' --temperature 0.9 --top_p 1.0 --ngram 8 --max_gen_len 256 --nsamples 600 --batch_size 8

You can set the parameters in the run.py file.

Acknowledgements

We thank the authors of the following research works and open-source projects:

Three Bricks to Consolidate Watermarks for Large Language Models

Citation

If you find this work useful, please consider citing our paper:

@article{zhao2024permute,
  title={Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs},
  author={Zhao, Xuandong and Li, Lei and Wang, Yu-Xiang},
  journal={arXiv preprint arXiv:2402.05864},
  year={2024}
}
