Re-implementation of the watermarking technique proposed in A Watermark for Large Language Models by Kirchenbauer, Geiping et al. (Original repo).
Generating a (soft) watermarked text with your language model is as easy as:
import torch
from watermarking import generate
# Loading the model
model = load_my_model().eval().to(device)
# Creating the prior text (random starting tokens)
prior = torch.randint(0, vocab_size, (batch_size, 1)).to(device)
# Generating the watermarked text
watermarked = generate(model, prior, max_length=200, watermarked=True, gamma=0.5, delta=2)
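Under the hood, the soft watermark biases the next-token distribution: at every step, the previous token seeds a pseudo-random split of the vocabulary into a green list (a fraction gamma of the tokens) and a red list, and delta is added to the logits of the green tokens before sampling. Below is a minimal sketch of a single sampling step under those assumptions; it is illustrative only, and the names watermarked_step and prev_token are not part of this repo's API.
import torch

def watermarked_step(logits, prev_token, gamma=0.5, delta=2.0):
    # Illustrative sketch of one soft-watermark sampling step (not the repo's implementation).
    # logits: (B, V) next-token logits; prev_token: (B,) ids of the last generated tokens.
    B, V = logits.shape
    biased = logits.clone()
    for b in range(B):
        # Seed an RNG with the previous token so the same green list
        # can be re-derived at detection time.
        gen = torch.Generator().manual_seed(int(prev_token[b]))
        green = torch.randperm(V, generator=gen)[: int(gamma * V)].to(logits.device)
        biased[b, green] += delta  # favour green-list tokens by delta
    probs = torch.softmax(biased, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # (B, 1) sampled next tokens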
Verifying whether a text was watermarked can be done as follows:
from watermarking import detect_watermark
# text is a (B, T) tensor of token indices
z_score = detect_watermark(text, vocabulary_size, gamma=0.5)
if z_score >= threshold:
    print("Text has been AI-generated.")
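The returned z-score follows the paper's detection test: count how many tokens fall in the green list induced by their preceding token, then standardize against the gamma * T hits expected for non-watermarked text, i.e. z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma)). Below is a minimal sketch of that computation for a single sequence, assuming the same previous-token seeding as in the sketch above; the actual detect_watermark signature may differ.
import torch

def z_score_sketch(tokens, vocab_size, gamma=0.5):
    # tokens: (T,) tensor of token ids for one sequence (illustrative only).
    T = tokens.shape[0] - 1  # number of scored tokens; the first token only seeds the split
    green_hits = 0
    for t in range(1, tokens.shape[0]):
        gen = torch.Generator().manual_seed(int(tokens[t - 1]))
        green = torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]
        green_hits += int((green == int(tokens[t])).any())
    # z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma))
    return (green_hits - gamma * T) / (T * gamma * (1 - gamma)) ** 0.5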
Optionally, you can compute the model's perplexity on its own generated text as follows:
from watermarking import get_perplexities
n_perplexities = get_perplexities(model, normal_text)
w_perplexities = get_perplexities(model, watermarked_text)
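For instance, assuming get_perplexities returns one perplexity value per sequence, a quick comparison could look like:
# Assumes one perplexity value per sequence (e.g. a list or 1-D tensor)
print(f"Average perplexity (normal):      {sum(n_perplexities) / len(n_perplexities):.2f}")
print(f"Average perplexity (watermarked): {sum(w_perplexities) / len(w_perplexities):.2f}")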
For more information, refer to this example.
With the plot.py script, you can plot the perplexity of the model against the Z-score for watermarked and non-watermarked sentences.
This image was generated by sampling 1,000 non-watermarked and watermarked sentences using HuggingFace's pre-trained GPT-2 model with multinomial sampling.
From the plot, we can see that watermarked sentences have a much higher Z-score on average, despite their relatively low perplexity.
The code is released under the MIT license.