Re-implementation of the watermarking technique proposed in A Watermark for Large Language Models by Kirchenbauer, Geiping et al. (Original repo).
Generating a (soft) watermarked text with your language model is as easy as:
import torch
from watermarking import generate
# Loading the model
model = load_my_model().eval().to(device)
# Creating the prior text (random starting tokens)
prior = torch.randint(0, vocab_size, (batch_size, 1)).to(device)
# Generating the watermarked text
watermarked = generate(model, prior, max_length=200, watermarked=True, gamma=0.5, delta=2)
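Under the hood, the soft watermark biases the next-token distribution: at every step, the previous token seeds a pseudo-random split of the vocabulary into a green list (a fraction gamma of the tokens) and a red list, and delta is added to the logits of the green tokens before sampling. Below is a minimal sketch of a single sampling step under those assumptions; it is illustrative only, and the names watermarked_step and prev_token are not part of this repo's API.
import torch

def watermarked_step(logits, prev_token, gamma=0.5, delta=2.0):
    # Illustrative sketch of one soft-watermark sampling step (not the repo's implementation).
    # logits: (B, V) next-token logits; prev_token: (B,) ids of the last generated tokens.
    B, V = logits.shape
    biased = logits.clone()
    for b in range(B):
        # Seed an RNG with the previous token so the same green list
        # can be re-derived at detection time.
        gen = torch.Generator().manual_seed(int(prev_token[b]))
        green = torch.randperm(V, generator=gen)[: int(gamma * V)].to(logits.device)
        biased[b, green] += delta  # favour green-list tokens by delta
    probs = torch.softmax(biased, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # (B, 1) sampled next tokens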
Verifying whether a text was watermarked can be done as follows:
from watermarking import detect_watermark
# text is a (B, T) tensor of token indices
z_score = detect_watermark(text, vocabulary_size, gamma=0.5)
if z_score >= threshold:
    print("Text has been AI-generated.")
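The returned z-score follows the paper's detection test: count how many tokens fall in the green list induced by their preceding token, then standardize against the gamma * T hits expected for non-watermarked text, i.e. z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma)). Below is a minimal sketch of that computation for a single sequence, assuming the same previous-token seeding as in the sketch above; the actual detect_watermark signature may differ.
import torch

def z_score_sketch(tokens, vocab_size, gamma=0.5):
    # tokens: (T,) tensor of token ids for one sequence (illustrative only).
    T = tokens.shape[0] - 1  # number of scored tokens; the first token only seeds the split
    green_hits = 0
    for t in range(1, tokens.shape[0]):
        gen = torch.Generator().manual_seed(int(tokens[t - 1]))
        green = torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]
        green_hits += int((green == int(tokens[t])).any())
    # z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma))
    return (green_hits - gamma * T) / (T * gamma * (1 - gamma)) ** 0.5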
Optionally, you can compute the model's perplexity on its own generated text as follows:
from watermarking import get_perplexities
n_perplexities = get_perplexities(model, normal_text)
w_perplexities = get_perplexities(model, watermarked_text)
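For instance, assuming get_perplexities returns one perplexity value per sequence, a quick comparison could look like:
# Assumes one perplexity value per sequence (e.g. a list or 1-D tensor)
print(f"Average perplexity (normal):      {sum(n_perplexities) / len(n_perplexities):.2f}")
print(f"Average perplexity (watermarked): {sum(w_perplexities) / len(w_perplexities):.2f}")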
For more information, refer to this example.
With the plot.py script, you can plot the perplexity of the model against the Z-score for watermarked and non-watermarked sentences.
This image was generated by sampling 1,000 non-watermarked and watermarked sentences using HuggingFace's pre-trained GPT-2 model with multinomial sampling.
From the plot, we can see that watermarked sentences have a much higher Z-score on average, despite their relatively low perplexity.
The code is released under the MIT license.