
temperature in kirch #4

Open
nivancat opened this issue Apr 19, 2024 · 9 comments

Comments

@nivancat

Hi! I had a question regarding the order of operations in your implementation of the Maryland scheme. Why do you first add delta and then apply the temperature? Doesn't this contradict Section 4.2 of the original paper about the effect of delta on perplexity (see the first two paragraphs)?

@pierrefdz
Contributor

Hi!
The temperature is usually applied during the softmax, so the delta needs to be added before it, no? Are you suggesting to first divide the logits by the temperature, then add the delta, then apply the softmax?
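
For concreteness, here is a minimal sketch of the two orderings under discussion; the function and tensor names are illustrative, not the repository's actual code.

```python
# Minimal sketch (illustrative names, not the repository's code) contrasting the two
# orderings: adding delta to the green-list logits before vs. after the temperature
# rescaling that feeds the softmax.
import torch

def delta_then_temperature(logits, green_mask, delta, temperature):
    # Add delta to the raw logits, then apply the temperature inside the softmax.
    return torch.softmax((logits + delta * green_mask) / temperature, dim=-1)

def temperature_then_delta(logits, green_mask, delta, temperature):
    # The ordering suggested in the question: rescale by the temperature first,
    # then add delta, then apply the softmax.
    return torch.softmax(logits / temperature + delta * green_mask, dim=-1)
```

With temperature != 1 the two are not equivalent: adding delta before the division is the same as adding delta / temperature after it, so the effective green-token boost changes with the temperature.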

@nivancat
Author

nivancat commented May 7, 2024

The scheme you describe breaks Section 4.2 of the original paper, in that it changes the perplexity.

@pierrefdz
Contributor

Hi, could you be more precise?

@nivancat
Author

nivancat commented May 7, 2024

Quote from the paper, Sec. 4.2, page 5 (https://arxiv.org/pdf/2301.10226):

"A soft watermark has very little impact on the perplexity of
tokens with extremely high or low entropy. When the distribution produced by the language model is uniform (maximal
entropy), the randomness of the green list results in tokens
being uniformly sampled, and the perplexity remains untouched. Conversely, in the case of minimal entropy, where
all probability mass is concentrated on a single token, the
soft watermark rule has no effect and there is once again no
impact on perplexity."

@pierrefdz
Contributor

Yes, I've read this, but I don't understand your comment. Can you explain what you mean by "Doesn't this contradict Section 4.2 of the original paper about the effect of delta on perplexity (see the first two paragraphs)?"

@nivancat
Author

nivancat commented May 7, 2024

If you were to compare both constructions, which one would have no impact on the final perplexity?

@sdathath

> Yes, I've read this, but I don't understand your comment. Can you explain what you mean by "Doesn't this contradict Section 4.2 of the original paper about the effect of delta on perplexity (see the first two paragraphs)?"

Consider a scenario where you first watermark and then apply top-k, and you measure perplexity with respect to the original model (which includes top-k as part of the model). If a non-top-k token gets pushed into the top-k as a consequence of watermarking, your perplexity will be infinite... so that theorem does not hold, and the implementation directly contradicts 4.3 in the original paper, which bounds the change in perplexity under the original unwatermarked model.
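
A toy example of this scenario (made-up numbers, just to illustrate the mechanism):

```python
# Toy illustration (hypothetical numbers): applying the green-list bias before top-k
# truncation can pull a token that is outside the original top-k into the sampled set.
# That token has probability 0 under the original top-k-truncated model, so its
# contribution to perplexity is infinite.
import torch

logits = torch.tensor([5.0, 4.0, 1.0, 0.9])      # original model logits
k, delta = 2, 4.0
green_mask = torch.tensor([0.0, 0.0, 0.0, 1.0])  # the last token happens to be green

_, orig_topk = logits.topk(k)                        # original top-k: tokens {0, 1}
_, wm_topk = (logits + delta * green_mask).topk(k)   # after the bias: tokens {0, 3}

# Token 3 can now be generated, but p(token 3) = 0 under the original top-k model,
# so log p = -inf and the measured perplexity blows up.
print(orig_topk.tolist(), wm_topk.tolist())
```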

@pierrefdz
Contributor

> Consider a scenario where you first watermark and then apply top-k, and you measure perplexity with respect to the original model (which includes top-k as part of the model). If a non-top-k token gets pushed into the top-k as a consequence of watermarking, your perplexity will be infinite... so that theorem does not hold, and the implementation directly contradicts 4.3 in the original paper, which bounds the change in perplexity under the original unwatermarked model.

You can look at the original authors' implementation here:

https://github.com/jwkirchenbauer/lm-watermarking

@sdathath

Yes, I think that the implementation is inconsistent with the theory presented, as far as I understand (which is what the OP seems to be hinting at).
