Improved Sampling (Nucleus Sampling) #51
Comments
Here's a sample implementation of top-k and nucleus (top-p) sampling in PyTorch: https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317 |
Although neat, that's beyond the scope of this package. |
Never mind, Neil Shepperd added an implementation for it: nshepperd/gpt-2@87fe3d7. I can merge that. |
Added in 0.5 as a |
I am trying to understand this example, now that v0.5 includes nucleus sampling. I need to read about the meaning of However, you might be able to tell me more about the temperature for top-k sampling and nucleus sampling. Is it set to 1.0? 0.7? 0.9? Or doesn't it matter too much? For top-k sampling, temperature should influence the sampling (because probabilities change with temperature), but the top-k tokens (and their order) remain the same. For nucleus sampling, the parameter |
I could not find the info regarding the value of the temperature in the paper. I guess the authors chose temperature = 1, so that it has no effect on the probabilities, according to formula (4). However, I am not sure because Figure 8 relies on a temperature of 0.8.
In the code of this Python module, I see that the temperature is used at this line, even for nucleus sampling. So, it is good to keep in mind that the number of top tokens depends on two parameters (

    logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
    if top_p > 0.0:
        logits = top_p_logits(logits, p=top_p)
    else:
        logits = top_k_logits(logits, k=top_k)
    samples = tf.multinomial(
        logits, num_samples=1, output_dtype=tf.int32)

tl;dr: if you change |
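To make the point about temperature concrete, here is a minimal pure-Python sketch (not this package's TensorFlow code). It assumes made-up logits and uses hypothetical helper names (`softmax`, `top_k_ids`, `nucleus_size`). It illustrates that dividing logits by a temperature leaves the top-k set unchanged, while the number of tokens needed to reach a cumulative probability of p does change.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_ids(logits, k, temperature=1.0):
    """Indices of the k most probable tokens, most probable first."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]

def nucleus_size(logits, p, temperature=1.0):
    """How many top tokens are needed for their cumulative probability to reach p."""
    probs = sorted(softmax(logits, temperature), reverse=True)
    cumulative = 0.0
    for count, prob in enumerate(probs, start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

# Made-up logits for a five-token vocabulary (an assumption for illustration).
toy_logits = [4.0, 3.0, 2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    ids = top_k_ids(toy_logits, k=2, temperature=t)
    size = nucleus_size(toy_logits, p=0.9, temperature=t)
    print(f"temperature={t}: top-2 ids={ids}, nucleus size for p=0.9: {size}")
```

With these toy values, a lower temperature sharpens the distribution and shrinks the nucleus, and a higher temperature flattens it and grows the nucleus, while the top-2 ids stay the same.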
My interpretation is that like |
Do we have a beam search implementation in this package? It does not perform as well as top-k, but maybe it's good to include it for completeness? |
@bob80333 Excuse me, I'm new to the text generation domain. I've just read this paper and I have a question: what does 'Gold sampling' mean, and how is it implemented? |
As I understand it, 'Gold' was the actual continuation of the text sample, aka ground truth. |
@bob80333 I get it, thx! |
In a recent paper, they find that beam search actually does even worse than greedy search for GPT-2 117M, and they propose a new sampling method to improve text output.
Neural Text Degeneration
From what I understand, it's top-k sampling, except instead of k tokens, it takes all tokens until the combined probability of these tokens is > some parameter p.
Here's an example from the paper:
🗣Context:
On days when he woke early, the president
liked to have coffee in the oval office. There was
something about watching the sky grow lighter and
lighter as you sat your pajama'd behind in the most
powerful chair in the free world, sipping marine strength
coffee, that you just couldn't achieve anywhere else.
💰Greedy: He'd been in the oval office for over a year
now, and he'd never been so happy to be there. He'd
been in the oval office for almost a year, and he'd never
been so happy to be there.
🚦BeamSearch (k=4): The President was a man of his
word. He was a man of his word. He was a man of his
word. He was a man of his word. He was a man of his
word. He was a man of his word.
🥄Sampling (t=1.0): You couldn't be sure if that's what
you were really doing, and If you decided to take the day
off. The president wanted you to take the day off, but he
wanted to maintain a curfew and use his influence wisely.
🇰Top-k Sampling (k=8): "I'm sorry, sir." "No, it's okay,
son," the president said. "I understand." "You're going
to have to make a special trip down there to get that kid.
He has no idea where he's going."
⚛️Nucleus Sampling (p=0.9): But that wasn't what drew
the president's attention. He'd been seated for maybe a
minute when he noticed the other man. What was the
guy doing here?
🗣Gold: He was therefore disagreeably surprised to find
a man in an understated grey suit sitting in that selfsame
chair sipping tea. The president turned around and went
looking for his chief of staff.
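The nucleus (top-p) idea described before the example can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the paper's or any package's implementation: the probabilities are made up, the helper names `top_p_filter` and `sample_nucleus` are hypothetical, and the cutoff uses "cumulative probability >= p" as the stopping rule.

```python
import random

def top_p_filter(probs, p):
    """Keep the most probable tokens until their combined probability reaches p.

    Returns a dict mapping token id -> renormalized probability.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # assumed stopping rule for this sketch
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_nucleus(probs, p, rng=random):
    """Draw one token id from the renormalized nucleus."""
    filtered = top_p_filter(probs, p)
    ids = list(filtered)
    weights = [filtered[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]

# Made-up next-token distribution over a five-token vocabulary.
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]

nucleus = top_p_filter(toy_probs, p=0.85)
print("token ids kept:", sorted(nucleus))
print("sampled id:", sample_nucleus(toy_probs, p=0.85))
```

Unlike top-k, the number of tokens kept here is not fixed: a peaked distribution yields a small nucleus and a flat one yields a large nucleus.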