
Improved Sampling (Nucleus Sampling) #51

Closed · bob80333 opened this issue May 16, 2019 · 11 comments

@bob80333

In a recent paper, the authors find that beam search actually does even worse than greedy search for GPT-2 117M, and they propose a new sampling method to improve text output:

The Curious Case of Neural Text Degeneration

From what I understand, it's like top-k sampling, except that instead of taking a fixed k tokens, it takes the smallest set of most-probable tokens whose combined probability exceeds some parameter p.
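
Here is a minimal NumPy sketch of that idea (my own illustration of the technique described above, not the paper's reference implementation):

```python
import numpy as np

def nucleus_sample(logits, p=0.9, rng=None):
    """Sample one token id with nucleus (top-p) sampling."""
    rng = rng if rng is not None else np.random.default_rng()
    # softmax over the logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # sort tokens from most to least probable
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # keep the smallest set whose cumulative probability exceeds p
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return int(rng.choice(nucleus, p=nucleus_probs))

print(nucleus_sample(np.array([3.0, 2.0, 1.0, 0.5, 0.1]), p=0.9))
```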

Here's an example from the paper:

🗣Context:
On days when he woke early, the president
liked to have coffee in the oval office. There was
something about watching the sky grow lighter and
lighter as you sat your pajama'd behind in the most
powerful chair in the free world, sipping marine strength
coffee, that you just couldn't achieve anywhere else.


💰Greedy: He'd been in the oval office for over a year
now, and he'd never been so happy to be there. He'd
been in the oval office for almost a year, and he'd never
been so happy to be there.


🚦BeamSearch (k=4): The President was a man of his
word. He was a man of his word. He was a man of his
word. He was a man of his word. He was a man of his
word. He was a man of his word.


🥄Sampling (t=1.0): You couldn't be sure if that's what
you were really doing, and If you decided to take the day
off. The president wanted you to take the day off, but he
wanted to maintain a curfew and use his influence wisely.


🇰Top-k Sampling (k=8): "I'm sorry, sir." "No, it's okay,
son," the president said. "I understand." "You're going
to have to make a special trip down there to get that kid.
He has no idea where he's going."


⚛️Nucleus Sampling (p=0.9): But that wasn't what drew
the president's attention. He'd been seated for maybe a
minute when he noticed the other man. What was the
guy doing here?


🗣Gold: He was therefore disagreeably surprised to find
a man in an understated grey suit sitting in that selfsame
chair sipping tea. The president turned around and went
looking for his chief of staff.

@bob80333 (Author)

Here's a sample implementation of top-k and nucleus sampling (top-p) in PyTorch:

https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317

@minimaxir (Owner)

Although neat, that's beyond the scope of this package.

minimaxir reopened this May 16, 2019
@minimaxir (Owner)

Never mind, Neil Sheppard added an implementation for it: nshepperd/gpt-2@87fe3d7

I can merge that.

minimaxir added this to the v0.5 milestone May 16, 2019
@minimaxir (Owner)

Added in 0.5 as a top_p parameter.
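
For anyone landing here later, a usage sketch (assuming a checkpoint has already been fine-tuned and downloaded, following the package's usual start_tf_sess/load_gpt2/generate workflow):

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)  # loads the fine-tuned checkpoint from the default run

# setting top_p > 0.0 switches generation from top-k to nucleus sampling
gpt2.generate(sess, length=100, temperature=1.0, top_p=0.9)
```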

@woctezuma (Contributor) commented May 20, 2019

I am trying to understand this example, now that v0.5 includes nucleus sampling.

I need to read about the meaning of k=4 in BeamSearch.
Edit: I believe k=4 is the beam width, i.e. the number of candidate sequences kept at each decoding step.

However, you might be able to tell me more about the temperature used for top-k sampling and nucleus sampling. Is it set to 1.0? 0.7? 0.9? Or does it not matter too much?

For top-k sampling, the temperature should influence the sampling (because the probabilities change with the temperature), but the top-k tokens (and their order) remain the same.

For nucleus sampling, the parameter p might need to be changed if the temperature is changed, since it is compared against the probabilities of the tokens, which take different values depending on the temperature. It is like top-k sampling where k takes different values during generation, depending on both p and the temperature.
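
To make this concrete, here is a toy demonstration (my own sketch, with a made-up five-token distribution): dividing the logits by a positive temperature preserves their order, so the top-k set never changes, but the nucleus size at p=0.9 grows as the temperature flattens the distribution.

```python
import numpy as np

logits = np.array([3.0, 2.0, 1.0, 0.5, 0.1])

def probs(logits, temperature):
    z = logits / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def nucleus_size(dist, p=0.9):
    order = np.argsort(dist)[::-1]
    return int(np.searchsorted(np.cumsum(dist[order]), p)) + 1

for t in (0.7, 1.0, 1.5):
    d = probs(logits, t)
    top2 = np.argsort(d)[::-1][:2]  # same ids at every temperature
    print(f"T={t}: top-2 ids {top2}, nucleus size (p=0.9) = {nucleus_size(d)}")
    # prints nucleus sizes 2, 3, 4 for T = 0.7, 1.0, 1.5
```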

@woctezuma (Contributor) commented May 20, 2019

I could not find the value of the temperature in the paper. I guess the authors chose temperature = 1, so that it has no effect on the probabilities, according to formula (4). However, I am not sure, because Figure 8 relies on a temperature of 0.8.

[Figure 8 from the paper]

In the code of this Python module, I see that the temperature is applied at the line below, even for nucleus sampling. So it is good to keep in mind that, for nucleus sampling, the number of top tokens depends on two parameters (p and the temperature)!

```python
# temperature rescales the logits before any filtering is applied
logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
if top_p > 0.0:
    # nucleus sampling: keep tokens until their cumulative probability exceeds p
    logits = top_p_logits(logits, p=top_p)
else:
    # otherwise fall back to top-k filtering
    logits = top_k_logits(logits, k=top_k)
samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
```

tl;dr: if you change p, you change the number of top tokens; if you change the temperature with nucleus sampling, you change both the number of top tokens and the probabilities of the top tokens.

@minimaxir (Owner) commented May 20, 2019

My interpretation is that, like top_k, top_p is a constraint on the craziness of the output (which is why it might be less effective on fine-tuned datasets, but we'll see).

@alexanderhanboli

Do we have a beam search implementation in this package? It does not perform as well as top-k, but maybe it would be good to also include it for completeness?
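
For reference, a minimal, framework-agnostic sketch of beam search with beam width k (the `step_logprobs` callback is a hypothetical interface for illustration; this package exposes no such API):

```python
import numpy as np

def beam_search(step_logprobs, beam_width=4, length=10):
    """Minimal beam search: keep the beam_width best partial sequences.

    step_logprobs(seq) -> 1-D array of log-probabilities for the next
    token given the tokens generated so far (hypothetical interface).
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            logp = step_logprobs(seq)
            # extend each beam with its beam_width most likely next tokens
            for tok in np.argsort(logp)[::-1][:beam_width]:
                candidates.append((seq + [int(tok)], score + float(logp[tok])))
        # prune back down to the beam_width best sequences overall
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]  # highest-scoring sequence found
```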

@ty5491003 commented Sep 26, 2019

@bob80333 Excuse me, I'm new to the text generation domain. I've just read this paper and I have a question: what is the meaning of 'Gold' sampling, and how is it implemented?
Thx.

@bob80333 (Author)

As I understand it, 'Gold' was the actual continuation of the text sample, aka ground truth.

@ty5491003

@bob80333 I get it, thx!
