Speculative Sampling

A simple NumPy implementation of "Accelerating Large Language Model Decoding with Speculative Sampling" for GPT-2. See main.py. I also wrote a blog post about this implementation.
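The core of speculative sampling is a modified rejection-sampling step. The draft model proposes a token x with probability q(x). The target model accepts it with probability min(1, p(x)/q(x)), and on rejection a replacement token is drawn from the normalized residual max(0, p - q). The sketch below shows that single step in pure Python rather than NumPy. It assumes p and q are plain lists of probabilities over the same vocabulary. The function name `accept_or_resample` is hypothetical and is not taken from main.py.

```python
import random


def accept_or_resample(p, q, x, rng=random):
    """One modified rejection-sampling step of speculative sampling (sketch).

    p:   target-model probabilities over the vocabulary (list of floats)
    q:   draft-model probabilities over the same vocabulary
    x:   index of the token proposed by the draft model (so q[x] > 0)
    rng: anything with .random() and .choices(), e.g. random.Random(seed)
    Returns (token, accepted).
    """
    # Accept the draft token with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # Rejection only happens when p(x) < q(x), so the residual has positive
    # mass. random.choices normalizes the weights, giving norm(max(0, p - q)).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False
```

If the draft token x is itself drawn from q, the token returned by this step is distributed exactly according to p. This is the property the paper relies on so that speculative sampling matches sampling from the target model alone.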

Install Dependencies:

pip install -r picoGPT/requirements.txt

Tested on Python 3.9.10.

Usage:

python main.py \
    --prompt "Alan Turing theorized that computers would one day become" \
    --n_tokens_to_generate 40 \
    --draft_model_size "124M" \
    --target_model_size "1558M" \
    --K 4 \
    --temperature 0 # 0 for greedy sampling

Which outputs:

Autoregressive Decode
---------------------
Time = 60.64s
Text = Alan Turing theorized that computers would one day become so powerful that they would be able to think like humans.

In the 1950s, he proposed a way to build a computer that could think like a human. He called it the "T

Speculative Decode
------------------
Time = 27.15s
Text = Alan Turing theorized that computers would one day become so powerful that they would be able to think like humans.

In the 1950s, he proposed a way to build a computer that could think like a human. He called it the "T
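Both decodes produce the same text because, at temperature 0, both models' distributions are one-hot on their argmax token. The acceptance test min(1, p(x)/q(x)) then reduces to "keep the draft token if it equals the target's greedy choice, otherwise take the target's token." Speculative decoding therefore only ever emits the target model's greedy tokens. It simply obtains several of them from one target pass whenever the draft guesses correctly. The sketch below illustrates this with hypothetical toy "models": deterministic functions over integer tokens that stand in for GPT-2 124M and 1558M. It also counts target calls, assuming one call per verification round to mirror a single batched forward pass over all K draft positions. It is an illustration, not the code in main.py.

```python
def target_next(seq):
    # Hypothetical stand-in for the large target model: a deterministic
    # greedy "next token" over a 7-token vocabulary.
    return (3 * seq[-1] + seq[-2]) % 7


def draft_next(seq):
    # Hypothetical stand-in for the small draft model: copies the target,
    # except that right after token 5 it always guesses 6.
    if seq[-1] == 5:
        return 6
    return target_next(seq)


def autoregressive(prompt, n, model):
    # Plain greedy decoding: one model call per generated token.
    seq, calls = list(prompt), 0
    while len(seq) < len(prompt) + n:
        seq.append(model(seq))
        calls += 1
    return seq, calls


def speculative_greedy(prompt, n, draft, target, K):
    seq, target_calls = list(prompt), 0
    end = len(prompt) + n
    while len(seq) < end:
        # 1. The draft model proposes K tokens autoregressively.
        proposal = []
        for _ in range(K):
            proposal.append(draft(seq + proposal))
        # 2. The target scores all K + 1 prefixes. Simulated here with
        #    K + 1 calls but counted as one batched forward pass.
        target_calls += 1
        verified = [target(seq + proposal[:i]) for i in range(K + 1)]
        # 3. Keep draft tokens while they match the target's greedy choice.
        #    At the first mismatch, take the target's token and stop. If all
        #    K match, the target's (K+1)-th token is appended as a bonus.
        for i in range(K + 1):
            seq.append(verified[i])
            if i == K or proposal[i] != verified[i]:
                break
    # The last round may overshoot; trimming keeps a greedy prefix.
    return seq[:end], target_calls
```

For example, `autoregressive([1, 2], 40, target_next)` and `speculative_greedy([1, 2], 40, draft_next, target_next, K=4)` return the same 40-token continuation. The speculative version needs far fewer target calls, because each round emits between one and K + 1 tokens.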
