llama : lookahead decoding example #4157
How does Medusa differ from this method?
I believe (correct me if I'm wrong) that it doesn't require extra training of the model.
The blog link actually mentions Medusa specifically and then talks about how their approach is different.
It certainly seems a little faster to me: from 30 t/s to 40 t/s (about a 1.33x speedup) on the LLaMA2-7B-chat example.
Also, lookahead decoding (LADE) seems to be constrained by the FLOPS available on consumer GPUs. I'm not sure how this translates to CPU/RAM requirements, but whether LADE actually delivers a speedup seems to depend on how powerful your hardware is and whether the LADE parameters are tuned for it.
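One rough way to see the hardware dependence: plain greedy decoding scores 1 new position per step, while lookahead decoding scores a whole batch of extra positions per step. Below is a minimal sketch of an approximate per-step position count, assuming the parameters W (window size), N (n-gram size), and G (number of verification candidates) as named in the blog post; the exact counts in the reference implementation differ slightly, so treat this as a back-of-the-envelope model only:

```python
def lookahead_positions_per_step(W: int, N: int, G: int) -> int:
    """Rough count of positions scored per decoding step (vs. 1 for greedy)."""
    lookahead_branch = W * (N - 1)      # (N - 1) levels of W guess tokens
    verification_branch = G * (N - 1)   # G candidate n-grams, N - 1 tokens each
    return lookahead_branch + verification_branch

# e.g. W=5, N=4, G=5 -> 30 positions per step instead of 1; the method only
# pays off if the hardware can process that batch at close to the latency
# of a single-token step
print(lookahead_positions_per_step(5, 4, 5))
```

This is why a GPU with spare FLOPS can absorb the larger batch "for free", while weaker hardware sees the per-step latency grow and the net speedup shrink.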
Example in #4207
Claims to provide a 1.5~2x decoding speedup without a speculative (draft) model.
Blog post: https://lmsys.org/blog/2023-11-21-lookahead-decoding/
Twitter thread: https://twitter.com/lmsysorg/status/1727056892671950887
Reference implementation: https://github.com/hao-ai-lab/LookaheadDecoding/tree/main
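For reference, here is a minimal, self-contained sketch of the core idea: a lookahead branch that refines a window of guess tokens Jacobi-style, an n-gram pool harvested from that window, and greedy verification of pooled candidates. It uses a toy deterministic stand-in for the model and plain per-position calls; the real method fuses all branch positions into a single batched forward pass with a custom attention mask, so this illustrates the control flow only, not the implementation in #4207 or the reference repo:

```python
import random
from collections import defaultdict

VOCAB = 32
W, N, G = 4, 3, 4   # window size, n-gram size, max verification candidates

def toy_model(ctx):
    """Stand-in for an LLM's greedy next token: deterministic in the context."""
    random.seed(hash(tuple(ctx[-4:])))
    return random.randrange(VOCAB)

def lookahead_generate(prompt, n_new):
    out = list(prompt)
    # lookahead window: (N - 1) levels x W guess tokens, randomly initialized
    window = [[random.randrange(VOCAB) for _ in range(W)] for _ in range(N - 1)]
    pool = defaultdict(list)  # n-gram pool: first token -> candidate tails

    while len(out) - len(prompt) < n_new:
        cands = pool[out[-1]][:G]  # verification branch for the last token

        # lookahead branch: one Jacobi-style step refines every window column
        # and harvests one N-gram per column into the pool
        new_level = []
        for j in range(W):
            guess = [window[i][j] for i in range(N - 1)]
            new_level.append(toy_model(out + guess))
            pool[window[0][j]].append(guess[1:] + [new_level[j]])

        # greedy verification: keep the longest candidate prefix that matches
        # what greedy decoding would have produced anyway
        best = []
        for tail in cands:
            ctx, acc = list(out), []
            for tok in tail:
                if toy_model(ctx) != tok:
                    break
                acc.append(tok)
                ctx.append(tok)
            best = max(best, acc, key=len)

        out += best + [toy_model(out + best)]  # always gains >= 1 token
        window = window[1:] + [new_level]      # shift levels up, refill bottom

    return out[len(prompt):len(prompt) + n_new]

print(lookahead_generate([1, 2, 3], 16))
```

Because verification only accepts tokens that match the greedy output, the generated sequence is identical to plain greedy decoding; the speedup comes from accepting several pooled tokens in a single step when a candidate n-gram hits.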