Add document for speculative decoding #9492
Thanks for your contribution!
llm/docs/predict/inference.md
Outdated
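- `speculate_max_ngram_size`: The maximum window size used when matching draft tokens with n-grams. Default: `1`. The inference_with_reference algorithm first slides an n-gram window over the prompt to match draft tokens. The window size, together with how much the input and output overlap, determines the cost of producing draft tokens and therefore the speedup that inference_with_reference achieves.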
- `speculate_verify_window`: By default, the speculative decoding verify strategy uses TopP + TopK verification, and this parameter is the K in that verification. Default: `2`.
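The window here does not mean K. It refers to all the draft tokens inside this window: they must all be accepted by the top-k strategy together, or else they are all rejected together.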
OK.
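To make the role of `speculate_max_ngram_size` concrete, here is a minimal Python sketch of prompt-based n-gram drafting in the spirit of inference_with_reference. It is an illustration under stated assumptions, not this repository's implementation: token IDs are plain integers, larger windows are tried before smaller ones, the leftmost match in the prompt wins, and `ngram_draft_tokens` and `max_draft_tokens` are hypothetical names.

```python
from typing import List, Sequence


def ngram_draft_tokens(
    prompt: Sequence[int],
    generated: Sequence[int],
    max_ngram_size: int = 1,    # mirrors speculate_max_ngram_size (default 1)
    max_draft_tokens: int = 5,  # hypothetical cap on the number of draft tokens
) -> List[int]:
    """Propose draft tokens by looking up the latest n-gram of the sequence in the prompt.

    Tries the largest window first and falls back to smaller ones; the first
    (leftmost) match in the prompt that has tokens after it wins.
    """
    history = list(prompt) + list(generated)
    for n in range(max_ngram_size, 0, -1):
        if len(history) < n:
            continue
        tail = history[-n:]
        # Slide an n-token window over the prompt looking for the tail.
        for start in range(len(prompt) - n + 1):
            if list(prompt[start:start + n]) == tail:
                # Draft tokens are whatever followed the match in the prompt.
                follow = list(prompt[start + n:start + n + max_draft_tokens])
                if follow:
                    return follow
    return []  # no usable match: nothing to speculate this step


if __name__ == "__main__":
    prompt = [1, 2, 3, 4, 5, 2, 3, 9]
    # The last two tokens (2, 3) occur at prompt[1:3], so the tokens after them are proposed.
    print(ngram_draft_tokens(prompt, [5, 2, 3], max_ngram_size=2, max_draft_tokens=3))  # [4, 5, 2]
```

In this sketch, a larger window makes a match rarer but more specific, and each extra window size adds another scan of the prompt. That is one way the window size and the input/output overlap jointly drive the drafting cost.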
llm/docs/predict/inference.md
Outdated
- `speculate_verify_window`: By default, the speculative decoding verify strategy uses TopP + TopK verification, and this parameter is the K in that verification. Default: `2`.
- `speculate_max_candidate_len`: The maximum number of candidate tokens to generate. Verification compares the candidate tokens against the draft tokens. Default: `5`.
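This needs to be explained clearly: it only takes effect under the top-p + window verify strategy. I think it may be worth adding a separate subsection to this document describing the two verification strategies we currently support, top-1 verification and top-p + window verify.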
OK.
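The two verification strategies discussed in these threads can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation in this repository: target-model probabilities are passed as plain lists; the candidate set is assumed to be the top-p nucleus capped at `speculate_max_candidate_len` tokens; and, following the reviewer's description, windows are assumed to be consecutive blocks of `speculate_verify_window` draft tokens that are accepted or rejected as a unit. The function names and the `top_p` argument are hypothetical, and the sketch only counts accepted draft tokens (it does not emit the target model's correction token).

```python
from typing import List, Sequence, Set


def top1_verify(draft: Sequence[int], target_probs: Sequence[Sequence[float]]) -> int:
    """Top-1 verification: accept draft tokens while each equals the target model's argmax."""
    accepted = 0
    for token, probs in zip(draft, target_probs):
        best = max(range(len(probs)), key=lambda i: probs[i])
        if token != best:
            break  # the first mismatch ends acceptance
        accepted += 1
    return accepted


def candidate_set(probs: Sequence[float], top_p: float, max_candidate_len: int) -> Set[int]:
    """Assumed candidate rule: top-p nucleus, capped at max_candidate_len tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen: List[int] = []
    mass = 0.0
    for i in order[:max_candidate_len]:
        chosen.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return set(chosen)


def topp_window_verify(
    draft: Sequence[int],
    target_probs: Sequence[Sequence[float]],
    top_p: float = 0.8,          # hypothetical argument, not a documented parameter
    verify_window: int = 2,      # mirrors speculate_verify_window (default 2)
    max_candidate_len: int = 5,  # mirrors speculate_max_candidate_len (default 5)
) -> int:
    """Accept draft tokens window by window; a window is kept only if all its tokens are candidates."""
    if verify_window < 1:
        raise ValueError("verify_window must be at least 1")
    n = min(len(draft), len(target_probs))
    accepted = 0
    while accepted < n:
        end = min(accepted + verify_window, n)
        window_ok = all(
            draft[i] in candidate_set(target_probs[i], top_p, max_candidate_len)
            for i in range(accepted, end)
        )
        if not window_ok:
            break  # the whole window is rejected together
        accepted = end
    return accepted


if __name__ == "__main__":
    # Dyadic probabilities over a 4-token vocabulary, one row per draft position.
    probs = [
        [0.125, 0.5, 0.25, 0.125],
        [0.5, 0.125, 0.25, 0.125],
        [0.25, 0.125, 0.5, 0.125],
        [0.5, 0.25, 0.125, 0.125],
    ]
    draft = [1, 2, 0, 3]
    print(top1_verify(draft, probs))                                     # 1
    print(topp_window_verify(draft, probs, top_p=0.7, verify_window=2))  # 2
    print(topp_window_verify(draft, probs, top_p=0.7, verify_window=1))  # 3
```

With `verify_window=2`, the second window is rejected as a whole even though its first token is a valid candidate, so only 2 tokens are accepted; with `verify_window=1` the same draft keeps 3. Setting `max_candidate_len=1` shrinks every candidate set to the argmax, which in this sketch reduces top-p + window verify to top-1 verification.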
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## develop #9492 +/- ##
===========================================
+ Coverage 52.91% 52.94% +0.03%
===========================================
Files 688 688
Lines 109331 109379 +48
===========================================
+ Hits 57848 57913 +65
+ Misses 51483 51466 -17
PR types
Others
PR changes
Docs
Description
Add document for speculative decoding.