Add document for speculative decoding #9492

Merged: 3 commits, Nov 28, 2024

Wanglongzhi2001
Contributor

PR types

Others

PR changes

Docs

Description

Add document for speculative decoding.

paddle-bot bot commented Nov 25, 2024

Thanks for your contribution!


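- `speculate_max_ngram_size`: the maximum window size used when n-gram matching draft tokens; the default is `1`. The inference_with_reference algorithm first slides an n-gram window over the prompt to match draft tokens. The window size, together with the degree of overlap between input and output, determines the cost of producing draft tokens and therefore affects the speedup that inference_with_reference achieves.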

- `speculate_verify_window`: the K in the TopP + TopK verification that the speculative decoding verify strategy uses by default; the default is `2`.
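
The n-gram matching described for `speculate_max_ngram_size` can be illustrated with a minimal sketch. This is not PaddleNLP's implementation: the function name `propose_draft_tokens` and the parameter `max_draft_tokens` are hypothetical, and the fallback from larger to smaller windows is an assumption made for illustration.

```python
def propose_draft_tokens(prompt_ids, generated_ids, max_ngram_size=1, max_draft_tokens=5):
    """Hypothetical helper: look up the tail of the output inside the prompt.

    Tries the largest n-gram window first and falls back to smaller ones
    (an assumption, not confirmed by the source). Returns the prompt tokens
    that follow the first match, or an empty list when nothing matches.
    """
    for n in range(max_ngram_size, 0, -1):
        if len(generated_ids) < n:
            continue
        tail = generated_ids[-n:]
        # Slide a window of size n across the prompt.
        for start in range(len(prompt_ids) - n + 1):
            if prompt_ids[start:start + n] == tail:
                follow = prompt_ids[start + n:start + n + max_draft_tokens]
                if follow:
                    return follow
    return []


prompt = [1, 2, 3, 4, 5, 6]
drafts = propose_draft_tokens(prompt, [9, 3, 4], max_ngram_size=2)
```

A larger window makes each match more specific but costs more comparisons per step, which is the trade-off the parameter description refers to.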

Contributor

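The window here does not mean K. It means that all draft tokens within this window must be accepted together by the top-k strategy; otherwise they are all rejected together.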

Contributor Author

OK.
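
The all-or-nothing window semantics raised in this thread can be sketched as follows. This is a literal reading of the review comment, not the actual verification kernel: `verify_with_window` and `topk_per_position` are hypothetical names, and stopping at the first rejected window (as well as treating a shorter trailing group as its own window) is an assumption.

```python
def verify_with_window(draft_tokens, topk_per_position, verify_window=2):
    """Hypothetical sketch: count accepted draft tokens, window by window.

    topk_per_position[i] is the set of top-k tokens the target model
    produced at draft position i. Every token in a window must be in its
    top-k set, or the whole window is rejected and verification stops.
    """
    accepted = 0
    while accepted < len(draft_tokens):
        end = min(accepted + verify_window, len(draft_tokens))
        if all(draft_tokens[i] in topk_per_position[i] for i in range(accepted, end)):
            accepted = end  # the whole window is accepted together
        else:
            break  # the whole window is rejected together
    return accepted


drafts = [10, 11, 12, 13]
topk = [{10, 1}, {11, 2}, {12, 3}, {99, 4}]
accepted_w2 = verify_with_window(drafts, topk, verify_window=2)
accepted_w1 = verify_with_window(drafts, topk, verify_window=1)
```

With a window of 2, token 12 is rejected along with token 13 even though it is in the top-k set on its own; with a window of 1 it would be accepted.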


- `speculate_verify_window`: the K in the TopP + TopK verification that the speculative decoding verify strategy uses by default; the default is `2`.

- `speculate_max_candidate_len`: the maximum number of candidate tokens generated; verification compares the candidate tokens against the draft tokens. The default is `5`.

Contributor

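This needs to be spelled out: it only takes effect under the top-p + window verify strategy. I think it may be worth adding a separate subsection to this document describing the two verification modes we currently support, top-1 verification and top-p + window verify.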

Contributor Author

OK.
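
To make the distinction between the two verification modes concrete, here is a rough sketch of how a candidate cap such as `speculate_max_candidate_len` could apply only to top-p verification. The function names and the `top_p` parameter are hypothetical, and the nucleus-style candidate construction is an assumption rather than the documented behavior.

```python
def top1_accept(draft_token, probs):
    """Top-1 verification: accept only if the draft token is the argmax."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return draft_token == best


def topp_candidates(probs, top_p=0.7, max_candidate_len=5):
    """Hypothetical top-p candidate set, capped at max_candidate_len tokens."""
    order = sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)
    candidates, cumulative = [], 0.0
    for token in order:
        candidates.append(token)
        cumulative += probs[token]
        # Stop once the probability mass or the candidate cap is reached.
        if cumulative >= top_p or len(candidates) >= max_candidate_len:
            break
    return candidates


probs = [0.5, 0.3, 0.1, 0.1]
candidates = topp_candidates(probs, top_p=0.7, max_candidate_len=5)
draft_in_candidates = 1 in candidates   # accepted under top-p in this sketch
draft_is_top1 = top1_accept(1, probs)   # rejected under top-1
```

In this sketch the cap has no role in top-1 verification, which matches the reviewer's point that the parameter only matters for the top-p path.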

codecov bot commented Nov 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.94%. Comparing base (8fd33a9) to head (d4dfb65).
Report is 15 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9492      +/-   ##
===========================================
+ Coverage    52.91%   52.94%   +0.03%     
===========================================
  Files          688      688              
  Lines       109331   109379      +48     
===========================================
+ Hits         57848    57913      +65     
+ Misses       51483    51466      -17     


@yuanlehome yuanlehome merged commit 049b0b5 into PaddlePaddle:develop Nov 28, 2024
10 of 12 checks passed

3 participants