
New Sampler #3068

Open · wants to merge 14 commits into base: master
Conversation

@huangzhengxiang commented Oct 31, 2024

Change

  • implement an independent Sampler module.
  • implement 8 basic samplers: greedy, temperature, topK, topP, minP, tfs, typical, and penalty (each configurable through config.json).
  • implement a mixed sampler whose default sampling order is Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature; the chain can be changed via the mixed_samplers field in config.json (see the sketch after this list).
  • implement PromptLib so that llm_demo works with all LLMs.
  • move the seq_len control from Llm up to Sampler and higher-level modules, shifting that design complexity out of Llm.
  • implement Chat to organize the workflow of a chatbot app.
  • change #define FP16_QSCALE to 0.25 in CPUAttention to ensure Llama3.2 FP16 correctness.
  • test llm_demo on Ubuntu 22.04 and Android (including the ARM64, ARM82, and OpenCL backends).
  • support visual model tasks in llm_demo (Qwen2-VL demo).
  • add a wikitext PPL test.
  • add a ShareGPT PPL test.
  • add a ShareGPT time & space test.
  • add a technical analysis of reuse_kv.
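
To make the configuration concrete, here is a hedged sketch of a config.json that enables the mixed sampler with the default chain. Only the mixed_samplers field and the sampler names come from this PR's description; the sampler_type key and the per-sampler parameter keys (temperature, topK, topP, minP, penalty) are illustrative assumptions, not the PR's confirmed schema.

```json
{
  "sampler_type": "mixed",
  "mixed_samplers": ["penalty", "topK", "tfs", "typical", "topP", "minP", "temperature"],
  "temperature": 0.8,
  "topK": 40,
  "topP": 0.9,
  "minP": 0.05,
  "penalty": 1.05
}
```

The mixed_samplers array mirrors the default order Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature; reordering or removing entries changes the sampling chain.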

@wangzhaode self-assigned this Oct 31, 2024
@@ -25,7 +25,7 @@
#endif

// reduce the value of 'query' to 'query * FP16_QSCALE', avoid fp16 overflow
-#define FP16_QSCALE 0.5
+#define FP16_QSCALE 0.25
Collaborator:
Will this affect computation accuracy?

Author:
Based on test results, no; this has been verified on Llama3.2, Qwen2.5, and Qwen2-VL.
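
For context, a toy illustration of why pre-scaling the query avoids fp16 overflow (not MNN's actual kernel code; the magnitudes below are made up): fp16's largest finite value is 65504, and the unnormalized Q·K dot product accumulates head_dim products, so values that are individually safe can overflow in the sum.

```cpp
#include <cstdio>

// fp16's largest finite value.
constexpr float FP16_MAX = 65504.0f;

int main() {
    const int head_dim = 128;
    // Hypothetical per-element magnitudes, each individually fine in fp16.
    const float q = 30.0f, k = 20.0f;

    // Unscaled Q.K dot product: 128 * 600 = 76800 > 65504 -> inf in fp16.
    float dot = 0.0f;
    for (int i = 0; i < head_dim; ++i) dot += q * k;
    std::printf("unscaled: %g (%s)\n", dot, dot > FP16_MAX ? "would overflow" : "ok");

    // Pre-scale the query as CPUAttention does (q' = q * FP16_QSCALE):
    // 76800 * 0.25 = 19200, which stays in fp16 range.
    const float FP16_QSCALE = 0.25f;
    float scaled = 0.0f;
    for (int i = 0; i < head_dim; ++i) scaled += (q * FP16_QSCALE) * k;
    std::printf("scaled:   %g (%s)\n", scaled, scaled > FP16_MAX ? "would overflow" : "ok");
    return 0;
}
```

Presumably the constant is compensated for in the attention scale applied to the scores, so only the intermediate accumulation range changes, not the softmax result.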

// }

// std::string prompt_template() const {
// return llm_config_.value("prompt_template", "");
Collaborator:
Dead code, please remove.

Author:
Deleted.

@@ -58,6 +58,7 @@ MNN_PUBLIC VARP _Relu(VARP x, float slope = 0.0f);
MNN_PUBLIC VARP _Relu6(VARP x, float minValue = 0.0f, float maxValue = 6.0f);
MNN_PUBLIC VARP _PRelu(VARP x, std::vector<float> &&slopes);
MNN_PUBLIC VARP _Softmax(VARP logits, int axis = -1);
MNN_PUBLIC VARP _TempratureSoftmax(VARP logits, float temperature, int axis = -1);
Collaborator:
Adding it here increases the size of MNN Express; if it is only used by llm, move it into the llm module.

Author:
Moved.
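
For reference, temperature softmax is ordinary softmax applied to logits divided by the temperature, i.e. softmax(logits / T): T < 1 sharpens the distribution, T > 1 flattens it. A minimal standalone C++ sketch of the semantics (not MNN's VARP-based implementation, which per the comment above now lives in the llm module):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Computes softmax(logits / temperature).
// Assumes logits is non-empty and temperature > 0.
std::vector<float> temperatureSoftmax(const std::vector<float>& logits, float temperature) {
    // Subtract the max logit before exponentiating for numerical stability.
    const float maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - maxLogit) / temperature);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}
```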

TODO.md Outdated
@@ -0,0 +1,59 @@
## Change Log
- [x] implement an independent `Sampler` Module.
Collaborator:
Please move this file into transformer/llm.

Author:
Deleted.
