New Sampler #3068
base: master
Conversation
@@ -25,7 +25,7 @@
 #endif

 // reduce the value of 'query' to 'query * FP16_QSCALE', avoid fp16 overflow
-#define FP16_QSCALE 0.5
+#define FP16_QSCALE 0.25
Will this affect computational accuracy?
Based on the test results, it does not. It has been tested on Llama3.2, Qwen2.5, and Qwen2-VL.
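For context, here is a minimal standalone sketch of why the pre-scaling helps. This is not MNN's actual CPUAttention kernel; the function and variable names are hypothetical. The idea is that the query is multiplied by `FP16_QSCALE` before the Q·K accumulation so the fp16 partial sums stay well below the fp16 maximum (about 65504), and the compensation is folded into the softmax scale afterwards.

```cpp
// Hypothetical illustration (not MNN's kernel) of query pre-scaling for fp16.
#include <cmath>
#include <vector>

constexpr float FP16_QSCALE = 0.25f;  // value introduced by this PR

// Computes (q * FP16_QSCALE) . k, then multiplies by (1/sqrt(d)) / FP16_QSCALE.
// The accumulation sees values 4x smaller, reducing fp16 overflow risk; the
// final score is mathematically the same as q . k / sqrt(d).
float attention_score(const std::vector<float>& q, const std::vector<float>& k) {
    const int d = static_cast<int>(q.size());
    const float scale = 1.0f / std::sqrt(static_cast<float>(d)) / FP16_QSCALE;
    float acc = 0.0f;  // this accumulator would be fp16 on an ARM82 backend
    for (int i = 0; i < d; ++i) {
        acc += (q[i] * FP16_QSCALE) * k[i];  // query pre-scaled before the matmul
    }
    return acc * scale;  // compensation applied after accumulation
}
```

Since the compensation is exact in real arithmetic, the only possible accuracy impact is from fp16 rounding along the scaled path, which is what the tests above address.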
// }

// std::string prompt_template() const {
//     return llm_config_.value("prompt_template", "");
Please delete this dead code.
Deleted.
include/MNN/expr/NeuralNetWorkOp.hpp (Outdated)
@@ -58,6 +58,7 @@ MNN_PUBLIC VARP _Relu(VARP x, float slope = 0.0f);
 MNN_PUBLIC VARP _Relu6(VARP x, float minValue = 0.0f, float maxValue = 6.0f);
 MNN_PUBLIC VARP _PRelu(VARP x, std::vector<float> &&slopes);
 MNN_PUBLIC VARP _Softmax(VARP logits, int axis = -1);
+MNN_PUBLIC VARP _TempratureSoftmax(VARP logits, float temperature, int axis = -1);
Adding it here increases the size of MNN Express. If it is only used by the LLM code, move it into the llm module.
Moved.
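For reference, a temperature softmax simply divides the logits by the temperature before the usual softmax. Below is a hedged plain-C++ sketch of that computation; it is not the MNN `VARP`-based `_TempratureSoftmax`, whose implementation now lives in the llm module.

```cpp
// Hypothetical reference implementation of softmax(logits / temperature).
#include <algorithm>
#include <cmath>
#include <vector>

std::vector<float> temperature_softmax(const std::vector<float>& logits, float temperature) {
    // Assumes a non-empty logits vector and temperature > 0.
    std::vector<float> probs(logits.size());
    const float maxLogit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - maxLogit) / temperature);  // subtract max for stability
        sum += probs[i];
    }
    for (auto& p : probs) p /= sum;
    return probs;
}
```

Lower temperatures sharpen the distribution toward greedy decoding, while higher temperatures flatten it and increase sampling diversity.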
TODO.md (Outdated)
@@ -0,0 +1,59 @@
+## Change Log
+- [x] implement an independent `Sampler` Module.
Please move this file into transformer/llm.
Deleted.
Change
- `Sampler` Module: `greedy`, `temperature`, `topK`, `topP`, `minP`, `tfs`, `typical`, `penalty` (can be configured through config.json).
- `mixed` sampler, whose sampling order is Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature; the samplers can be changed by configuring the `mixed_samplers` field in config.json (see the sketch after this list).
- `PromptLib` to enable `llm_demo` for all LLM.
- `seq_len` control in `Llm` moved to `Sampler` and higher-level modules to migrate design complexity.
- `Chat` to organize the workflow of a chatbot APP.
- `#define FP16_QSCALE 0.25` in `CPUAttention` to ensure Llama3.2 FP16 correctness.
- `llm_demo` tested on Ubuntu 22.04 and Android (including ARM64, ARM82, and OpenCL backends).
- `llm_demo` supports visual model tasks (Qwen2-VL demo).
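To make the configurable sampling order concrete, here is a hedged, self-contained C++ sketch of chaining samplers in a user-defined order. It is not the PR's `Sampler` module: the type names, the registry, and the `mixedSample` function are hypothetical, and the exact sampler names accepted by the real `mixed_samplers` field should be checked against the PR.

```cpp
// Hypothetical sketch of a configurable "mixed" sampling pipeline.
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using Logits = std::vector<float>;
using SamplerFn = std::function<void(Logits&)>;

// Apply the configured samplers in order; each stage filters or reweights the
// logits in place (e.g. "top_k" truncates candidates, "temperature" rescales).
void mixedSample(Logits& logits,
                 const std::vector<std::string>& mixedSamplers,
                 const std::unordered_map<std::string, SamplerFn>& registry) {
    for (const auto& name : mixedSamplers) {
        auto it = registry.find(name);
        if (it != registry.end()) {
            it->second(logits);
        }
    }
}
```

The design point is that each stage only narrows or reweights the candidate distribution, so reordering entries in `mixed_samplers` changes the sampling behavior without touching any code.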