add rwkv5 model and fix rwkv4 init prompt bug #1275

BBuf · 2023-11-16T10:36:04Z

q8fp16_1 display:

support rwkv5

junrushao · 2023-11-17T19:47:10Z

Hzfengsy · 2023-11-18T08:24:30Z

There is a confirmed bug on Cuda devices, due to an all-reduce compilation error. However, the model works well on other platforms. Should we get it in right now, or wait for the cuda fix?

tqchen · 2023-11-18T13:52:42Z

would be good to confirm the cuda-error and suggest a quick solution, to ensure things work across the board

Hzfengsy · 2023-11-19T11:15:14Z

I confirmed two bugs on cuda devices. Unfortunately, I have no idea about either of them.

cuBLAS BYOC

  File "tvm/src/relax/transform/fuse_ops.cc", line 882
InternalError: Check failed: (depgroup != cur_group) is false: A cyclic dependency detected between the groups lv2261 and lv2260 are in.

To reproduce the issue, please follow the instructions. Please make sure the cublas is enabled.

git clone git@github.com:GiantPandaCV/mlc-llm.git mlc-llm-rwkv && cd mlc-llm-rwkv
python -m mlc_llm.build --hf-path RWKV/rwkv-5-world-1b5  --target cuda --quantization q0f16 --use-cache=0 --build-model-only

Cross Thread Reduction Codegen

The current codegen failed on group_norm, and the minimal reproduce script is https://gist.github.com/Hzfengsy/74366154dfe51ea640de8b5bbe41ea4a

tqchen · 2023-11-27T14:37:22Z

cpp/llm_chat.cc

@@ -615,7 +615,7 @@ class LLMChat {
    std::vector<int32_t> encoded = this->tokenizer_->Encode(all_prompt);
    tokens.insert(tokens.end(), encoded.begin(), encoded.end());
    if (this->sliding_window_ != -1 ||  // There is no max window size if we use sliding window
-        this->total_seq_len_ + tokens.size() + gen_mean_gen_len < this->max_window_size_) {
+        this->total_seq_len_ + (int)tokens.size() + gen_mean_gen_len < this->max_window_size_) {


use static_cast<int64_t>(tokens.size())

ok. I think this is a quite serious bug that has troubled me for more than two weeks. The quantized int4 version of rwkv5 seems to give very unintelligent responses, and it was only today that I thought to print out the prompt. Then I discovered that all the code after line 618 was ineffective, and finally pinpointed this issue. Now the quantized int4 version of rwkv5 can also generate text normally. The performance has also improved in other modes for rwkv.

can you elaboratae a bit why static cast int is needed here? do we involve some negative numbers in computing this ?

This error might occur whenever the input prompt is relatively long. Before using static_cast<int64_t> to convert tokens.size(), the expression (this->total_seq_len_ + tokens.size() + gen_mean_gen_len) might have experienced integer overflow at some stage, causing the result to be incorrectly interpreted as a negative number, which in turn erroneously returned true for the comparison operation.

OK, I feel this is a strange way to think about it, given max_window_size_ == -1, we should specially check it, and that means there is no out of bound and we do not need to re-encode (aka running the code after), would be good for @Hzfengsy to take a loo as well

Okay, there seems to be a bug in the handling of the rwkv system prompts. I expect that each interaction with the rwkv model should include the system prompt along with the current text. This is because its series of models(rwkv4,5,6) have higher requirements for prompts. Currently, only the first round of dialogue includes the system's prompt, and the system prompt is forgotten in subsequent dialogues

BBuf · 2023-11-27T14:52:31Z

71eefb6 fix before

fix after

@Hzfengsy @tqchen I think it's time to announce proper support for the rwkv model.

vinx13 · 2023-11-29T01:45:49Z

I found some var to var bindings in the model, for example, in decode function,

            lv2259: R.Tensor((1, 2048), dtype="float16") = R.matmul(lv2257, lv2258, out_dtype="void")
            lv2260: R.Tensor((1, 2048), dtype="float16") = lv2259

This issue is fixed by apache/tvm#16175, it can compile successfully however such bindings can still break fusion and pattern matching, for better performance, I'd recommend update the model definition to eliminate such bindings.

BBuf and others added 24 commits October 27, 2023 09:32

fix llm_chat.cc rwkv bug

9a151b5

fix llm_chat.cc rwkv bug

28cf751

refine

a48762c

fix format erroe

dfd6eeb

fix ci error

44c73a5

Merge branch 'mlc-ai:main' into main

47a6101

Merge branch 'mlc-ai:main' into main

107d95d

Merge branch 'mlc-ai:main' into main

5f8a1b3

init

9519e75

fix rwkv5 model error

62a8518

refine

0c5a140

refine

003056b

wkv5 tir in fp32

8e7b02d

refine

980b22c

fix metal wkv tir bug by fsy

58ddb80

fix groupnorm bug

33d51dc

run success in metal

6f739c6

fix mlc-llm rwkv5 bug

aac52e1

Merge branch 'mlc-ai:main' into main

56cc017

Merge pull request #2 from GiantPandaCV/support_rwkv5_2

6b655bc

support rwkv5

update submodule

542cde9

refine

b96f421

delete build.py

1629f3c

revert

1dc0833

tqchen requested a review from Hzfengsy November 16, 2023 13:46

Merge branch 'mlc-ai:main' into main

6b0855c

BBuf added 4 commits November 27, 2023 14:06

fix model bug

dae253b

fix prompt

1aeb3e2

refine config

ddb3908

fix int4 bug

71eefb6

tqchen reviewed Nov 27, 2023

View reviewed changes

refine

b32109e

tqchen closed this Feb 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add rwkv5 model and fix rwkv4 init prompt bug #1275

add rwkv5 model and fix rwkv4 init prompt bug #1275

BBuf commented Nov 16, 2023 •

edited

Loading

junrushao commented Nov 17, 2023

Hzfengsy commented Nov 18, 2023

tqchen commented Nov 18, 2023

Hzfengsy commented Nov 19, 2023

tqchen Nov 27, 2023

BBuf Nov 27, 2023

tqchen Nov 27, 2023

BBuf Nov 28, 2023

BBuf Nov 28, 2023

tqchen Nov 28, 2023

BBuf Nov 28, 2023

BBuf commented Nov 27, 2023 •

edited

Loading

vinx13 commented Nov 29, 2023

add rwkv5 model and fix rwkv4 init prompt bug #1275

add rwkv5 model and fix rwkv4 init prompt bug #1275

Conversation

BBuf commented Nov 16, 2023 • edited Loading

junrushao commented Nov 17, 2023

Hzfengsy commented Nov 18, 2023

tqchen commented Nov 18, 2023

Hzfengsy commented Nov 19, 2023

cuBLAS BYOC

Cross Thread Reduction Codegen

tqchen Nov 27, 2023

Choose a reason for hiding this comment

BBuf Nov 27, 2023

Choose a reason for hiding this comment

tqchen Nov 27, 2023

Choose a reason for hiding this comment

BBuf Nov 28, 2023

Choose a reason for hiding this comment

BBuf Nov 28, 2023

Choose a reason for hiding this comment

tqchen Nov 28, 2023

Choose a reason for hiding this comment

BBuf Nov 28, 2023

Choose a reason for hiding this comment

BBuf commented Nov 27, 2023 • edited Loading

vinx13 commented Nov 29, 2023

BBuf commented Nov 16, 2023 •

edited

Loading

BBuf commented Nov 27, 2023 •

edited

Loading