
Bug: Decoding special tokens in T5 #8938

Closed
cyanic-selkie opened this issue Aug 8, 2024 · 6 comments · Fixed by #8951
Labels: bug-unconfirmed, high severity


@cyanic-selkie

What happened?

I have a T5/LoRA model trained to output text separated by the <extra_id_0> special token (the tokenizer works properly after following the instructions in #8872).

When running the model using Hugging Face's transformers/peft, it generates the expected output. However, when I use llama-cli, the moment the first such token is reached, it is decoded as an EOG (end-of-generation) token instead of the extra token, and generation stops.

I might simply be doing something wrong in using the library.
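
For readers unfamiliar with the stopping logic: llama-cli halts generation when the sampled token is classified as end-of-generation. Below is a minimal sketch of that check; the helper function is hypothetical and only illustrates where the public llama_token_is_eog() API comes in, it is not the actual llama-cli source.

#include "llama.h"

// Hypothetical helper (not the actual llama-cli code): if the GGUF metadata
// marks a token such as <extra_id_0> as EOG, this returns true and the
// caller breaks out of its sampling loop, ending generation early.
bool should_stop_generation(const llama_model * model, llama_token token) {
    return llama_token_is_eog(model, token);
}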

Name and Version

version: 3549 (afd27f0)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

No response

Relevant log output

No response

cyanic-selkie added the bug-unconfirmed and high severity labels on Aug 8, 2024
@fairydreaming
Collaborator

If you upload your transformers model somewhere, I can take a look.

@cyanic-selkie
Author

cyanic-selkie commented Aug 9, 2024

@fairydreaming Sure, thanks.

Base model: https://huggingface.co/repetitio/flan-t5-small

LoRA: https://huggingface.co/repetitio/distilled-simplifier

All of the GGUF variants give the same result. The input is just any Wikipedia-like paragraph.

@fairydreaming
Collaborator

The problem is that the current T5 model implementation ignores LoRA in the attention matrices. Fixing this is easy; try this patch:

diff --git a/src/llama.cpp b/src/llama.cpp
index a7b1c9eb..33b53e60 100644
--- a/src/llama.cpp
+++ b/src/llama.cpp
@@ -13178,13 +13178,13 @@ struct llm_build_context {
 
                 // self-attention
                 {
-                    struct ggml_tensor * Qcur = ggml_mul_mat(ctx0, model.layers[il].wq_enc, cur);
+                    struct ggml_tensor * Qcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wq_enc, cur);
                     cb(Qcur, "Qcur", il);
 
-                    struct ggml_tensor * Kcur = ggml_mul_mat(ctx0, model.layers[il].wk_enc, cur);
+                    struct ggml_tensor * Kcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wk_enc, cur);
                     cb(Kcur, "Kcur", il);
 
-                    struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv_enc, cur);
+                    struct ggml_tensor * Vcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wv_enc, cur);
                     cb(Vcur, "Vcur", il);
 
                     Qcur = ggml_reshape_3d(ctx0, Qcur, n_embd_head, n_head, n_tokens);
@@ -13218,7 +13218,7 @@ struct llm_build_context {
 
                     ggml_build_forward_expand(gf, cur);
 
-                    cur = ggml_mul_mat(ctx0, model.layers[il].wo_enc, cur);
+                    cur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wo_enc, cur);
                     cb(cur, "kqv_out", il);
                 }
 
@@ -13292,13 +13292,13 @@ struct llm_build_context {
 
                 // self-attention
                 {
-                    struct ggml_tensor * Qcur = ggml_mul_mat(ctx0, model.layers[il].wq, cur);
+                    struct ggml_tensor * Qcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wq, cur);
                     cb(Qcur, "Qcur", il);
 
-                    struct ggml_tensor * Kcur = ggml_mul_mat(ctx0, model.layers[il].wk, cur);
+                    struct ggml_tensor * Kcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wk, cur);
                     cb(Kcur, "Kcur", il);
 
-                    struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv, cur);
+                    struct ggml_tensor * Vcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wv, cur);
                     cb(Vcur, "Vcur", il);
 
                     llm_build_kv_store(ctx0, hparams, cparams, kv_self, gf, Kcur, Vcur, n_tokens, kv_head, cb, il);
@@ -13345,7 +13345,7 @@ struct llm_build_context {
 
                     ggml_build_forward_expand(gf, cur);
 
-                    cur = ggml_mul_mat(ctx0, model.layers[il].wo, cur);
+                    cur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wo, cur);
                     cb(cur, "kqv_out", il);
                 }
 
@@ -13362,13 +13362,13 @@ struct llm_build_context {
 
                 // cross-attention
                 {
-                    struct ggml_tensor * Qcur = ggml_mul_mat(ctx0, model.layers[il].wq_cross, cur);
+                    struct ggml_tensor * Qcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wq_cross, cur);
                     cb(Qcur, "Qcur", il);
 
-                    struct ggml_tensor * Kcur = ggml_mul_mat(ctx0, model.layers[il].wk_cross, embd_enc);
+                    struct ggml_tensor * Kcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wk_cross, embd_enc);
                     cb(Kcur, "Kcur", il);
 
-                    struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv_cross, embd_enc);
+                    struct ggml_tensor * Vcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wv_cross, embd_enc);
                     cb(Vcur, "Vcur", il);
 
                     Qcur = ggml_reshape_3d(ctx0, Qcur, n_embd_head, n_head,    n_tokens);
@@ -13397,7 +13397,7 @@ struct llm_build_context {
 
                     ggml_build_forward_expand(gf, cur);
 
-                    cur = ggml_mul_mat(ctx0, model.layers[il].wo_cross, cur);
+                    cur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wo_cross, cur);
                     cb(cur, "kqv_out", il);
                 }

Basically, you have to replace every matrix multiplication ggml_mul_mat(ctx0, ...) whose weight has a LoRA adapter with the corresponding llm_build_lora_mm(lctx, ctx0, ...). After this (I think I covered all the places, but please check), the model generates special tokens as (hopefully) expected:

[1723207411] last: [ '<extra_id_0>':32099, ' Artificial':24714, ' intelligence':6123, ' is':19, ' ':3, 'exhibited':21102, ' by':57, ' machines':4096, ',':6, ' particularly':1989, ' computer':1218, ' systems':1002, '.':5, '<extra_id_0>':32099, ' Artificial':24714, ' intelligence':6123, ' is':19, ' ':3, 'a':9, ' field':1057, ' of':13, ' research':585, ' in':16, ' computer':1218, ' science':2056, '.':5, '<extra_id_0>':32099, ' Artificial':24714, ' intelligence':6123, ' is':19, ' ':3, 'a':9, ' field':1057, ' of':13, ' research':585, ' in':16, ' computer':1218, ' science':2056, '.':5, '<extra_id_0>':32099, ' Artificial':24714, ' intelligence':6123, ' is':19, ' ':3, 'a':9, ' field':1057, ' of':13, ' research':585, ' in':16, ' computer':1218, ' science':2056, '.':5, '<extra_id_0>':32099, ' Artificial':24714, ' intelligence':6123, ' is':19, ' ':3, 'a':9, ' field':1057, ' of':13, ' research':585, ' in':16, ' computer':1218, ' science':2056 ]
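
For context, llm_build_lora_mm wraps the plain matrix multiplication and adds the low-rank LoRA update on top, conceptually W·x + scale·(B·(A·x)). The sketch below is a paraphrase of that idea using ggml primitives for a single adapter; the function name and single-adapter signature are illustrative, not the exact src/llama.cpp implementation, which iterates over all loaded adapters.

#include "ggml.h"

// Simplified, single-adapter sketch of the idea behind llm_build_lora_mm
// (paraphrased). The base weight is applied as before, then the scaled
// low-rank update B·(A·x) is added to the result.
static ggml_tensor * lora_mm_sketch(
        ggml_context * ctx,
        ggml_tensor  * w,       // base weight matrix
        ggml_tensor  * a,       // LoRA A (projects the input down to the rank)
        ggml_tensor  * b,       // LoRA B (projects back up to the output size)
        ggml_tensor  * x,       // input activations
        float          scale) { // adapter scale, typically alpha / rank
    ggml_tensor * base = ggml_mul_mat(ctx, w, x);            // W·x
    ggml_tensor * ax   = ggml_mul_mat(ctx, a, x);            // A·x
    ggml_tensor * bax  = ggml_mul_mat(ctx, b, ax);           // B·(A·x)
    return ggml_add(ctx, base, ggml_scale(ctx, bax, scale)); // W·x + s·B·(A·x)
}

This also explains the symptom: with plain ggml_mul_mat the adapter's A and B tensors are never applied to those projections, so the fine-tuned behavior (emitting <extra_id_0>) is silently lost.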

fairydreaming added a commit that referenced this issue Aug 9, 2024
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
@cyanic-selkie
Author

@fairydreaming Sorry, could you share the command you used, please? I'm not able to reproduce your results. I'm running:

./llama-cli --model <path_to_flan_t5_small.gguf> -p <prompt> --lora <path_to_adapter.gguf>

@fairydreaming
Collaborator

fairydreaming commented Aug 9, 2024

Sure thing:

./llama-cli --numa distribute -t 32 -m models/repetitio/flan-t5-small-f32-2.gguf --lora models/repetitio/adapter_model-f32-2.gguf -p "Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.[1] Such machines may be called AIs." --temp 0.01 -s 42 --special

I reconverted both GGUFs from safetensors:

./convert_hf_to_gguf.py /mnt/md0/huggingface/hub/models--repetitio--flan-t5-small/snapshots/d456a1448a27c26bfebf9cd864056e1b9a993576/ --outfile models/repetitio/flan-t5-small-f32-2.gguf --outtype "f32"

./convert_lora_to_gguf.py --outfile models/repetitio/adapter_model-f32-2.gguf --outtype f32 --base /mnt/md0/huggingface/hub/models--repetitio--flan-t5-small/snapshots/d456a1448a27c26bfebf9cd864056e1b9a993576/ /mnt/md0/huggingface/hub/models--repetitio--distilled-simplifier/snapshots/b54eff46f3a8405a5ec1ecc3d84ac4f4f3c69234/

With these I get the same layer output values as in transformers.

The output is:

<pad><extra_id_0> Artificial intelligence is a field of research in computer science.<extra_id_0> Artificial intelligence is a field of research in computer science.<extra_id_0> Artificial intelligence is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.<extra_id_0> The research in computer science is a field of research in computer science.</s> [end of text]

@cyanic-selkie
Author

Awesome, thanks!

arthw pushed a commit to arthw/llama.cpp that referenced this issue Nov 15, 2024
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
arthw pushed a commit to arthw/llama.cpp that referenced this issue Nov 18, 2024
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>